Mount and Unmount Data Lake in Databricks

Databricks is a unified big data processing and analytics cloud platform used to transform and process huge volumes of data. Apache Spark, an in-memory analytics engine for big data and machine learning, is the building block of Databricks. Databricks can connect to various sources for data ingestion. In this article, we will see how to mount and unmount a data lake in Databricks.

Pre-requisites:
To mount a location, you would need:
1. A Databricks service in Azure, GCP, or AWS.
2. A Databricks cluster.
3. A basic understanding of Databricks and how to create notebooks.

What is Mounting in Databricks?

Mounting object storage to DBFS allows easy access to object storage as if it were on the local file system. Once a location, e.g. Azure Blob Storage or an Amazon S3 bucket, is mounted, we can use the mount point to access the external storage.

Generally, we use the dbutils.fs.mount() command to mount a location in Databricks.
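
As a minimal sketch, the call takes a source URI, a mount point under /mnt, and authentication settings in extra_configs. The names below are illustrative placeholders only; the real values for our example are filled in later in this article.

# Illustrative only: mount a blob container at /mnt/demo using a storage account key
dbutils.fs.mount(
    source = "wasbs://demo-container@demostorageaccount.blob.core.windows.net",
    mount_point = "/mnt/demo",
    extra_configs = {"fs.azure.account.key.demostorageaccount.blob.core.windows.net": "xxxxxxxxxxx"}
)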

How to mount a data lake in Databricks?

Let us now see how to mount Azure Data Lake Storage Gen2 in Databricks.

First things first, let's create a blob storage account and a container. The storage account should look like the below image.

The new container should look like the below image.

To mount an ADLS Gen2 container, we will need the below details to connect to the location.

ContainerName = "yourcontainerName"
azure_blobstorage_name = "blobstoragename"
mountpointname = "/mnt/azureops"
secret_key = "xxxxxxxxxxx"

Once we have this information, we can use the below code snippet to connect the data lake to Databricks.

dbutils.fs.mount(
    source = f"wasbs://{ContainerName}@{azure_blobstorage_name}.blob.core.windows.net",
    mount_point = mountpointname,
    extra_configs = {"fs.azure.account.key." + azure_blobstorage_name + ".blob.core.windows.net": secret_key}
)
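
Once the mount succeeds, the container behaves like any other DBFS path. For example, we can list its contents and read a file through the mount point (the CSV file name below is hypothetical):

# List the contents of the mounted container
display(dbutils.fs.ls(mountpointname))

# Read a hypothetical CSV file through the mount point
df = spark.read.csv(f"{mountpointname}/sample.csv", header=True)
df.show()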

How to check all the mount points in Databricks?

dbutils.fs.mounts()
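
dbutils.fs.mounts() returns a list of mount entries, each with a mountPoint and a source. A small sketch to print them:

# Print every mount point and the storage location it points to
for mount in dbutils.fs.mounts():
    print(mount.mountPoint, "->", mount.source)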

How to unmount a location?

dbutils.fs.unmount(mount_point)
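
For example, to unmount the /mnt/azureops location created above (the guard is a sketch that simply avoids an error if the path is not currently mounted):

# Unmount /mnt/azureops only if it is currently mounted
if any(mount.mountPoint == "/mnt/azureops" for mount in dbutils.fs.mounts()):
    dbutils.fs.unmount("/mnt/azureops")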

Let’s use all the above commands in action.

The objective is to add a mount point if it does not exist already.

if all(mount.mountPoint != mountpointname for mount in dbutils.fs.mounts()):
    dbutils.fs.mount(
        source = f"wasbs://{ContainerName}@{azure_blobstorage_name}.blob.core.windows.net",
        mount_point = mountpointname,
        extra_configs = {"fs.azure.account.key." + azure_blobstorage_name + ".blob.core.windows.net": secret_key}
    )
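
This guard keeps the notebook idempotent: dbutils.fs.mount() raises an error if the location is already mounted, so the check lets us rerun the cell safely.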

Pro tips:
1. Instead of using a storage account key, we can also mount a location using a SAS token or a service principal (see the sketch after this list).
2. Databricks provides a free Community Edition where you can learn and explore Databricks. You can sign up here.
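
As a minimal sketch of the service principal approach for ADLS Gen2, assuming an Azure AD app registration already exists; the tenant ID, client ID, secret scope, and key names below are placeholders, not values from this article:

# Hypothetical service principal (OAuth) mount for ADLS Gen2 over abfss
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-client-id>",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope-name>", key="<secret-name>"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token"
}

dbutils.fs.mount(
    source = "abfss://yourcontainerName@blobstoragename.dfs.core.windows.net/",
    mount_point = "/mnt/azureops_sp",
    extra_configs = configs
)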

Notebook Reference

Pavan Bangad

9+ years of experience in building data warehouses and big data applications.
Helping customers in their digital transformation journey in the cloud.
Passionate about data engineering.