Mount and Unmount Data Lake in Databricks

Databricks is a unified big data processing and analytics cloud platform used to transform and process huge volumes of data. Apache Spark, an in-memory analytics engine for big data and machine learning, is the building block of Databricks. Databricks can connect to various sources for data ingestion. In this article, we will see how to mount and unmount a data lake in Databricks.

Pre-requisites:
To mount a location, you would need:
1. A Databricks service in Azure, GCP, or AWS.
2. A Databricks cluster.
3. A basic understanding of Databricks and how to create notebooks.

What is Mounting in Databricks?

Mounting object storage to DBFS allows easy access to object storage as if it were on the local file system. Once a location, e.g. Azure Blob Storage or an Amazon S3 bucket, is mounted, we can use the mount point to access the external storage.

Generally, we use the dbutils.fs.mount() command to mount a location in Databricks.
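
As a minimal sketch, the call takes a source URI, a mount point under /mnt, and authentication settings in extra_configs. The names below are illustrative placeholders only; the real values for our example are filled in later in this article.

# Illustrative only: mount a blob container at /mnt/demo using a storage account key
dbutils.fs.mount(
    source = "wasbs://demo-container@demostorageaccount.blob.core.windows.net",
    mount_point = "/mnt/demo",
    extra_configs = {"fs.azure.account.key.demostorageaccount.blob.core.windows.net": "xxxxxxxxxxx"}
)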

How to mount a data lake in Databricks?

Let us now see how to mount Azure Data Lake Storage Gen2 in Databricks.

First things first, let's create a blob storage account and a container. The storage account should look like the below image.

The new container should look like the below image.

To mount an ADLS Gen2 container, we will need the below details to connect to the location.

ContainerName = "yourcontainerName"
azure_blobstorage_name = "blobstoragename"
mountpointname = "/mnt/azureops"
secret_key = "xxxxxxxxxxx"

Once we have this information, we can use the below code snippet to connect the data lake to Databricks.

dbutils.fs.mount(
    source = f"wasbs://{ContainerName}@{azure_blobstorage_name}.blob.core.windows.net",
    mount_point = mountpointname,
    extra_configs = {"fs.azure.account.key." + azure_blobstorage_name + ".blob.core.windows.net": secret_key}
)
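
Once the mount succeeds, the container behaves like any other DBFS path. For example, we can list its contents and read a file through the mount point (the CSV file name below is hypothetical):

# List the contents of the mounted container
display(dbutils.fs.ls(mountpointname))

# Read a hypothetical CSV file through the mount point
df = spark.read.csv(f"{mountpointname}/sample.csv", header=True)
df.show()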

How to check all the mount points in Databricks?

dbutils.fs.mounts()
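
dbutils.fs.mounts() returns a list of mount entries, each with a mountPoint and a source. A small sketch to print them:

# Print every mount point and the storage location it points to
for mount in dbutils.fs.mounts():
    print(mount.mountPoint, "->", mount.source)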

How to unmount a location?

dbutils.fs.unmount(mount_point)
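
For example, to unmount the /mnt/azureops location created above (the guard is a sketch that simply avoids an error if the path is not currently mounted):

# Unmount /mnt/azureops only if it is currently mounted
if any(mount.mountPoint == "/mnt/azureops" for mount in dbutils.fs.mounts()):
    dbutils.fs.unmount("/mnt/azureops")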

Let’s use all the above commands in action.

The objective is to add a mount point if it does not exist already.

if all(mount.mountPoint != mountpointname for mount in dbutils.fs.mounts()):
    dbutils.fs.mount(
        source = f"wasbs://{ContainerName}@{azure_blobstorage_name}.blob.core.windows.net",
        mount_point = mountpointname,
        extra_configs = {"fs.azure.account.key." + azure_blobstorage_name + ".blob.core.windows.net": secret_key}
    )
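
This guard keeps the notebook idempotent: dbutils.fs.mount() raises an error if the location is already mounted, so the check lets us rerun the cell safely.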

Pro tips:
1. Instead of using a storage account key, we can also mount a location using a SAS token or a service principal (see the sketch after this list).
2. Databricks provides a free Community Edition where you can learn and explore Databricks. You can sign up here.
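
As a minimal sketch of the service principal approach for ADLS Gen2, assuming an Azure AD app registration already exists; the tenant ID, client ID, secret scope, and key names below are placeholders, not values from this article:

# Hypothetical service principal (OAuth) mount for ADLS Gen2 over abfss
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-client-id>",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope-name>", key="<secret-name>"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token"
}

dbutils.fs.mount(
    source = "abfss://yourcontainerName@blobstoragename.dfs.core.windows.net/",
    mount_point = "/mnt/azureops_sp",
    extra_configs = configs
)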

Notebook Reference

Pavan Bangad

9+ years of experience in building data warehouses and big data applications.
Helping customers in their digital transformation journey in the cloud.
Passionate about data engineering.