Databricks is a unified big data processing and analytics cloud platform for transforming and processing huge volumes of data. Apache Spark, an in-memory analytics engine for big data and machine learning, is the building block of Databricks. Databricks can connect to various sources for data ingestion. This article shows how to mount and unmount a data lake in Databricks.
Prerequisites:
To mount a location, you would need the following:
1. A Databricks service in Azure, GCP, or AWS.
2. A Databricks cluster.
3. A basic understanding of Databricks and how to create notebooks.
What is Mounting in Databricks?
Mounting object storage to DBFS allows easy access to object storage as if it were on the local file system. Once a location, e.g., Azure Blob Storage or an Amazon S3 bucket, is mounted, we can use the mount point to access the external storage.
Generally, we use the dbutils.fs.mount() command to mount a location in Databricks.
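For example, once a location is mounted at /mnt/azureops (the mount point we will create below), its contents can be listed and read with ordinary DBFS paths; the file name here is just a placeholder:

# List files on the mount as if they were on the local file system
display(dbutils.fs.ls("/mnt/azureops"))

# Read a file from the mounted storage with standard Spark APIs (placeholder file name)
df = spark.read.csv("/mnt/azureops/sample.csv", header=True)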
How to mount a data lake in Databricks?
Let us now see how to mount Azure Data Lake Storage Gen2 in Databricks.
First things first, let's create the blob storage account and a container. The blob storage account should look like the image below.
The new container should look like the image below.
To mount ADLS Gen2, we need the following details to connect to the location.
ContainerName = "yourcontainerName"           # name of the container to mount
azure_blobstorage_name = "blobstoragename"    # storage account name
mountpointname = "/mnt/azureops"              # DBFS path where the storage will be mounted
secret_key = "xxxxxxxxxxx"                    # storage account access key (placeholder)
Once we have this information, we can use the below code snippet to connect the data lake to Databricks.
dbutils.fs.mount(
    source = f"wasbs://{ContainerName}@{azure_blobstorage_name}.blob.core.windows.net",
    mount_point = mountpointname,
    extra_configs = {"fs.azure.account.key." + azure_blobstorage_name + ".blob.core.windows.net": secret_key}
)
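Assuming the mount succeeds, a quick sanity check is to list the contents of the new mount point:

# List the contents of the freshly created mount; this raises an error if the mount failed
display(dbutils.fs.ls(mountpointname))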
How to check all the mount points in Databricks?
dbutils.fs.mounts()
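Each entry returned by dbutils.fs.mounts() exposes mountPoint and source attributes, so we can, for example, print every mount alongside the storage it points to:

# Print each DBFS mount point and its backing storage URL
for mount in dbutils.fs.mounts():
    print(mount.mountPoint, "->", mount.source)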
How to unmount a location?
dbutils.fs.unmount(mount_point)
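For the mount created in this article, that would look like this:

# Detach the /mnt/azureops mount created earlier
dbutils.fs.unmount("/mnt/azureops")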
Let's now see all the above commands in action.
The objective is to add a mount point if it does not exist.
if all(mount.mountPoint != mountpointname for mount in dbutils.fs.mounts()):
    dbutils.fs.mount(
        source = f"wasbs://{ContainerName}@{azure_blobstorage_name}.blob.core.windows.net",
        mount_point = mountpointname,
        extra_configs = {"fs.azure.account.key." + azure_blobstorage_name + ".blob.core.windows.net": secret_key}
    )
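The same check can be wrapped in a small reusable helper, sketched below; the function name mount_if_absent is mine, not part of dbutils:

def mount_if_absent(source, mount_point, configs):
    """Mount `source` at `mount_point` unless it is already mounted."""
    if any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
        print(f"{mount_point} is already mounted, skipping.")
    else:
        dbutils.fs.mount(source=source, mount_point=mount_point, extra_configs=configs)

mount_if_absent(
    f"wasbs://{ContainerName}@{azure_blobstorage_name}.blob.core.windows.net",
    mountpointname,
    {"fs.azure.account.key." + azure_blobstorage_name + ".blob.core.windows.net": secret_key},
)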
Pro tips:
1. Instead of using a storage account key, we can also mount a location using a SAS token or a service principal (see the sketch after this list).
2. Databricks provides a free Community Edition where you can learn and explore Databricks. You can sign up here.
3. If you're aiming to obtain the Databricks Certified Data Engineer Associate certification, take a look at these helpful tips.
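As a sketch of the service principal approach mentioned in tip 1: Azure Databricks can mount ADLS Gen2 over abfss using OAuth credentials. The client ID, tenant ID, secret scope, and mount path below are placeholders, and the secret should come from a Databricks secret scope rather than plain text:

# Sketch: mount ADLS Gen2 with a service principal (OAuth) instead of an account key.
# All angle-bracket values are placeholders.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-client-id>",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<secret-scope>", key="<secret-key-name>"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.mount(
    source = "abfss://<container>@<storage-account>.dfs.core.windows.net/",
    mount_point = "/mnt/azureops-sp",
    extra_configs = configs
)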
Notebook Reference