Call a notebook from another notebook in Databricks

Databricks is a unified big data processing and analytics cloud platform used to transform and process huge volumes of data. It is built on Apache Spark, an in-memory analytics engine for big data and machine learning. In this article, we will see how to call a notebook from another notebook in Databricks and how to manage the execution context of a notebook.

What are Databricks notebooks and execution contexts?

Notebooks in Databricks are used to write Spark code that processes and transforms data. Notebooks support the Python, Scala, SQL, and R languages.

Whenever we execute a notebook in Databricks, it attaches a cluster (a computation resource) to the notebook and creates an execution context.

If you want to run a Databricks notebook inside another notebook, you will need:
1. Databricks service in Azure, GCP, or AWS cloud.
2. A Databricks cluster.
3. A basic understanding of Databricks and how to create notebooks.

Methods to call a notebook from another notebook in Databricks

There are two methods to run a Databricks notebook inside another Databricks notebook.

1. Using the %run command:

The %run command runs the called notebook in the same execution context as the caller, meaning any variable or function defined in the called notebook can be used in the calling notebook.

A sample command looks like the following:

%run [notebook path] $parameter1="value1" $parameterN="valueN"
Example – Using the %run command to call a notebook inside another notebook.

This method is suitable when you want one notebook to hold all your constant variables or a centralized shared function library, and you want to refer to those definitions from the calling notebook.
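As a sketch of that shared-library pattern, suppose a notebook at the hypothetical path /Shared/common_functions defines a helper function (the path and function name below are illustrative, not from Databricks documentation). After %run, everything the shared notebook defines is in scope in the caller:

```python
# Cell in the shared notebook /Shared/common_functions
def clean_column_names(df):
    # Lowercase and underscore all column names of a Spark DataFrame.
    return df.toDF(*[c.lower().replace(" ", "_") for c in df.columns])
```

```python
# Cell in the calling notebook -- %run must be in a cell by itself
%run /Shared/common_functions
```

```python
# Next cell in the calling notebook: the shared function is now in scope
df_clean = clean_column_names(df_raw)
```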

What if we need to execute the child notebook in a separate notebook context? The next method describes how to achieve this.

2. Using the dbutils.notebook.run() function:

The dbutils.notebook.run() function runs the target notebook in a new notebook context.

The syntax of this function is: dbutils.notebook.run(notebook_path, timeout_in_seconds, parameters)


notebook_path -> the path of the target notebook.
timeout_in_seconds -> the call throws an exception if the notebook does not complete within the specified time.
parameters -> used to send parameters to the child notebook. Parameters are passed as a dictionary of string keys and values, e.g. {'parameter1': 'value1', 'parameter2': 'value2'}.
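Putting the pieces together, here is a minimal sketch (the notebook path and parameter names are hypothetical, and this only runs inside a Databricks notebook, where dbutils is available):

```python
# Parent notebook: run the child, wait up to 600 seconds, pass two string parameters
result = dbutils.notebook.run(
    "/Shared/child_notebook",
    600,
    {"parameter1": "value1", "parameter2": "value2"},
)
print(result)  # whatever the child returned via dbutils.notebook.exit(...)
```

In the child notebook, the parameters arrive as widgets, and a value can be handed back to the parent:

```python
# Child notebook: read a parameter and return a value to the caller
value1 = dbutils.widgets.get("parameter1")
dbutils.notebook.exit("done processing " + value1)
```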

Example – Use dbutils.notebook.run() to call a notebook inside another notebook.

We can call any number of notebooks by invoking this function repeatedly in the parent notebook.

This will run all the notebooks sequentially.
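For example, running several notebooks one after another can be sketched as a simple loop (the paths are illustrative, and this requires a Databricks environment):

```python
# Each dbutils.notebook.run call blocks until that notebook finishes,
# so the notebooks execute one after another.
notebook_paths = ["/Shared/etl_step1", "/Shared/etl_step2", "/Shared/etl_step3"]

for path in notebook_paths:
    result = dbutils.notebook.run(path, 600, {})
    print(path, "->", result)
```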

Run Databricks notebooks in parallel

If you wish to run multiple Databricks notebooks in parallel, you can use the ThreadPoolExecutor class from Python's concurrent.futures module. It creates a pool of threads that can run notebooks concurrently.

Import the library as follows:

from concurrent.futures import ThreadPoolExecutor
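A minimal sketch of the parallel pattern follows. The run_notebook helper is a stand-in so the example runs anywhere; inside Databricks you would replace its body with the dbutils.notebook.run call shown in the comment. The notebook paths are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical notebook paths -- replace with your own.
notebook_paths = ["/Shared/etl_step1", "/Shared/etl_step2", "/Shared/etl_step3"]

def run_notebook(path):
    # Inside Databricks this would be:
    #   return dbutils.notebook.run(path, 600, {"parameter1": "value1"})
    # Here we return a string so the pattern is visible outside Databricks.
    return "finished " + path

# Run up to 3 notebooks concurrently; map() returns results in submission order.
with ThreadPoolExecutor(max_workers=3) as executor:
    results = list(executor.map(run_notebook, notebook_paths))

print(results)
```

Because dbutils.notebook.run blocks its calling thread until the child notebook finishes, each worker thread drives one notebook at a time, giving you up to max_workers notebooks running in parallel.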

You can read more about ThreadPoolExecutor in the Python concurrent.futures documentation.


Pro tips:
1. We can use Azure Data Factory to run notebooks in parallel.
2. With both of the above methods, we can only pass string parameters to the child notebook; objects are not allowed.
3. Databricks provides a free Community Edition where you can learn and explore Databricks.

9+ years of experience in building data warehouse and big data applications.
Helping customers in their digital transformation journey in the cloud.
Passionate about data engineering.