Shared Integration Runtime in Azure Data Factory

Azure Data Factory is an ETL and orchestration tool for building cloud-native data engineering pipelines. It offers a large and ever-growing list of source connectors. Azure Data Factory uses integration runtimes as the compute infrastructure to execute data movement and transformation activities. This article describes how to set up a shared self-hosted integration runtime in Azure Data Factory.

What is Integration Runtime?

Integration Runtime (IR) in Azure Data Factory provides a computational infrastructure for pipelines. It is also instrumental in securing data movement in the cloud. Read more about it here.

There are three types of integration runtime:

  1. Azure: This is the default IR, and Azure fully manages it. It is suitable for connecting to Azure resources. Because Azure manages it, data movement goes over the public network by default.
  2. Self-hosted: This IR can connect to on-premises sources and secure data movement between systems. It supports data movement within a private network.
  3. Azure-SSIS: This IR is used for running SSIS packages in Azure Data Factory.

Let’s see it in action

This section describes how to set up a shared self-hosted integration runtime in Azure Data Factory.

Prerequisites:
1. An Azure subscription and a resource group with Azure Data Factory.
2. A self-hosted integration runtime already set up in Azure Data Factory.

1. Select the integration runtime that needs to be shared:

In the source Data Factory, select the self-hosted integration runtime that must be shared with other Azure Data Factories and open it for editing.

[Image: Shared Integration Runtime in Azure Data Factory]

Copy the Resource ID and save it. We will use it in a later step.
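Since the Resource ID is pasted by hand later on, it can help to sanity-check it first. The snippet below is a minimal sketch, assuming the ID follows the usual Azure Resource Manager shape for a Data Factory integration runtime; the function name `parse_ir_resource_id` and the example values are hypothetical and not part of Azure Data Factory.

```python
import re

# Assumed shape of a Data Factory integration runtime Resource ID:
# /subscriptions/<id>/resourceGroups/<rg>/providers/Microsoft.DataFactory
#   /factories/<factory>/integrationruntimes/<runtime>
_IR_ID_PATTERN = re.compile(
    r"^/subscriptions/(?P<subscription_id>[^/]+)"
    r"/resourceGroups/(?P<resource_group>[^/]+)"
    r"/providers/Microsoft\.DataFactory"
    r"/factories/(?P<factory>[^/]+)"
    r"/integrationruntimes/(?P<runtime>[^/]+)$",
    re.IGNORECASE,  # Azure resource IDs are treated as case-insensitive
)


def parse_ir_resource_id(resource_id: str) -> dict:
    """Split a copied Resource ID into its parts, or raise ValueError."""
    match = _IR_ID_PATTERN.match(resource_id.strip())
    if match is None:
        raise ValueError(f"Not an integration runtime Resource ID: {resource_id!r}")
    return match.groupdict()


# Placeholder values for illustration only.
example_id = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/rg-shared"
    "/providers/Microsoft.DataFactory"
    "/factories/adf-source"
    "/integrationruntimes/shared-ir"
)

parts = parse_ir_resource_id(example_id)
print(parts["factory"], parts["runtime"])
```

A check like this catches a truncated or partially copied ID before it is pasted into the target Data Factory.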

Click the Grant permission to another Data Factory or user-assigned managed identity button, as shown in the image above.

2. Grant permission to the target Azure Data Factory:

Grant access to the target Azure Data Factory that will use the shared integration runtime.
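For readers who script this step instead of using the portal, the grant amounts to an Azure role assignment scoped to the shared IR. The sketch below only builds the request that would be sent to the Azure Resource Manager REST API; it does not call Azure. It rests on several assumptions that should be checked against the official documentation: that the built-in Contributor role is the one assigned to the target factory's managed identity, the `2022-04-01` API version, and the helper name `build_role_assignment`, which is hypothetical. All IDs are placeholders.

```python
import json
import uuid

# Well-known ID of the built-in Contributor role (assumed to be the role
# granted when sharing; verify before use).
CONTRIBUTOR_ROLE_ID = "b24988ac-6180-42a0-ab88-20f7382dd24c"


def build_role_assignment(ir_resource_id, subscription_id, principal_id,
                          assignment_name=None):
    """Return (url, body) for a role assignment scoped to the shared IR."""
    # Role assignment names are GUIDs; generate one unless supplied.
    name = assignment_name or str(uuid.uuid4())
    url = (
        "https://management.azure.com"
        + ir_resource_id
        + "/providers/Microsoft.Authorization/roleAssignments/"
        + name
        + "?api-version=2022-04-01"  # assumed API version
    )
    body = {
        "properties": {
            "roleDefinitionId": (
                f"/subscriptions/{subscription_id}"
                "/providers/Microsoft.Authorization/roleDefinitions/"
                f"{CONTRIBUTOR_ROLE_ID}"
            ),
            # Object ID of the target Data Factory's managed identity.
            "principalId": principal_id,
            "principalType": "ServicePrincipal",
        }
    }
    return url, body


# Placeholder values for illustration only.
sub_id = "00000000-0000-0000-0000-000000000000"
shared_ir_id = (
    f"/subscriptions/{sub_id}/resourceGroups/rg-shared"
    "/providers/Microsoft.DataFactory/factories/adf-source"
    "/integrationruntimes/shared-ir"
)
target_identity = "11111111-1111-1111-1111-111111111111"

url, body = build_role_assignment(shared_ir_id, sub_id, target_identity)
print(url)
print(json.dumps(body, indent=2))
```

Sending the request would be an authenticated PUT with this body, which is where the Microsoft.Authorization/roleAssignments/write permission comes into play.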

3. Now, go to the target Data Factory and add an integration runtime:

Click New and select Azure, Self-Hosted.

Under External Resources, select Linked Self-Hosted and click Continue.

It will ask for a few details about the shared IR.

Name: enter a name for the new IR.

Resource ID: paste the Resource ID we saved earlier, then click Create.
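Behind the form, the linked IR is stored as a small JSON definition that points at the shared IR. The sketch below builds that definition, assuming the `linkedInfo` layout with `resourceId` and `authorizationType` used in Microsoft's samples; the exact casing of `"Rbac"` and the helper name `build_linked_ir_definition` are assumptions, and the values are placeholders.

```python
import json


def build_linked_ir_definition(name: str, shared_ir_resource_id: str) -> dict:
    """Build the JSON definition of a linked self-hosted IR (assumed layout)."""
    return {
        "name": name,
        "properties": {
            # A linked IR is still of type SelfHosted; it just points elsewhere.
            "type": "SelfHosted",
            "typeProperties": {
                "linkedInfo": {
                    # Resource ID copied from the source Data Factory.
                    "resourceId": shared_ir_resource_id,
                    # Access is authorized through the role assignment
                    # granted in the previous step.
                    "authorizationType": "Rbac",
                }
            },
        },
    }


# Placeholder values for illustration only.
definition = build_linked_ir_definition(
    "LinkedSharedIR",
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/rg-shared/providers/Microsoft.DataFactory"
    "/factories/adf-source/integrationruntimes/shared-ir",
)
print(json.dumps(definition, indent=2))
```

Note that the definition carries only a reference: the linked IR has no nodes of its own and relies on the machine registered with the shared IR.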

And we are done!

Now the same IR is used by two different Azure Data Factory instances.

Pro tips:
1. Make sure you have the Microsoft.Authorization/roleAssignments/write permission on the subscription.
2. A self-hosted IR requires a virtual machine with the integration runtime software installed, which means we pay for the hosting virtual machine. By sharing the IR, multiple Data Factory instances can use the same virtual machine.


Pavan Bangad

9+ years of experience in building data warehouse and big data applications.
Helping customers in their digital transformation journey in the cloud.
Passionate about data engineering.