Azure Data Factory is an ETL and orchestration tool for building cloud-native data engineering pipelines. It offers many source connectors, and the list is ever-growing. Azure Data Factory uses Integration Runtimes as the compute infrastructure to execute data movement and data transformation activities. This article describes how to set up a shared self-hosted integration runtime in Azure Data Factory.
What is Integration Runtime?
Integration Runtime (IR) in Azure Data Factory provides the compute infrastructure on which pipelines run. It is also instrumental in securing data movement in the cloud. Read more about it here.
There are three types of integration runtime:
- Azure: This is the default one, fully managed by Azure. It is suitable for connecting to Azure resources. Because it is managed by Azure, data movement happens over the public network by default.
- Self-hosted: This IR can be used to connect to on-prem sources as well as to secure data movement between systems. It supports data movement within a private network.
- Azure-SSIS: This is used for running SSIS packages in Azure Data Factory.
Let’s see it in Action
This section describes how to set up a shared self-hosted integration runtime in Azure Data Factory.
Prerequisites:
1. An Azure subscription and a resource group with Azure Data Factory.
2. A self-hosted integration runtime already set up in Azure Data Factory.
1. Select the integration runtime that needs to be shared:
Select the self-hosted integration runtime that you want to share with other Azure Data Factories and open it for editing.
Copy the Resource ID and save it. We will use it in the next steps.
Click the Grant permission to another Data Factory or user-assigned managed identity button.
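Before pasting the copied Resource ID anywhere, it can help to check that it has the expected shape. The snippet below is a minimal sketch, assuming the usual Azure Resource Manager layout for a Data Factory integration runtime ID; the function name `parse_ir_resource_id` and the sample values are hypothetical and only for illustration.

```python
import re

# Assumed layout of an integration runtime Resource ID:
# /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.DataFactory
#   /factories/<factory>/integrationruntimes/<runtime>
_IR_RESOURCE_ID = re.compile(
    r"^/subscriptions/(?P<subscription>[^/]+)"
    r"/resourceGroups/(?P<resource_group>[^/]+)"
    r"/providers/Microsoft\.DataFactory"
    r"/factories/(?P<factory>[^/]+)"
    r"/integrationruntimes/(?P<runtime>[^/]+)$",
    re.IGNORECASE,  # Azure resource IDs are compared case-insensitively
)


def parse_ir_resource_id(resource_id: str) -> dict:
    """Split a Resource ID into its parts, or raise ValueError if malformed."""
    match = _IR_RESOURCE_ID.match(resource_id.strip())
    if match is None:
        raise ValueError(f"Not an integration runtime Resource ID: {resource_id!r}")
    return match.groupdict()


if __name__ == "__main__":
    # Hypothetical sample value, not a real resource.
    sample = (
        "/subscriptions/00000000-0000-0000-0000-000000000000"
        "/resourceGroups/rg-data"
        "/providers/Microsoft.DataFactory/factories/adf-source"
        "/integrationruntimes/shared-ir"
    )
    print(parse_ir_resource_id(sample))
```

A check like this catches a truncated or partially copied ID early, before it is used in the target Data Factory.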
2. Grant permission to the target Azure Data Factory:
Grant access to the target Azure Data Factory that will use the shared integration runtime.
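Behind the portal button, granting access amounts to a role assignment scoped to the shared integration runtime. The sketch below only builds the REST request for such an assignment and does not send it. It assumes the built-in Contributor role is the one being granted and that the target Data Factory's managed identity object ID is known; the function name and all sample values are hypothetical.

```python
import uuid

# Well-known ID of the built-in Contributor role (assumed to be the role granted).
CONTRIBUTOR_ROLE_ID = "b24988ac-6180-42a0-ab88-20f7382dd24c"


def build_role_assignment_request(ir_resource_id, subscription_id,
                                  principal_id, assignment_name=None):
    """Return (url, body) for a PUT that assigns Contributor on the shared IR.

    ir_resource_id: Resource ID copied from the shared integration runtime.
    principal_id:   object ID of the target Data Factory's managed identity.
    """
    # Role assignment names are GUIDs; generate one unless the caller supplies it.
    assignment_name = assignment_name or str(uuid.uuid4())
    url = (
        "https://management.azure.com"
        f"{ir_resource_id}"
        f"/providers/Microsoft.Authorization/roleAssignments/{assignment_name}"
        "?api-version=2022-04-01"
    )
    body = {
        "properties": {
            "roleDefinitionId": (
                f"/subscriptions/{subscription_id}"
                "/providers/Microsoft.Authorization"
                f"/roleDefinitions/{CONTRIBUTOR_ROLE_ID}"
            ),
            "principalId": principal_id,
            "principalType": "ServicePrincipal",
        }
    }
    return url, body


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    sub = "00000000-0000-0000-0000-000000000000"
    ir_id = (
        f"/subscriptions/{sub}/resourceGroups/rg-data"
        "/providers/Microsoft.DataFactory/factories/adf-source"
        "/integrationruntimes/shared-ir"
    )
    url, body = build_role_assignment_request(
        ir_id, sub, "11111111-1111-1111-1111-111111111111"
    )
    print(url)
    print(body)
```

Sending this request requires a bearer token and the role assignment write permission, which is why the portal button is usually the simpler route.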
3. Now, go to the target Data Factory and add an integration runtime:
Click New and select Azure, Self-Hosted.
Under External Resources, select Linked Self-Hosted and click Continue.
It will ask for a few details about the shared IR.
Name: enter a name for the new IR.
Resource ID: paste the Resource ID that we saved previously, then click Create.
And we are done!
Now we are using the same IR in two different Azure Data Factory instances.
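The linked runtime created in step 3 can also be expressed as a JSON definition, which is handy when Data Factory resources are kept in source control. The following is a minimal sketch of that definition built in Python. It assumes the linked IR is a `SelfHosted` runtime whose `linkedInfo` points at the shared IR's Resource ID with RBAC authorization; the function name and sample values are hypothetical.

```python
import json


def build_linked_ir_definition(name, shared_ir_resource_id):
    """Build the JSON definition of a linked self-hosted integration runtime."""
    if not shared_ir_resource_id.startswith("/subscriptions/"):
        raise ValueError("Expected the full Resource ID of the shared IR")
    return {
        "name": name,
        "properties": {
            "type": "SelfHosted",
            "typeProperties": {
                "linkedInfo": {
                    # Resource ID copied from the shared IR in step 1.
                    "resourceId": shared_ir_resource_id,
                    # Assumed value: access is granted through a role assignment.
                    "authorizationType": "Rbac",
                }
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    definition = build_linked_ir_definition(
        "linked-shared-ir",
        "/subscriptions/00000000-0000-0000-0000-000000000000"
        "/resourceGroups/rg-data"
        "/providers/Microsoft.DataFactory/factories/adf-source"
        "/integrationruntimes/shared-ir",
    )
    print(json.dumps(definition, indent=2))
```

Note that the definition carries no machine or key details of its own: the linked runtime only references the shared one, so the virtual machine stays registered with the original Data Factory.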
1. Make sure you have the Microsoft.Authorization/roleAssignments/write permission on the subscription.
2. A self-hosted IR requires a virtual machine and the integration runtime software, which means we pay for the hosted virtual machine. By sharing the IR, multiple Data Factory instances can use the same virtual machine.