Recently, Microsoft announced support for automated Azure Data Factory publish using the ADFUtilities NPM library (@microsoft/azure-data-factory-utilities). The library provides basic functionality to validate and generate an ARM template from a set of Data Factory resources. This article describes how to implement Azure Data Factory deployment using Azure DevOps and the ADFUtilities NPM library.
Background
In Azure Data Factory, continuous integration and continuous delivery (CI/CD) means moving Data Factory artifacts from one environment to another, such as from dev to test to production. We have been using the Data Factory CI/CD process based on the adf_publish branch. The main issue with this approach is that a developer must manually click the Publish button in the Data Factory UI on the master branch to deploy the code to the dev Data Factory; this also generates the ARM template artifacts in the adf_publish branch for higher-environment deployments.
With the automated publish improvement, we can create the ARM template artifact in an Azure DevOps build pipeline triggered by updates to the master branch (for example, a completed pull request). Deploying these artifacts to dev and other environments is then managed with the release pipeline (CD).
Microsoft supports both Azure Data Factory CI/CD flows for Data Factory deployments. Below are some of the key differences between the old and the new approach.
| | Old CI/CD flow | New CI/CD flow |
|---|---|---|
| Code validation and publish to dev | Manual | Handled by the CI/CD pipeline with the ADFUtilities NPM package |
| ARM template generation | After code is published to the dev Data Factory manually from the master branch | After the successful execution of the build pipeline |
| Release pipeline source | adf_publish branch artifacts | Build pipeline artifacts |
Build pipeline (CI)
Pre-requisites:
1. Azure DevOps Account.
2. Data Factory in a dev environment with Azure Repos Git integration.
Before we start with the build pipeline, create a file named package.json in the master branch of the Data Factory repository with the following content:
{
  "scripts": {
    "build": "node node_modules/@microsoft/azure-data-factory-utilities/lib/index"
  },
  "dependencies": {
    "@microsoft/azure-data-factory-utilities": "^0.1.3"
  }
}
Follow the steps below to create a CI (build) pipeline for automated Azure Data Factory publish.
1. Create a new build pipeline in the Azure DevOps project.
2. Select Azure Repos Git as your code repository.
3. From the Azure Repos, select the repo that contains the Data Factory code. This is the repository where you have Data Factory DevOps integration.
4. Select Starter pipeline as your build pipeline type.
5. Replace the default YAML with the code below. Then replace <subscription-id> with your subscription ID, <resourcegroup-name> with your resource group name, and <dev data factory name> with the name of the dev Data Factory.
trigger:
- master # change master to something else if master is not your main branch

pool:
  vmImage: 'ubuntu-latest'

steps:
- task: NodeTool@0
  inputs:
    versionSpec: '18.x'
  displayName: 'Install Node.js'

- task: Npm@1
  inputs:
    command: 'install'
    verbose: true
  displayName: 'Install npm package'

- task: Npm@1
  inputs:
    command: 'custom'
    customCommand: 'run build validate $(Build.Repository.LocalPath) /subscriptions/<subscription-id>/resourceGroups/<resourcegroup-name>/providers/Microsoft.DataFactory/factories/<dev data factory name>'
  displayName: 'Validate'

# Validate and then generate the ARM template into the destination folder. Same as clicking "Publish" from the UX.
# The generated ARM template is not published to the 'Live' version of the factory. Deployment should be done using a release pipeline.
- task: Npm@1
  inputs:
    command: 'custom'
    customCommand: 'run build export $(Build.Repository.LocalPath) /subscriptions/<subscription-id>/resourceGroups/<resourcegroup-name>/providers/Microsoft.DataFactory/factories/<dev data factory name> "ArmTemplate"'
  displayName: 'Validate and Generate ARM template'

# Publish the artifact to be used as the source for a release pipeline
- task: PublishPipelineArtifact@1
  inputs:
    targetPath: '$(Build.Repository.LocalPath)/ArmTemplate'
    artifact: 'ArmTemplates'
    publishLocation: 'pipeline'
6. Click Save.
The build process generates the ARM template as a pipeline (linked) artifact. Note that the source of this artifact is the Data Factory master branch.
Now that we have the build pipeline set up, let us create a release pipeline to deploy the code to the dev Data Factory. In this approach, the release pipeline for Data Factory uses the build pipeline artifact instead of the adf_publish branch.
Release pipeline (CD)
Pre-requisites:
1. Service connection to dev Azure resource group.
Follow the steps below to create a release pipeline for the Data Factory deployment.
1. Select Pipelines, and then select Releases from the left-hand menu in Azure DevOps.
2. Click New release pipeline under the New dropdown and select the Empty job template.
3. Provide a stage name. This generally denotes the dev environment.
4. Click Add an artifact and choose Build. Select the appropriate Project and Source (build pipeline) from the drop-down. The Source alias can be left as the default (typically _name_of_the_build).
5. Click the Stage (dev) box hyperlink, add a new task, and search for ARM Template Deployment.
6. Select the service connection to the dev Azure resource group under Azure Resource Manager connection. Provide the relevant Subscription, Resource group, and Location. Set Template location to Linked artifact. Browse to the ARMTemplateForFactory.json file for Template and ARMTemplateParametersForFactory.json for Template parameters. Under Override template parameters, we need to manually provide all parameter names and values from the ARMTemplateForFactory.json file, because the build artifacts are generated at runtime. Give a suitable Deployment name, for example, 'adf_deployment', and click Save. (A YAML sketch of an equivalent deployment task appears after this list.)
7. You can also enable a continuous deployment trigger from the icon on the Artifacts box.
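If you prefer YAML pipelines over the classic release editor, the same deployment step can also be written as a pipeline task. The snippet below is only a minimal sketch, not the exact configuration from this article: it assumes the ArmTemplates artifact has already been downloaded to $(Pipeline.Workspace)/ArmTemplates, and the service connection name 'dev-rg-service-connection' and the <...> placeholders are assumptions to replace with your own values.

# Illustrative sketch: deploy the ARM template produced by the build pipeline.
# 'dev-rg-service-connection' and the <...> placeholders are assumptions; replace them with your own values.
- task: AzureResourceManagerTemplateDeployment@3
  inputs:
    deploymentScope: 'Resource Group'
    azureResourceManagerConnection: 'dev-rg-service-connection'
    subscriptionId: '<subscription-id>'
    action: 'Create Or Update Resource Group'
    resourceGroupName: '<resourcegroup-name>'
    location: '<location>'
    templateLocation: 'Linked artifact'
    csmFile: '$(Pipeline.Workspace)/ArmTemplates/ARMTemplateForFactory.json'
    csmParametersFile: '$(Pipeline.Workspace)/ArmTemplates/ARMTemplateParametersForFactory.json'
    overrideParameters: '-factoryName "<dev data factory name>"'
    deploymentMode: 'Incremental'
    deploymentName: 'adf_deployment'
  displayName: 'Deploy ADF ARM template'

Incremental deployment mode is the safer choice here, since Complete mode would remove resources in the resource group that are not described in the template.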
Handle ADF triggers
Refer to this article to disable and enable triggers in the Data Factory deployment pipeline.
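As a rough illustration of that pattern, the sketch below wraps the deployment in two Azure PowerShell tasks that stop the triggers before the ARM template is deployed and start them again afterwards. It uses the same Stop/Start cmdlets shown in the comments at the end of this post; the service connection name and the FactoryName and ResourceGroupName pipeline variables are assumptions you would define yourself.

# Illustrative sketch: stop ADF triggers before deployment (assumed service connection and variables).
- task: AzurePowerShell@5
  inputs:
    azureSubscription: 'dev-rg-service-connection'
    ScriptType: 'InlineScript'
    Inline: |
      $triggers = Get-AzDataFactoryV2Trigger -DataFactoryName $(FactoryName) -ResourceGroupName $(ResourceGroupName)
      $triggers | ForEach-Object { Stop-AzDataFactoryV2Trigger -ResourceGroupName $(ResourceGroupName) -DataFactoryName $(FactoryName) -Name $_.Name -Force }
    azurePowerShellVersion: 'LatestVersion'
  displayName: 'Stop ADF triggers'

# ... ARM template deployment task goes here ...

# Restart the triggers after the deployment completes.
- task: AzurePowerShell@5
  inputs:
    azureSubscription: 'dev-rg-service-connection'
    ScriptType: 'InlineScript'
    Inline: |
      $triggers = Get-AzDataFactoryV2Trigger -DataFactoryName $(FactoryName) -ResourceGroupName $(ResourceGroupName)
      $triggers | ForEach-Object { Start-AzDataFactoryV2Trigger -ResourceGroupName $(ResourceGroupName) -DataFactoryName $(FactoryName) -Name $_.Name -Force }
    azurePowerShellVersion: 'LatestVersion'
  displayName: 'Start ADF triggers'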
Pro tips:
1. In the real world, you must set Data Factory parameters and global parameters in the release pipeline. Setting up global parameters differs slightly in the ADFUtilities NPM package-based deployment. Read more about it here. (A hypothetical sketch of per-environment parameter overrides is shown after this list.)
2. Similar to the dev Data Factory deployment, we can add more stages for higher environments, such as test and production.
3. If you get the error "Resource <adf_name> was disallowed by policy. Error Type: PolicyViolation, Policy Definition Name: Restrict resources without specific tag, Policy Assignment Name", it means your organization has defined a policy to disallow resource creation in Azure without specific tags. In such a case, the Azure DevOps CI/CD pipeline cannot be used for ADF deployment. As per the Microsoft docs, we need to use the Export and Import Data Factory approach.
4. If you wish to export and import data factories between resource groups, refer to this article.
5. Refer to this post to learn how to implement a CI/CD pipeline for Azure SQL Database deployments.
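As an illustration of tip 1, the fragment below sketches how environment-specific values could be passed through Override template parameters in a YAML deployment task. The variable group, service connection, and the AzureSqlDatabase_connectionString parameter name are hypothetical; the real parameter names for your factory are listed in the generated ARMTemplateParametersForFactory.json.

# Hypothetical sketch: passing environment-specific values into the ARM deployment.
# The variable group, service connection, and parameter names below are placeholders.
variables:
- group: 'adf-test-environment'

steps:
- task: AzureResourceManagerTemplateDeployment@3
  inputs:
    deploymentScope: 'Resource Group'
    azureResourceManagerConnection: 'test-rg-service-connection'
    subscriptionId: '<subscription-id>'
    action: 'Create Or Update Resource Group'
    resourceGroupName: '<test-resourcegroup-name>'
    location: '<location>'
    templateLocation: 'Linked artifact'
    csmFile: '$(Pipeline.Workspace)/ArmTemplates/ARMTemplateForFactory.json'
    csmParametersFile: '$(Pipeline.Workspace)/ArmTemplates/ARMTemplateParametersForFactory.json'
    overrideParameters: '-factoryName "$(TestFactoryName)" -AzureSqlDatabase_connectionString "$(TestSqlConnectionString)"'
    deploymentMode: 'Incremental'
  displayName: 'Deploy ADF ARM template to test'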
Kunal Rathi
With over 13 years of experience in data engineering and analytics, I've assisted countless clients in gaining valuable insights from their data. As a dedicated supporter of Data, Cloud and DevOps, I'm excited to connect with individuals who share my passion for this field. If my work resonates with you, we can talk and collaborate.
Hi Kunal, I've been putting your blog post to use and I am missing something. I am trying to find the folder that is used for the build because it contains a pre/post-deployment script. I cannot, however, find this anywhere in the repository that I linked. I wanted to add the scripts that MS put on their documentation page to disable and restart the triggers. Mind you, I have 0 experience with YAML.
Hi Kage,
Below is the script you can use to disable ADF triggers. Use an Azure PowerShell task for this.
$triggersADF = Get-AzDataFactoryV2Trigger -DataFactoryName $(FactoryName) -ResourceGroupName $(ResourceGroupName)
$triggersADF | ForEach-Object { Stop-AzDataFactoryV2Trigger -ResourceGroupName $(ResourceGroupName) -DataFactoryName $(FactoryName) -Name $_.name -Force }
And here is the script to enable ADF triggers. Use an Azure PowerShell task for this as well.
$triggersADF = Get-AzDataFactoryV2Trigger -DataFactoryName $(FactoryName) -ResourceGroupName $(ResourceGroupName)
$triggersADF | ForEach-Object { Start-AzDataFactoryV2Trigger -ResourceGroupName $(ResourceGroupName) -DataFactoryName $(FactoryName) -Name $_.name -Force }
Make sure to define the pipeline variables ($(FactoryName) and $(ResourceGroupName)) used in the above scripts.
Hope this helps.
Hi Kunal,
I guess I'm doing something wrong, because I run into some errors:
PublishConfigService: _getLatestPublishConfig - retrieving config file.
LocalFileClientService: Unable to list files for: integrationRuntime, error: Error: ENOENT: no such file or directory, scandir '/home/vsts/work/1/s/integrationRuntime'
LocalFileClientService: Unable to list files for: dataset, error: Error: ENOENT: no such file or directory, scandir '/home/vsts/work/1/s/dataset'
LocalFileClientService: Unable to list files for: trigger, error: Error: ENOENT: no such file or directory, scandir '/home/vsts/work/1/s/trigger'
LocalFileClientService: Unable to list files for: dataflow, error: Error: ENOENT: no such file or directory, scandir '/home/vsts/work/1/s/dataflow'
LocalFileClientService: Unable to list files for: credential, error: Error: ENOENT: no such file or directory, scandir '/home/vsts/work/1/s/credential'
LocalFileClientService: Unable to list files for: managedVirtualNetwork, error: Error: ENOENT: no such file or directory, scandir '/home/vsts/work/1/s/managedVirtualNetwork'
ERROR === LocalFileClientService: Unable to read file: /home/vsts/work/1/s/arm-template-parameters-definition.json, error: {"stack":"Error: ENOENT: no such file or directory, open '/home/vsts/work/1/s/arm-template-parameters-definition.json'","message":"ENOENT: no such file or directory, open '/home/vsts/work/1/s/arm-template-parameters-definition.json'","errno":-2,"code":"ENOENT","syscall":"open","path":"/home/vsts/work/1/s/arm-template-parameters-definition.json"}
WARNING === ArmTemplateUtils: _getUserParameterDefinitionJson - Unable to load custom param file from repo, will use default file. Error: {"stack":"Error: ENOENT: no such file or directory, open '/home/vsts/work/1/s/arm-template-parameters-definition.json'","message":"ENOENT: no such file or directory, open '/home/vsts/work/1/s/arm-template-parameters-definition.json'","errno":-2,"code":"ENOENT","syscall":"open","path":"/home/vsts/work/1/s/arm-template-parameters-definition.json"}
Are they familiar to you and could you provide a possible solution?
Kind Regards,
Dickkieee
Great article! I just found this after figuring out a similar approach myself, and this describes it well!
Currently I'm trying to figure out how to update global parameters using the DevOps pipeline for different environments and am looking forward to your article on that.
I see a global parameters file included as part of the NPM-generated ARM template, though I'm not sure how to use a pipeline to override a global parameter name in the xxxGlobalParameters.json file.