Automated Azure Data Factory publish with DevOps CI/CD pipeline

Microsoft recently announced support for automated Azure Data Factory publish through the ADFUtilities NPM package (@microsoft/azure-data-factory-utilities). The package provides basic functionality to validate a set of Data Factory resources and generate an ARM template from them.

Background

In Azure Data Factory, continuous integration and continuous delivery (CI/CD) means moving Data Factory artifacts from one environment to another, for example from dev to test and then to production. So far, we have been using the Data Factory CI/CD process based on the adf_publish branch. The main issue with this approach is that a developer has to manually click the Publish button in the Data Factory UI, on the master branch, to deploy the code to the dev Data Factory. Publishing then generates the ARM template artifacts in the adf_publish branch, which are used for deployments to higher environments.

With the automated publish improvement, we can create the ARM template artifact in an Azure DevOps build pipeline that is triggered by an update to the master branch (for example, a completed pull request). Deployment of this artifact to dev and other environments can then be managed with a release pipeline (CD).

Microsoft will support both CI/CD flows for Data Factory deployments. Below are some of the key differences between the old and the new approach.

| | Old CI/CD flow | New CI/CD flow |
|---|---|---|
| Code validation and publish to dev | Manual | Using the CI/CD pipeline with the ADFUtilities NPM package |
| ARM template generation | Once code is published to the dev Data Factory manually from the master branch | After the successful execution of the build pipeline |
| Release pipeline source | adf_publish branch artifacts | Build pipeline artifacts |

The Build Pipeline (CI)

Prerequisites:

  1. Azure DevOps Account.
  2. Data Factory in a dev environment with Azure Repos Git integration.

Before we start with the build pipeline, create a file named package.json in the master branch of the Data Factory repository and copy the code below into it.

{
    "scripts":{
        "build":"node node_modules/@microsoft/azure-data-factory-utilities/lib/index"
    },
    "dependencies":{
        "@microsoft/azure-data-factory-utilities":"^0.1.3"
    }
}

Follow the steps below to create the CI (build) pipeline for automated Azure Data Factory publish.

1. Create a new build pipeline in the Azure DevOps project.
2. Select Azure Repos Git as your code repository.
3. From Azure Repos, select the repo that contains the Data Factory code. This is the repository you configured for the Data Factory Git integration.
4. Select Starter pipeline as your build pipeline type.
5. Replace the default YAML code with the code below, then adjust it: replace <subscription-id> with your subscription ID, <resourcegroup-name> with the resource group name, and <dev data factory name> with the name of the dev Data Factory.

trigger:
- master #change master to something else if master is not your main branch
pool:
  vmImage: 'ubuntu-latest'
steps:
- task: NodeTool@0
  inputs:
    versionSpec: '10.x'
  displayName: 'Install Node.js'

- task: Npm@1
  inputs:
    command: 'install'
    verbose: true
  displayName: 'Install npm package'
- task: Npm@1
  inputs:
    command: 'custom'
    customCommand: 'run build validate $(Build.Repository.LocalPath) /subscriptions/<subscription-id>/resourceGroups/<resourcegroup-name>/providers/Microsoft.DataFactory/factories/<dev data factory name>'
  displayName: 'Validate'

# Validate and then generate the ARM template into the destination folder. Same as clicking "Publish" from UX
# The ARM template generated is not published to the ‘Live’ version of the factory. Deployment should be done using a release pipeline. 
- task: Npm@1
  inputs:
    command: 'custom'
    customCommand: 'run build export $(Build.Repository.LocalPath) /subscriptions/<subscription-id>/resourceGroups/<resourcegroup-name>/providers/Microsoft.DataFactory/factories/<dev data factory name> "ArmTemplate"'
  displayName: 'Validate and Generate ARM template'

# Publish the Artifact to be used as a source for a release pipeline
- task: PublishPipelineArtifact@1
  inputs:
    targetPath: '$(Build.Repository.LocalPath)/ArmTemplate'
    artifact: 'ArmTemplates'
    publishLocation: 'pipeline'

6. Click Save.

[Figure: Steps to create the build pipeline for the Data Factory]
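
The Data Factory resource ID appears twice in the pipeline above, once in the validate step and once in the export step. As a minimal sketch, assuming pipeline variables suit your setup, the ID can be assembled once and reused. The variable names (subscriptionId, resourceGroupName, devFactoryName, factoryResourceId) are illustrative assumptions and not part of the original pipeline. The trigger, pool, Node.js install, npm install, and artifact publish steps stay as shown above and are omitted here.

```yaml
# Sketch only: the variable names below are illustrative assumptions.
variables:
  subscriptionId: '<subscription-id>'
  resourceGroupName: '<resourcegroup-name>'
  devFactoryName: '<dev data factory name>'
  # Full resource ID of the dev Data Factory, assembled once and reused below.
  factoryResourceId: '/subscriptions/$(subscriptionId)/resourceGroups/$(resourceGroupName)/providers/Microsoft.DataFactory/factories/$(devFactoryName)'

steps:
# Validate all Data Factory resources in the repository.
- task: Npm@1
  inputs:
    command: 'custom'
    customCommand: 'run build validate $(Build.Repository.LocalPath) $(factoryResourceId)'
  displayName: 'Validate'

# Validate and generate the ARM template into the ArmTemplate folder.
- task: Npm@1
  inputs:
    command: 'custom'
    customCommand: 'run build export $(Build.Repository.LocalPath) $(factoryResourceId) "ArmTemplate"'
  displayName: 'Validate and Generate ARM template'
```

Keeping the three placeholders in one place makes it harder for the validate and export steps to drift apart when the pipeline is copied to another project.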

The build pipeline produces the ARM template as a pipeline artifact. Note that the source of this artifact is the Data Factory master branch.

Now that we have the build pipeline set up, let us create a release pipeline to deploy the code to the dev Data Factory. In this approach, the release pipeline for Data Factory uses the artifact from the build pipeline instead of the adf_publish branch.

The Release Pipeline (CD)

Prerequisite:

  1. Service connection to the dev Azure resource group.

Follow the steps below to create a release pipeline that deploys the Data Factory build artifact.

  1. Select Pipelines, and then select Releases from the left-hand menu in Azure DevOps.
  2. Click New release pipeline under the New dropdown. Select the Empty job template.
  3. Provide a Stage name. The first stage generally denotes the dev environment.
  4. Click Add an artifact and then Build. Select the appropriate Project and Source from the drop-downs. Source alias can be left at the default (typically _name_of_the_build).
  5. Click the hyperlink in the Stage (dev) box. Add a new task and search for ARM Template Deployment.
[Figure: Steps to configure Data Factory release pipeline]

  6. In Azure Resource Manager connection, select the service connection to the dev Azure resource group. Provide the relevant Subscription, Resource group, and Location. Set Template location to Linked artifact. For Template, browse to the ARMTemplateForFactory.json file, and for Template parameters, browse to ARMTemplateParametersForFactory.json. In Override template parameters, we need to manually provide all parameter names and values from the ARMTemplateForFactory.json file, because the build artifacts are generated at run time. Give the deployment a suitable Deployment name, for example 'adf_deployment'. Click Save.
  7. You can also enable the continuous deployment trigger from the icon on the Artifacts box.

[Figure: Steps to configure ARM Template task for Data Factory artifacts]
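
For teams that keep their pipelines in YAML rather than in the classic release editor, the ARM template task configured above could look roughly like the following. This is a sketch under assumptions, not the setup used in this post: it assumes the deployment runs in the same pipeline run that published the ArmTemplates artifact, and every value in angle brackets is a placeholder you must replace. The factoryName override refers to the factory name parameter of the generated template.

```yaml
# Sketch only: a YAML counterpart of the classic release task configured above.
# Values in angle brackets are placeholders, not real names.
steps:
# Download the artifact published by the build steps (artifact name: ArmTemplates).
- download: current
  artifact: 'ArmTemplates'

- task: AzureResourceManagerTemplateDeployment@3
  displayName: 'Deploy Data Factory ARM template'
  inputs:
    deploymentScope: 'Resource Group'
    azureResourceManagerConnection: '<service-connection-name>'
    subscriptionId: '<subscription-id>'
    action: 'Create Or Update Resource Group'
    resourceGroupName: '<resourcegroup-name>'
    location: '<location>'
    templateLocation: 'Linked artifact'
    csmFile: '$(Pipeline.Workspace)/ArmTemplates/ARMTemplateForFactory.json'
    csmParametersFile: '$(Pipeline.Workspace)/ArmTemplates/ARMTemplateParametersForFactory.json'
    # Overrides are typed by hand; add one "-name value" pair per parameter.
    overrideParameters: '-factoryName "<target data factory name>"'
    # Incremental mode leaves other resources in the resource group untouched.
    deploymentMode: 'Incremental'
    deploymentName: 'adf_deployment'
```

Incremental mode is the cautious choice here, since Complete mode would remove resources in the resource group that are not described in the template.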

Pro Tips:

  1. In the real world, you will have to set Data Factory parameters and global parameters in the release pipeline. Setting up global parameters differs slightly in the ADFUtilities NPM package-based deployment. It may have to be done using a separate ARM template task.
  2. We need to disable the triggers in Data Factory before the deployment and enable them again once the deployment is completed.
  3. As with the dev Data Factory deployment, we can add more stages to deploy to higher environments such as test and production.
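
The second tip can be sketched as two Azure PowerShell steps wrapped around the deployment. This is a minimal sketch, assuming a YAML pipeline and assuming that ResourceGroupName and FactoryName are defined as pipeline variables. Both variable names and the service connection placeholder are assumptions for illustration.

```yaml
# Sketch only: stop all triggers, deploy, then start them again.
# ResourceGroupName and FactoryName are assumed pipeline variables.
steps:
- task: AzurePowerShell@5
  displayName: 'Stop Data Factory triggers'
  inputs:
    azureSubscription: '<service-connection-name>'
    ScriptType: 'InlineScript'
    azurePowerShellVersion: 'LatestVersion'
    Inline: |
      $triggers = Get-AzDataFactoryV2Trigger -ResourceGroupName '$(ResourceGroupName)' -DataFactoryName '$(FactoryName)'
      $triggers | ForEach-Object {
        Stop-AzDataFactoryV2Trigger -ResourceGroupName '$(ResourceGroupName)' -DataFactoryName '$(FactoryName)' -Name $_.Name -Force
      }

# The ARM template deployment task belongs here, between the two steps.

- task: AzurePowerShell@5
  displayName: 'Start Data Factory triggers'
  inputs:
    azureSubscription: '<service-connection-name>'
    ScriptType: 'InlineScript'
    azurePowerShellVersion: 'LatestVersion'
    Inline: |
      $triggers = Get-AzDataFactoryV2Trigger -ResourceGroupName '$(ResourceGroupName)' -DataFactoryName '$(FactoryName)'
      $triggers | ForEach-Object {
        Start-AzDataFactoryV2Trigger -ResourceGroupName '$(ResourceGroupName)' -DataFactoryName '$(FactoryName)' -Name $_.Name -Force
      }
```

One limitation of this sketch is that it starts every trigger after the deployment, including any that were deliberately stopped beforehand. Filtering on each trigger's RuntimeState before stopping, and restarting only those triggers, would avoid that.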

I will write another detailed article describing the release pipeline for Data Factory deployment to multiple environments.

Share your thoughts about this new approach.

Kunal Rathi

Been working on the Microsoft BI stack for close to a decade. Aspiring Data Architect, Cloud enthusiast.

3 Comments

  1. Hi Kunal, I’ve been putting your blog post to use and I am missing something. I am trying to find the folder that is used for the build because it contains a prepostdeployment script. I can however, not find this anywhere in the repository that I linked. I wanted to add the scripts that MS put on their documentation page to disable and restart the triggers. Mind you I have 0 experience with YAML

  2. Hi Kage,
    Below is the script you can use to disable ADF triggers. Use Azure PowerShell task for this.
    $triggersADF = Get-AzDataFactoryV2Trigger -DataFactoryName $(FactoryName) -ResourceGroupName $(ResourceGroupName)

    $triggersADF | ForEach-Object { Stop-AzDataFactoryV2Trigger -ResourceGroupName $(ResourceGroupName) -DataFactoryName $(FactoryName) -Name $_.name -Force }

    Script to enable ADF triggers. Use an Azure PowerShell task for this as well.

    $triggersADF = Get-AzDataFactoryV2Trigger -DataFactoryName $(FactoryName) -ResourceGroupName $(ResourceGroupName)

    $triggersADF | ForEach-Object { Start-AzDataFactoryV2Trigger -ResourceGroupName $(ResourceGroupName) -DataFactoryName $(FactoryName) -Name $_.name -Force }

    Make sure to define the variables used in the above scripts (FactoryName and ResourceGroupName).
    Hope this helps.

  3. Hi Kunal,

    I guess I’m doing something wrong, because I run into some errors

    PublishConfigService: _getLatestPublishConfig - retrieving config file.
    LocalFileClientService: Unable to list files for: integrationRuntime, error: Error: ENOENT: no such file or directory, scandir '/home/vsts/work/1/s/integrationRuntime'
    LocalFileClientService: Unable to list files for: dataset, error: Error: ENOENT: no such file or directory, scandir '/home/vsts/work/1/s/dataset'
    LocalFileClientService: Unable to list files for: trigger, error: Error: ENOENT: no such file or directory, scandir '/home/vsts/work/1/s/trigger'
    LocalFileClientService: Unable to list files for: dataflow, error: Error: ENOENT: no such file or directory, scandir '/home/vsts/work/1/s/dataflow'
    LocalFileClientService: Unable to list files for: credential, error: Error: ENOENT: no such file or directory, scandir '/home/vsts/work/1/s/credential'
    LocalFileClientService: Unable to list files for: managedVirtualNetwork, error: Error: ENOENT: no such file or directory, scandir '/home/vsts/work/1/s/managedVirtualNetwork'

    ERROR === LocalFileClientService: Unable to read file: /home/vsts/work/1/s/arm-template-parameters-definition.json, error: {"stack":"Error: ENOENT: no such file or directory, open '/home/vsts/work/1/s/arm-template-parameters-definition.json'","message":"ENOENT: no such file or directory, open '/home/vsts/work/1/s/arm-template-parameters-definition.json'","errno":-2,"code":"ENOENT","syscall":"open","path":"/home/vsts/work/1/s/arm-template-parameters-definition.json"}
    WARNING === ArmTemplateUtils: _getUserParameterDefinitionJson - Unable to load custom param file from repo, will use default file. Error: {"stack":"Error: ENOENT: no such file or directory, open '/home/vsts/work/1/s/arm-template-parameters-definition.json'","message":"ENOENT: no such file or directory, open '/home/vsts/work/1/s/arm-template-parameters-definition.json'","errno":-2,"code":"ENOENT","syscall":"open","path":"/home/vsts/work/1/s/arm-template-parameters-definition.json"}

    Are they familiar to you and could you provide a possible solution?

    Kind Regards,

    Dickkieee