This repo contains example Azure Databricks CICD patterns using Databricks CLI and YAML pipelines.
I also wrote a blog post Managing Azure Databricks Workspace IP Access Lists via CICD to explain some things used in this repo.
For Azure Databricks workspace authentication, in this repo I'm using OAuth machine-to-machine (M2M) authentication for unattended access to Databricks resources with a service principal using OAuth. See https://docs.databricks.com/aws/en/dev-tools/auth/oauth-m2m.
These token values for __ADB_DEV_SPN_ID__ __ADB_DEV_HOST__ are detected and replaced by the token replacement task in the build and release pipeline templates with the new values driven from secrets in an Azure DevOps Library referenced by the calling pipeline.
targets:
dev:
mode: production
default: true
run_as:
service_principal_name: "__ADB_DEV_SPN_ID__"
workspace:
host: "__ADB_DEV_HOST__"
root_path: /Workspace/${bundle.name}
auth_type: oauth-m2m
sync:
exclude:
- devopsThese environment variable values for DATABRICKS_HOST, DATABRICKS_CLIENT_ID, and DATABRICKS_CLIENT_SECRET which are required for oauth-m2m authentication are passed in from the calling pipeline which references the related Azure DevOps Library secrets.
# Validate Databricks Bundle
- bash: |
databricks bundle validate -t ${{ parameters.BUNDLE_TARGET }} --log-level ${{ parameters.DATABRICKS_LOG_LEVEL }}
env:
DATABRICKS_HOST: ${{ parameters.ADB_HOST }} #required for oauth-m2m authentication
DATABRICKS_CLIENT_ID: ${{ parameters.ADB_CLIENT_ID }} #required for oauth-m2m authentication
DATABRICKS_CLIENT_SECRET: ${{ parameters.ADB_CLIENT_SECRET }} #required for oauth-m2m authentication
displayName: "Validate ${{ parameters.BUNDLE_TARGET }} Databricks Bundle"- stage: Build_Dev
displayName: "Build Dev"
dependsOn: []
jobs:
- template: /devops/templates/ado-build-template.yml@self
parameters:
BUNDLE_TARGET: "dev"
ADB_HOST: $(ADB_DEV_HOST) #value replaced from referenced Azure DevOps Library secret
ADB_CLIENT_ID: $(ADB_DEV_SPN_ID) #value replaced from referenced Azure DevOps Library secret
ADB_CLIENT_SECRET: $(ADB_DEV_SPN_SECRET) #value replaced from referenced Azure DevOps Library secret
DATABRICKS_SDK_VERSION: ${{ variables.DATABRICKS_SDK_VERSION }}
DATABRICKS_LOG_LEVEL: ${{ variables.DATABRICKS_LOG_LEVEL }}
ADO_ENVIRONMENT: "Dev-Databricks" #Update this to desired environment name in Azure DevOps
AGENT_POOL: ${{ variables.AGENT_POOL }}The ./workspace-ip-access-lists folder in this repo contains the .json files that are used by the ado-release-template.yml devops template file to manage IP access lists for the workspace respective to each databricks environment.
Key References:
- https://learn.microsoft.com/en-us/azure/databricks/security/network/front-end/ip-access-list
- https://docs.databricks.com/en/security/network/front-end/ip-access-list-workspace.html
- https://learn.microsoft.com/en-us/azure/databricks/security/network/front-end/ip-access-list-workspace
| Input name | Type | required | example |
|---|---|---|---|
| label | string | yes for Create and Update operations |
"ALLOW_AZURE_DATABRICKS_PRODFIX_SUBNETS" |
| list_type | string | yes for Create and Update operations |
"ALLOW" or "BLOCK" |
| ip_addresses | array | yes for Create and Update operations |
["10.0.0.0/25","10.0.100.0/25"] |
| ip_access_list_id | string | yes for Update and Delete operations |
"a559572d-1730-4ce4-203z-75506242f04h" |
| operation | string | yes always | "CREATE" or "UPDATE" or "DELETE" |
Enable or Disable IP access lists
- stage: Release_PRD
displayName: "Release PRD"
dependsOn: [Build_PRD]
condition: ${{ variables.Condition }}
jobs:
- template: /devops/templates/ado-release-template.yml@self
parameters:
BUNDLE_TARGET: "prd"
ADB_HOST: $(ADB_PRD_HOST)
ADB_CLIENT_ID: $(ADB_PRD_SPN_ID)
ADB_CLIENT_SECRET: $(ADB_PRD_SPN_SECRET)
ADB_ENABLE_IP_ACCESS_LISTS: false #Update to true if you want to enable IP access lists for the Databricks workspace
DATABRICKS_SDK_VERSION: ${{ variables.DATABRICKS_SDK_VERSION }}
DATABRICKS_LOG_LEVEL: ${{ variables.DATABRICKS_LOG_LEVEL }}
ADB_JOB_NAMES: "init_job"
ADO_ENVIRONMENT: "Prod-Databricks"
AGENT_POOL: ${{ variables.AGENT_POOL }}Create a new access list
[
{
"label": "ALLOW_EXAMPLE_CORP_NETWORK1",
"list_type": "ALLOW",
"ip_addresses": ["192.168.0.0/23"],
"operation": "CREATE"
}
]Update an existing access list
- note: the required
ip_access_list_idvalue can be retrieved from a previous build or release pipeline run (within the 'Check/Configure Databricks IP Access Lists' step)
[
{
"label": "ALLOW_EXAMPLE_CORP_NETWORK1",
"list_type": "ALLOW",
"ip_addresses": ["192.168.0.0/23", "192.168.100.0/23"],
"ip_access_list_id": "a559572d-1730-4ce4-203z-75506242f04h",
"operation": "UPDATE"
}
]Delete an existing access list
- note: the required
ip_access_list_idvalue can be retrieved from a previous build or release pipeline run (within the 'Check/Configure Databricks IP Access Lists' step)
[
{
"label": "ALLOW_EXAMPLE_CORP_NETWORK1",
"list_type": "ALLOW",
"ip_addresses": ["192.168.0.0/23", "192.168.100.0/23"],
"ip_access_list_id": "a559572d-1730-4ce4-203z-75506242f04h",
"operation": "DELETE"
}
]