Testing Overhaul
* Adding tests/environment folder to store datasets and bicep templates for test sources
* Added scripts to create databricks jobs and a notebook to mount storage on Databricks
* Making test environments more consistent across notebooks (secret scope, environment variables)
* A handful of tests were modified to correct mistakes not caught in the source-controlled versions
* Added documentation for testing environment including what secrets are used and what they look like
* Adding requirements.txt file for environment deployment
* Hive tests should run without additional intervention (i.e., they use CREATE IF NOT EXISTS)
* Removing production env deployment
* Remove the wasbs with parameters test
* After updating all job definitions to be upload-ready, the run-tests script needed to look at .name instead of .settings.name
  * Unfortunately, when calling the Jobs API, the name comes back under .settings.name, which must be used in that case
wjohnson committed Dec 18, 2022
1 parent fa5c9c4 commit 9c71e45
Showing 60 changed files with 1,486 additions and 709 deletions.
37 changes: 10 additions & 27 deletions .github/workflows/build-release.yml
@@ -85,18 +85,19 @@ jobs:
name: FunctionZip
path: ./artifacts

- name: Azure Functions Action
- name: Deploy Azure Function to Integration Env
uses: Azure/functions-action@v1
with:
app-name: ${{ secrets.INT_FUNC_NAME }}
package: ./artifacts/FunctionZip.zip
publish-profile: ${{ secrets.INT_PUBLISH_PROFILE }}

- uses: azure/login@v1
- name: Azure Login
uses: azure/login@v1
with:
creds: ${{ secrets.INT_AZ_CLI_CREDENTIALS }}

- name: Azure CLI script
- name: Compare and Update App Settings on Deployed Function
uses: azure/CLI@v1
with:
azcliversion: 2.34.1
@@ -108,7 +109,7 @@ jobs:

# Start up Synapse Pool and Execute Tests
- name: Start Integration Synapse SQL Pool
run: source tests/integration/manage-sql-pool.sh start ${{ secrets.INT_SUBSCRIPTION_ID }} ${{ secrets.INT_RG_NAME }} ${{ secrets.INT_SYNAPSE_WKSP_NAME }} ${{ secrets.INT_SYNAPSE_SQLPOOL_NAME }}
run: source tests/integration/manage-sql-pool.sh start ${{ secrets.INT_SUBSCRIPTION_ID }} ${{ secrets.INT_SYNAPSE_SQLPOOL_RG_NAME }} ${{ secrets.INT_SYNAPSE_WKSP_NAME }} ${{ secrets.INT_SYNAPSE_SQLPOOL_NAME }}
env:
AZURE_CLIENT_ID: ${{ secrets.AZURE_CLIENT_ID }}
AZURE_CLIENT_SECRET: ${{ secrets.AZURE_CLIENT_SECRET }}
@@ -124,6 +125,10 @@ jobs:
token = ${{ secrets.INT_DATABRICKS_ACCESS_TOKEN }}" > ./config.ini
export DATABRICKS_CONFIG_FILE=./config.ini
- name: Confirm Databricks CLI is configured
run: databricks clusters spark-versions
env:
DATABRICKS_CONFIG_FILE: ./config.ini

- name: Cleanup Integration Environment
run: python ./tests/integration/runner.py --cleanup --dontwait None None None
@@ -144,7 +149,7 @@ jobs:
DATABRICKS_CONFIG_FILE: ./config.ini

- name: Stop Integration Synapse SQL Pool
run: source tests/integration/manage-sql-pool.sh stop ${{ secrets.INT_SUBSCRIPTION_ID }} ${{ secrets.INT_RG_NAME }} ${{ secrets.INT_SYNAPSE_WKSP_NAME }} ${{ secrets.INT_SYNAPSE_SQLPOOL_NAME }}
run: source tests/integration/manage-sql-pool.sh stop ${{ secrets.INT_SUBSCRIPTION_ID }} ${{ secrets.INT_SYNAPSE_SQLPOOL_RG_NAME }} ${{ secrets.INT_SYNAPSE_WKSP_NAME }} ${{ secrets.INT_SYNAPSE_SQLPOOL_NAME }}
env:
AZURE_CLIENT_ID: ${{ secrets.AZURE_CLIENT_ID }}
AZURE_CLIENT_SECRET: ${{ secrets.AZURE_CLIENT_SECRET }}
@@ -172,25 +177,3 @@ jobs:
with:
artifacts: ~/artifacts/FunctionZip.zip
token: ${{ secrets.GITHUB_TOKEN }}

deployProductionEnvironment:
name: Release to Production Environment
needs: [createRelease]
runs-on: ubuntu-latest
environment:
name: Production
steps:
- uses: actions/checkout@v3

- name: Download Artifact
uses: actions/download-artifact@v3
with:
name: FunctionZip
path: ./artifacts

- name: Azure Functions Action
uses: Azure/functions-action@v1
with:
app-name: ${{ secrets.FUNC_NAME }}
package: ./artifacts/FunctionZip.zip
publish-profile: ${{ secrets.PUBLISH_PROFILE }}
1 change: 1 addition & 0 deletions .gitignore
@@ -161,3 +161,4 @@ build

# Ignore local settings
localsettingsdutils.py
*.ini
122 changes: 122 additions & 0 deletions tests/environment/README.md
@@ -0,0 +1,122 @@
# Deploying the Test Environment

## Deploying the Connector

## Deploying the Data Sources

```bash
az deployment group create \
--template-file ./tests/environment/sources/adlsg2.bicep \
--resource-group db2pvsasources
```
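
If the target resource group does not exist yet, it can be created first (the location below is just an example):

```bash
# Hypothetical location; pick the region used by the rest of the test environment.
az group create --name db2pvsasources --location eastus
```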

## Manual Steps

Create a config.ini file:

```ini
databricks_workspace_host_id = adb-workspace.id
databricks_personal_access_token = PERSONAL_ACCESS_TOKEN
databricks_spark3_cluster = CLUSTER_ID
databricks_spark2_cluster = CLUSTER_ID
```

Assign the Service Principal the Storage Blob Data Contributor role on the main ADLS Gen2 instance (a minimal Azure CLI sketch follows).
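
A minimal sketch for that role assignment; the client id, subscription, resource group, and storage account names below are placeholders:

```bash
# Grant the service principal Storage Blob Data Contributor on the ADLS Gen2 account.
az role assignment create \
  --assignee "<service-principal-client-id>" \
  --role "Storage Blob Data Contributor" \
  --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>"
```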

Add the Service Principal as a user in Databricks.

Enable mount points with `./tests/environment/dbfs/mounts.py`

Add Key Vault Secrets (an `az keyvault secret set` sketch follows this list):
* `tenant-id`
* `storage-service-key`
* `azuresql-username`
* `azuresql-password`
* `azuresql-jdbc-conn-str` should be of the form `jdbc:sqlserver://SERVER_NAME.database.windows.net:1433;database=DATABASE_NAME;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;`
* `synapse-storage-key`
* `synapse-query-username`
* `synapse-query-password`
* Update SQL Db and Synapse Server with AAD Admin
* Add Service Principal for Databricks to connect to SQL sources
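
One way to load the secrets is with the Azure CLI; the vault name and value below are placeholders, and the command is repeated for each secret above:

```bash
# Hypothetical vault name and value; repeat for each secret listed above.
az keyvault secret set \
  --vault-name "<key-vault-name>" \
  --name "azuresql-username" \
  --value "<sql-username>"
```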

Set the following system environment variables (see the export sketch after this list):

* `SYNAPSE_SERVICE_NAME`
* `STORAGE_SERVICE_NAME`
* `SYNAPSE_STORAGE_SERVICE_NAME`
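
For example, in the shell that runs the tests (resource names are placeholders):

```bash
# Placeholders; use the names of your deployed Synapse workspace and storage accounts.
export SYNAPSE_SERVICE_NAME="<synapse-workspace-name>"
export STORAGE_SERVICE_NAME="<storage-account-name>"
export SYNAPSE_STORAGE_SERVICE_NAME="<synapse-storage-account-name>"
```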

Upload the notebooks in `./tests/integration/spark-apps/notebooks/` to `/Shared/examples/` in the Databricks workspace

* Manually for now. TODO: Automate this in Python. (A Databricks CLI alternative is sketched below.)
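
Until that automation exists, one possible approach is the legacy Databricks CLI's `import_dir` command, assuming the CLI is already configured against the workspace:

```bash
# Import the local notebooks folder into /Shared/examples/, overwriting existing copies.
databricks workspace import_dir \
  ./tests/integration/spark-apps/notebooks/ \
  /Shared/examples/ \
  --overwrite
```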

Compile the following apps and upload them to `/dbfs/FileStore/testcases/` (see the sketch after this list):

* `./tests/integration/spark-apps/jarjobs/abfssInAbfssOut/` with `./gradlew build`
* `./tests/integration/spark-apps/pythonscript/pythonscript.py` needs no build step; just upload it as-is.
* `./tests/integration/spark-apps/wheeljobs/abfssintest/` with `python -m build`
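
A sketch of the build-and-upload flow using the Databricks CLI; the artifact file names are assumptions and depend on what the builds actually produce:

```bash
# Build the JAR and the wheel.
(cd ./tests/integration/spark-apps/jarjobs/abfssInAbfssOut/ && ./gradlew build)
(cd ./tests/integration/spark-apps/wheeljobs/abfssintest/ && python -m build)

# Copy the artifacts and the standalone script to DBFS (artifact names are hypothetical).
databricks fs cp --overwrite ./tests/integration/spark-apps/jarjobs/abfssInAbfssOut/build/libs/abfssInAbfssOut.jar dbfs:/FileStore/testcases/
databricks fs cp --overwrite ./tests/integration/spark-apps/wheeljobs/abfssintest/dist/abfssintest-0.1.0-py3-none-any.whl dbfs:/FileStore/testcases/
databricks fs cp --overwrite ./tests/integration/spark-apps/pythonscript/pythonscript.py dbfs:/FileStore/testcases/
```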

Upload the job definitions by running the Python script `python ./tests/environment/dbfs/create-job.py` (a quick verification sketch follows).
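
To sanity-check the upload, the jobs can be listed back from the Jobs API; note that the API nests the name under `.settings.name` even though the local job definition files use a top-level `.name`:

```bash
# List the uploaded jobs and print their names (requires jq).
databricks jobs list --output JSON | jq -r '.jobs[].settings.name'
```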

## Github Actions

* AZURE_CLIENT_ID
* AZURE_CLIENT_SECRET
* AZURE_TENANT_ID
* INT_AZ_CLI_CREDENTIALS (service principal credentials JSON; a `gh` CLI example for setting secrets follows this list)
```json
{
    "clientId": "xxxx",
    "clientSecret": "yyyy",
    "subscriptionId": "zzzz",
    "tenantId": "μμμμ",
    "activeDirectoryEndpointUrl": "https://login.microsoftonline.com",
    "resourceManagerEndpointUrl": "https://management.azure.com/",
    "activeDirectoryGraphResourceId": "https://graph.windows.net/",
    "sqlManagementEndpointUrl": "https://management.core.windows.net:8443/",
    "galleryEndpointUrl": "https://gallery.azure.com/",
    "managementEndpointUrl": "https://management.core.windows.net/"
}
```
* INT_DATABRICKS_ACCESS_TOKEN
* INT_DATABRICKS_WKSP_ID: adb-xxxx.y
* INT_FUNC_NAME
* INT_PUBLISH_PROFILE from the Azure Function's publish profile XML
* INT_PURVIEW_NAME
* INT_RG_NAME
* INT_SUBSCRIPTION_ID
* INT_SYNAPSE_SQLPOOL_NAME
* INT_SYNAPSE_WKSP_NAME
* INT_SYNAPSE_SQLPOOL_RG_NAME
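
A minimal sketch for setting these as repository secrets with the GitHub CLI, run from the repository checkout; the local credentials file name and placeholder values are hypothetical:

```bash
# Load the service principal credential JSON from a local file (hypothetical path).
gh secret set INT_AZ_CLI_CREDENTIALS < ./int-az-cli-credentials.json

# Set simple values inline; replace the placeholders with real values.
gh secret set INT_PURVIEW_NAME --body "<purview-account-name>"
gh secret set INT_SUBSCRIPTION_ID --body "<subscription-id>"
```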

## config.json

```json
{
    "datasets": {
        "datasetName": {
            "schema": [
                "field1",
                "field2"
            ],
            "data": [
                [
                    "val1",
                    "val2"
                ]
            ]
        }
    },
    "jobs": {
        "job-name": [
            [
                ("storage"|"sql"|"noop"),
                ("csv"|"delta"|"azuresql"|"synapse"),
                "rawdata/testcase/one/",
                "exampleInputA"
            ]
        ]
    }
}
```