Testing Overhaul
* Adding tests/environment folder to store datasets and bicep templates for test sources
* Added scripts to create databricks jobs and a notebook to mount storage on Databricks
* Making test environments more consistent across notebooks (secret scope, environment variables)
* A handful of tests were modified to correct mistakes not caught in the source-controlled versions
* Added documentation for testing environment including what secrets are used and what they look like
* Adding requirements.txt file for environment deployment
* Hive tests should run without additional intervention (i.e., they use CREATE IF NOT EXISTS)
* Removing production env deployment
* Remove the wasbs with parameters test
* After updating all job definitions to be upload-ready, the run-tests script needed to look at .name instead of .settings.name
  * Unfortunately, when calling the Jobs API, the name comes back under .settings.name, which must be used in that case
wjohnson committed Dec 18, 2022
1 parent fa5c9c4 commit 9c71e45
Showing 60 changed files with 1,486 additions and 709 deletions.
37 changes: 10 additions & 27 deletions .github/workflows/build-release.yml
@@ -85,18 +85,19 @@ jobs:
name: FunctionZip
path: ./artifacts

- name: Azure Functions Action
- name: Deploy Azure Function to Integration Env
uses: Azure/functions-action@v1
with:
app-name: ${{ secrets.INT_FUNC_NAME }}
package: ./artifacts/FunctionZip.zip
publish-profile: ${{ secrets.INT_PUBLISH_PROFILE }}

- uses: azure/login@v1
- name: Azure Login
uses: azure/login@v1
with:
creds: ${{ secrets.INT_AZ_CLI_CREDENTIALS }}

- name: Azure CLI script
- name: Compare and Update App Settings on Deployed Function
uses: azure/CLI@v1
with:
azcliversion: 2.34.1
@@ -108,7 +109,7 @@ jobs:

# Start up Synapse Pool and Execute Tests
- name: Start Integration Synapse SQL Pool
run: source tests/integration/manage-sql-pool.sh start ${{ secrets.INT_SUBSCRIPTION_ID }} ${{ secrets.INT_RG_NAME }} ${{ secrets.INT_SYNAPSE_WKSP_NAME }} ${{ secrets.INT_SYNAPSE_SQLPOOL_NAME }}
run: source tests/integration/manage-sql-pool.sh start ${{ secrets.INT_SUBSCRIPTION_ID }} ${{ secrets.INT_SYNAPSE_SQLPOOL_RG_NAME }} ${{ secrets.INT_SYNAPSE_WKSP_NAME }} ${{ secrets.INT_SYNAPSE_SQLPOOL_NAME }}
env:
AZURE_CLIENT_ID: ${{ secrets.AZURE_CLIENT_ID }}
AZURE_CLIENT_SECRET: ${{ secrets.AZURE_CLIENT_SECRET }}
@@ -124,6 +125,10 @@ jobs:
token = ${{ secrets.INT_DATABRICKS_ACCESS_TOKEN }}" > ./config.ini
export DATABRICKS_CONFIG_FILE=./config.ini
- name: Confirm Databricks CLI is configured
run: databricks clusters spark-versions
env:
DATABRICKS_CONFIG_FILE: ./config.ini

- name: Cleanup Integration Environment
run: python ./tests/integration/runner.py --cleanup --dontwait None None None
@@ -144,7 +149,7 @@ jobs:
DATABRICKS_CONFIG_FILE: ./config.ini

- name: Stop Integration Synapse SQL Pool
run: source tests/integration/manage-sql-pool.sh stop ${{ secrets.INT_SUBSCRIPTION_ID }} ${{ secrets.INT_RG_NAME }} ${{ secrets.INT_SYNAPSE_WKSP_NAME }} ${{ secrets.INT_SYNAPSE_SQLPOOL_NAME }}
run: source tests/integration/manage-sql-pool.sh stop ${{ secrets.INT_SUBSCRIPTION_ID }} ${{ secrets.INT_SYNAPSE_SQLPOOL_RG_NAME }} ${{ secrets.INT_SYNAPSE_WKSP_NAME }} ${{ secrets.INT_SYNAPSE_SQLPOOL_NAME }}
env:
AZURE_CLIENT_ID: ${{ secrets.AZURE_CLIENT_ID }}
AZURE_CLIENT_SECRET: ${{ secrets.AZURE_CLIENT_SECRET }}
@@ -172,25 +177,3 @@ jobs:
with:
artifacts: ~/artifacts/FunctionZip.zip
token: ${{ secrets.GITHUB_TOKEN }}

deployProductionEnvironment:
name: Release to Production Environment
needs: [createRelease]
runs-on: ubuntu-latest
environment:
name: Production
steps:
- uses: actions/checkout@v3

- name: Download Artifact
uses: actions/download-artifact@v3
with:
name: FunctionZip
path: ./artifacts

- name: Azure Functions Action
uses: Azure/functions-action@v1
with:
app-name: ${{ secrets.FUNC_NAME }}
package: ./artifacts/FunctionZip.zip
publish-profile: ${{ secrets.PUBLISH_PROFILE }}
1 change: 1 addition & 0 deletions .gitignore
@@ -161,3 +161,4 @@ build

# Ignore local settings
localsettingsdutils.py
*.ini
122 changes: 122 additions & 0 deletions tests/environment/README.md
@@ -0,0 +1,122 @@
# Deploying the Test Environment

## Deploying the Connector

## Deploying the Data Sources

```bash
az deployment group create \
--template-file ./tests/environment/sources/adlsg2.bicep \
--resource-group db2pvsasources
```
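
If the target resource group does not exist yet, it can be created first (the location below is just an example):

```bash
# Hypothetical location; pick the region used by the rest of the test environment.
az group create --name db2pvsasources --location eastus
```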

## Manual Steps

Create a config.ini file:

```ini
databricks_workspace_host_id = adb-workspace.id
databricks_personal_access_token = PERSONAL_ACCESS_TOKEN
databricks_spark3_cluster = CLUSTER_ID
databricks_spark2_cluster = CLUSTER_ID
```

Assign the Service Principal the Storage Blob Data Contributor role on the main ADLS Gen2 instance (a minimal Azure CLI sketch follows).
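
A minimal sketch for that role assignment; the client id, subscription, resource group, and storage account names below are placeholders:

```bash
# Grant the service principal Storage Blob Data Contributor on the ADLS Gen2 account.
az role assignment create \
  --assignee "<service-principal-client-id>" \
  --role "Storage Blob Data Contributor" \
  --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>"
```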

Add the Service Principal as a user in Databricks.

Enable mount points with `./tests/environment/dbfs/mounts.py`

Add Key Vault Secrets (an `az keyvault secret set` sketch follows this list):
* `tenant-id`
* `storage-service-key`
* `azuresql-username`
* `azuresql-password`
* `azuresql-jdbc-conn-str` should be of the form `jdbc:sqlserver://SERVER_NAME.database.windows.net:1433;database=DATABASE_NAME;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;`
* `synapse-storage-key`
* `synapse-query-username`
* `synapse-query-password`
* Update SQL Db and Synapse Server with AAD Admin
* Add Service Principal for Databricks to connect to SQL sources
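
One way to load the secrets is with the Azure CLI; the vault name and value below are placeholders, and the command is repeated for each secret above:

```bash
# Hypothetical vault name and value; repeat for each secret listed above.
az keyvault secret set \
  --vault-name "<key-vault-name>" \
  --name "azuresql-username" \
  --value "<sql-username>"
```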

Set the following system environment variables (see the export sketch after this list):

* `SYNAPSE_SERVICE_NAME`
* `STORAGE_SERVICE_NAME`
* `SYNAPSE_STORAGE_SERVICE_NAME`
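
For example, in the shell that runs the tests (resource names are placeholders):

```bash
# Placeholders; use the names of your deployed Synapse workspace and storage accounts.
export SYNAPSE_SERVICE_NAME="<synapse-workspace-name>"
export STORAGE_SERVICE_NAME="<storage-account-name>"
export SYNAPSE_STORAGE_SERVICE_NAME="<synapse-storage-account-name>"
```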

Upload the notebooks in `./tests/integration/spark-apps/notebooks/` to `/Shared/examples/` in the Databricks workspace

* Manually for now. TODO: Automate this in Python. (A Databricks CLI alternative is sketched below.)
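
Until that automation exists, one possible approach is the legacy Databricks CLI's `import_dir` command, assuming the CLI is already configured against the workspace:

```bash
# Import the local notebooks folder into /Shared/examples/, overwriting existing copies.
databricks workspace import_dir \
  ./tests/integration/spark-apps/notebooks/ \
  /Shared/examples/ \
  --overwrite
```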

Compile the following apps and upload them to `/dbfs/FileStore/testcases/` (see the sketch after this list):

* `./tests/integration/spark-apps/jarjobs/abfssInAbfssOut/` with `./gradlew build`
* `./tests/integration/spark-apps/pythonscript/pythonscript.py` needs no build step; just upload it as-is.
* `./tests/integration/spark-apps/wheeljobs/abfssintest/` with `python -m build`
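
A sketch of the build-and-upload flow using the Databricks CLI; the artifact file names are assumptions and depend on what the builds actually produce:

```bash
# Build the JAR and the wheel.
(cd ./tests/integration/spark-apps/jarjobs/abfssInAbfssOut/ && ./gradlew build)
(cd ./tests/integration/spark-apps/wheeljobs/abfssintest/ && python -m build)

# Copy the artifacts and the standalone script to DBFS (artifact names are hypothetical).
databricks fs cp --overwrite ./tests/integration/spark-apps/jarjobs/abfssInAbfssOut/build/libs/abfssInAbfssOut.jar dbfs:/FileStore/testcases/
databricks fs cp --overwrite ./tests/integration/spark-apps/wheeljobs/abfssintest/dist/abfssintest-0.1.0-py3-none-any.whl dbfs:/FileStore/testcases/
databricks fs cp --overwrite ./tests/integration/spark-apps/pythonscript/pythonscript.py dbfs:/FileStore/testcases/
```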

Upload the job definitions by running the Python script `python ./tests/environment/dbfs/create-job.py` (a quick verification sketch follows).
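
To sanity-check the upload, the jobs can be listed back from the Jobs API; note that the API nests the name under `.settings.name` even though the local job definition files use a top-level `.name`:

```bash
# List the uploaded jobs and print their names (requires jq).
databricks jobs list --output JSON | jq -r '.jobs[].settings.name'
```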

## Github Actions

* AZURE_CLIENT_ID
* AZURE_CLIENT_SECRET
* AZURE_TENANT_ID
* INT_AZ_CLI_CREDENTIALS (service principal credentials JSON; a `gh` CLI example for setting secrets follows this list)
```json
{
    "clientId": "xxxx",
    "clientSecret": "yyyy",
    "subscriptionId": "zzzz",
    "tenantId": "μμμμ",
    "activeDirectoryEndpointUrl": "https://login.microsoftonline.com",
    "resourceManagerEndpointUrl": "https://management.azure.com/",
    "activeDirectoryGraphResourceId": "https://graph.windows.net/",
    "sqlManagementEndpointUrl": "https://management.core.windows.net:8443/",
    "galleryEndpointUrl": "https://gallery.azure.com/",
    "managementEndpointUrl": "https://management.core.windows.net/"
}
```
* INT_DATABRICKS_ACCESS_TOKEN
* INT_DATABRICKS_WKSP_ID: adb-xxxx.y
* INT_FUNC_NAME
* INT_PUBLISH_PROFILE from the Azure Function's publish profile XML
* INT_PURVIEW_NAME
* INT_RG_NAME
* INT_SUBSCRIPTION_ID
* INT_SYNAPSE_SQLPOOL_NAME
* INT_SYNAPSE_WKSP_NAME
* INT_SYNAPSE_SQLPOOL_RG_NAME
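
A minimal sketch for setting these as repository secrets with the GitHub CLI, run from the repository checkout; the local credentials file name and placeholder values are hypothetical:

```bash
# Load the service principal credential JSON from a local file (hypothetical path).
gh secret set INT_AZ_CLI_CREDENTIALS < ./int-az-cli-credentials.json

# Set simple values inline; replace the placeholders with real values.
gh secret set INT_PURVIEW_NAME --body "<purview-account-name>"
gh secret set INT_SUBSCRIPTION_ID --body "<subscription-id>"
```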

## config.json

```json
{
    "datasets": {
        "datasetName": {
            "schema": [
                "field1",
                "field2"
            ],
            "data": [
                [
                    "val1",
                    "val2"
                ]
            ]
        }
    },
    "jobs": {
        "job-name": [
            [
                ("storage"|"sql"|"noop"),
                ("csv"|"delta"|"azuresql"|"synapse"),
                "rawdata/testcase/one/",
                "exampleInputA"
            ]
        ]
    }
}
```