
Fix schema validation #146

Merged · 4 commits · Feb 7, 2024
6 changes: 3 additions & 3 deletions databricks_template_schema.json
@@ -12,7 +12,7 @@
     "input_project_name": {
       "order": 2,
       "type": "string",
-      "default": "my-mlops-project",
+      "default": "my_mlops_project",
       "description": "\nProject Name. Default",
       "pattern": "^[^ .\\\\/]{3,}$",
       "pattern_match_failure_message": "Project name must be at least 3 characters long and cannot contain the following characters: \"\\\", \"/\", \" \" and \".\".",
@@ -131,7 +131,7 @@
       "order": 11,
       "type": "string",
       "description": "\nWhether to use the Model Registry with Unity Catalog",
-      "default": "yes",
+      "default": "no",
       "enum": ["yes", "no"],
       "skip_prompt_if": {
         "properties": {
@@ -145,7 +145,7 @@
       "order": 12,
       "type": "string",
       "description": "\nName of schema to use when registering a model in Unity Catalog. \nNote that this schema must already exist, and we recommend keeping the name the same as the project name as well as giving the service principals the right access. Default",
-      "default": "{{ .input_project_name }}",
+      "default": "{{if (eq .input_include_models_in_unity_catalog `no`)}}schema{{else}}{{ .input_project_name }}{{end}}",
       "pattern": "^[^ .\\-\\/]*$",
       "pattern_match_failure_message": "Valid schema names cannot contain any of the following characters: \" \", \".\", \"-\", \"\\\", \"/\"",
       "skip_prompt_if": {
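
The key change is this new conditional `default` for the schema name. Since these templates are Go text/template, here is a standalone sketch (hypothetical answer values; not part of the PR) of how the expression evaluates:

```go
package main

import (
	"os"
	"text/template"
)

func main() {
	// The new "default" value, copied verbatim from the schema above.
	const def = "{{if (eq .input_include_models_in_unity_catalog `no`)}}schema" +
		"{{else}}{{ .input_project_name }}{{end}}"

	tmpl := template.Must(template.New("schema_default").Parse(def))

	for _, answers := range []map[string]string{
		{"input_include_models_in_unity_catalog": "no", "input_project_name": "my_mlops_project"},
		{"input_include_models_in_unity_catalog": "yes", "input_project_name": "my_mlops_project"},
	} {
		_ = tmpl.Execute(os.Stdout, answers) // prints "schema", then "my_mlops_project"
		os.Stdout.WriteString("\n")
	}
}
```

Presumably the fallback to a literal `schema` keeps the skipped prompt from inheriting a project name that would fail the stricter schema-name pattern, which — unlike the project-name pattern — also forbids `-`.
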
7 changes: 4 additions & 3 deletions template/{{.input_root_dir}}/docs/mlops-setup.md.tmpl
@@ -82,9 +82,10 @@ For your convenience, we also have a [Terraform module](https://registry.terrafo
 #### Configure Service Principal (SP) permissions
 If the created project uses **Unity Catalog**, we expect a catalog to exist with the name of the deployment target by default.
 For example, if the deployment target is dev, we expect a catalog named dev to exist in the workspace.
-If you want to use different catalog names, please update the targets declared in the {{ if (eq .input_setup_cicd_and_project `CICD_and_Project`)}}[{{ .input_project_name }}/databricks.yml](../{{template `project_name_alphanumeric_underscore` .}}/databricks.yml)
-and [{{ .input_project_name }}/resources/ml-artifacts-resource.yml](../{{template `project_name_alphanumeric_underscore` .}}/resources/ml-artifacts-resource.yml) {{ else }} `databricks.yml` and `resources/ml-artifacts-resource.yml` {{ end }} files.
-If changing the staging, prod, or test deployment targets, you'll need to update the workflows located in the .github/workflows directory.
+If you want to use different catalog names, please update the target names declared in the
+{{- if (eq .input_setup_cicd_and_project `CICD_and_Project`)}}[{{ .input_project_name }}/databricks.yml](../{{template `project_name_alphanumeric_underscore` .}}/databricks.yml)
+{{- else }} `databricks.yml` {{ end }} file.
+If changing the staging, prod, or test deployment targets, you'll also need to update the workflows located in the .github/workflows directory.
 
 The SP must have proper permission in each respective environment and the catalog for the environments.
 
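
The switch from `{{ if` to `{{- if` in this hunk is about whitespace: the `-` trims the whitespace (including the preceding newline) before the action, so the rendered Markdown stays on one line instead of breaking mid-sentence. A toy sketch, using a hypothetical `.cicd` boolean in place of the real `eq .input_setup_cicd_and_project` test:

```go
package main

import (
	"os"
	"text/template"
)

func main() {
	// Hypothetical stand-in for (eq .input_setup_cicd_and_project `CICD_and_Project`).
	data := map[string]bool{"cicd": false}

	for _, src := range []string{
		"declared in the\n{{ if .cicd }} A{{ else }} B{{ end }} file.",   // newline survives
		"declared in the\n{{- if .cicd }} A{{- else }} B{{ end }} file.", // "{{-" trims it
	} {
		t := template.Must(template.New("trim").Parse(src))
		_ = t.Execute(os.Stdout, data) // first: two lines; second: "declared in the B file."
		os.Stdout.WriteString("\n---\n")
	}
}
```
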
@@ -207,6 +207,30 @@ logic in `features` and run the feature engineering pipeline in the `GenerateAnd
 * Python 3.8+
 * Install feature engineering code and test dependencies via `pip install -I -r requirements.txt` from project root directory.
 * The features transform code uses PySpark and brings up a local Spark instance for testing, so [Java (version 8 and later) is required](https://spark.apache.org/docs/latest/#downloading).
+{{- if (eq .input_include_models_in_unity_catalog `yes`) }}
+* Access to a UC catalog and schema.
+We expect a catalog to exist with the name of the deployment target by default.
+For example, if the deployment target is dev, we expect a catalog named dev to exist in the workspace.
+If you want to use different catalog names, please update the target names declared in the [databricks.yml](./databricks.yml) file.
+{{- if (eq .input_setup_cicd_and_project `CICD_and_Project`) }}
+If changing the staging, prod, or test deployment targets, you'll also need to update the workflows located in the .github/workflows directory.
+{{- end }}
+
+For the ML training job, you must have permissions to read the input Delta table and to create experiments and models,
+i.e., for each environment:
+- USE_CATALOG
+- USE_SCHEMA
+- MODIFY
+- CREATE_MODEL
+- CREATE_TABLE
+
+For the batch inference job, you must have permissions to read the input Delta table and to modify the output Delta table,
+i.e., for each environment:
+- USAGE permissions for the catalog and schema of the input and output table.
+- SELECT permission for the input table.
+- MODIFY permission for the output table if it pre-dates your job.
+{{- end }}
+
 #### Run unit tests
 You can run unit tests for your ML code via `pytest tests`.
@@ -19,7 +19,7 @@ include:
 
 # Deployment Target specific values for workspace
 targets:
-  dev:
+  dev: {{ if (eq .input_include_models_in_unity_catalog `yes`)}} # UC Catalog Name {{ end }}
     default: true
     workspace:
       # TODO: add dev workspace URL