ML Developer Guide

(back to main README)

Table of contents

Iterating on ML code

Deploy ML code and assets to dev workspace using bundles

Refer to Local development and dev workspace for how to use Databricks CLI bundles to deploy ML code together with asset configs to the dev workspace.

This lets you develop locally and then deploy to your dev workspace to test code and config changes.
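As a rough illustration of that deploy loop, the sketch below builds the two Databricks CLI bundle invocations (validate, then deploy) as argument lists. The target name `dev` is an assumption made for this example; the real target names come from your project's bundle configuration, so check the Local development and dev workspace instructions before running anything.

```python
import shlex
from typing import List


def bundle_commands(target: str) -> List[List[str]]:
    """Build the validate and deploy invocations for one bundle target."""
    return [
        ["databricks", "bundle", "validate", "-t", target],
        ["databricks", "bundle", "deploy", "-t", target],
    ]


if __name__ == "__main__":
    # "dev" is an assumed target name; use the one defined in your bundle config.
    for cmd in bundle_commands("dev"):
        print(shlex.join(cmd))
```

The script only prints the commands. To execute them, each list can be passed to `subprocess.run(cmd, check=True)` from the project root.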

Develop on Databricks using Databricks Repos

Prerequisites

You'll need:

  • Access to run commands on a cluster running Databricks Runtime ML version 11.0 or above in your dev Databricks workspace
  • Databricks Repos set up in your dev workspace (see the instructions below)

Configuring Databricks Repos

To use Repos, set up git integration in your dev workspace.

If the current project has already been pushed to a hosted Git repo, follow the UI workflow to clone it into your dev workspace and iterate.

Otherwise, e.g. if iterating on ML code for a new project, follow the steps below:

  • Follow the UI workflow for creating a repo, but uncheck the "Create repo by cloning a Git repository" checkbox.
  • Install the dbx CLI by running pip install --upgrade dbx.
  • Run databricks configure --profile stack_demo_public-dev --token --host <your-dev-workspace-url>, passing the URL of your dev workspace. This should prompt you to enter an API token.
  • Create a personal access token in your dev workspace and paste it into the prompt from the previous step.
  • From the root directory of the current project, use the dbx sync tool to copy code files from your local machine into the Repo by running dbx sync repo --profile stack_demo_public-dev --source . --dest-repo your-repo-name. Here your-repo-name is the last segment of the full repo name (/Repos/username/your-repo-name).
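The CLI steps above can be sketched as a small script that assembles the three commands in order and derives the `--dest-repo` value from the full repo path. The helper names `repo_name_from_path` and `sync_setup_commands` and the host URL are hypothetical placeholders introduced for this example; only the command names and flags come from the steps above.

```python
import posixpath
import shlex
from typing import List


def repo_name_from_path(full_repo_path: str) -> str:
    """Return the last segment of a full repo path such as /Repos/username/your-repo-name."""
    return posixpath.basename(full_repo_path.rstrip("/"))


def sync_setup_commands(profile: str, host: str, full_repo_path: str) -> List[List[str]]:
    """Build the install, configure, and sync invocations from the steps above."""
    return [
        ["pip", "install", "--upgrade", "dbx"],
        ["databricks", "configure", "--profile", profile, "--token", "--host", host],
        ["dbx", "sync", "repo", "--profile", profile, "--source", ".",
         "--dest-repo", repo_name_from_path(full_repo_path)],
    ]


if __name__ == "__main__":
    # Placeholder values (assumptions): substitute your own workspace URL and repo path.
    for cmd in sync_setup_commands(
        profile="stack_demo_public-dev",
        host="https://example.cloud.databricks.com",
        full_repo_path="/Repos/username/your-repo-name",
    ):
        print(shlex.join(cmd))
```

The script prints the commands rather than running them, because `databricks configure` prompts interactively for the token and `dbx sync` keeps running while it watches for file changes. Run them by hand from the project root.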

Running code on Databricks

You can iterate on the sample ML code by running the provided stack_demo_public/training/notebooks/Train.py notebook on Databricks using Repos.

Next Steps

If you're iterating on ML code for an existing, already-deployed ML project, follow Submitting a Pull Request to submit your code for testing and production deployment.

Otherwise, if you are exploring a new ML problem and are satisfied with the results (e.g. you were able to train a model with reasonable performance on your dataset), you may be ready to productionize your pipeline. To do this, follow the MLOps Setup Guide to set up CI/CD and deploy production training/inference pipelines.

(back to main README)