kiookp/dolly

This branch is 79 commits behind databrickslabs/dolly:master.

Latest commit

03bf385 · Mar 24, 2023

Dolly

This repository fine-tunes the GPT-J 6B model on the Alpaca dataset using a Databricks notebook. Note that while GPT-J 6B is licensed under Apache 2.0, the Alpaca dataset is licensed under Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0).

Get Started Training

  • Add the dolly repo to Databricks (under Repos click Add Repo, enter https://github.com/databrickslabs/dolly.git, then click Create Repo).
  • Start a single-node cluster running Databricks Runtime 12.2 LTS ML (includes Apache Spark 3.3.2, GPU, Scala 2.12) with a node type that has 8 A100 GPUs (e.g. Standard_ND96asr_v4 or p4d.24xlarge).
  • Open the train_dolly notebook in the dolly repo, attach it to your GPU cluster, and run all cells. When training finishes, the notebook saves the model under /dbfs/dolly_training.
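
Before running all cells, it can be useful to confirm that the attached node really exposes the 8 A100 GPUs the steps above call for. The following is a minimal sketch and not part of the dolly repo: the helper names `count_a100` and `query_gpu_names` are hypothetical, and it assumes `nvidia-smi` is on the PATH of the driver node and that the GPU name reported by the driver contains the substring "A100".

```python
import shutil
import subprocess

EXPECTED_GPUS = 8  # from the cluster requirement in the steps above


def count_a100(gpu_names: str) -> int:
    """Count lines of GPU-name output that mention A100 (hypothetical helper)."""
    return sum(1 for line in gpu_names.splitlines() if "A100" in line)


def query_gpu_names() -> str:
    """Return one GPU name per line, or an empty string if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return ""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout if result.returncode == 0 else ""


if __name__ == "__main__":
    found = count_a100(query_gpu_names())
    if found == EXPECTED_GPUS:
        print(f"OK: found {found} A100 GPUs")
    else:
        print(f"Warning: expected {EXPECTED_GPUS} A100 GPUs, found {found}")
```

Run in a notebook cell or as a script on the cluster, this only reports what the driver sees; it does not change any cluster configuration.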

Running Unit Tests Locally

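# Pin the Python version for this directory (assumes 3.8.13 is already installed via pyenv)
pyenv local 3.8.13
# Create and activate an isolated virtual environment
python -m venv .venv
. .venv/bin/activate
# Install the development dependencies and run the test suite
pip install -r requirements_dev.txt
./run_pytest.sh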
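
If the tests fail to start, a common cause is that the shell is not using the interpreter and virtual environment set up by the commands above. The sketch below is one way to check this from Python. It is not part of the dolly repo, and the function names are hypothetical. It assumes only the 3.8 series implied by `pyenv local 3.8.13`.

```python
import sys

EXPECTED = (3, 8)  # major and minor version implied by `pyenv local 3.8.13`


def is_expected_python(version_info=sys.version_info, expected=EXPECTED) -> bool:
    """True if the interpreter's major.minor version matches the expected one."""
    return tuple(version_info[:2]) == expected


def in_virtualenv() -> bool:
    """True inside a venv: sys.prefix points at the environment, base_prefix at the base install."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    status = "ok" if is_expected_python() else "unexpected"
    print("python:", sys.version.split()[0], status)
    print("virtualenv active:", in_virtualenv())
```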