Dolly

This repository fine-tunes the GPT-J 6B model on the Alpaca dataset using a Databricks notebook. Note that while GPT-J 6B is Apache 2.0 licensed, the Alpaca dataset is licensed under Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0).

Get Started Training

Add the dolly repo to Databricks (under Repos click Add Repo, enter https://github.com/databrickslabs/dolly.git, then click Create Repo).
Start a 12.2 LTS ML (includes Apache Spark 3.3.2, GPU, Scala 2.12) single-node cluster with a node type that has 8 A100 GPUs (e.g. Standard_ND96asr_v4 or p4d.24xlarge).
Open the train_dolly notebook in the dolly repo, attach it to your GPU cluster, and run all cells. When training finishes, the notebook saves the model under /dbfs/dolly_training.
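
Before running all cells, it can be useful to confirm that the attached cluster really exposes the eight A100 GPUs the steps above call for. The following is a minimal sketch, not part of the dolly repo: the helper names (`parse_gpu_names`, `check_gpus`, `list_gpus`) are hypothetical, and it assumes `nvidia-smi -L` is available on the driver node and prints one `GPU <n>: <name> (UUID: ...)` line per device.

```python
import shutil
import subprocess

REQUIRED_GPUS = 8        # GPU count named in the cluster step above
REQUIRED_MODEL = "A100"  # substring expected in every GPU name


def parse_gpu_names(listing: str) -> list:
    """Extract GPU names from `nvidia-smi -L` style output.

    Assumes each device line looks like:
        GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-...)
    """
    names = []
    for line in listing.splitlines():
        line = line.strip()
        if not line.startswith("GPU "):
            continue
        # Drop the "GPU <n>:" prefix, then the trailing "(UUID: ...)" part.
        _, _, rest = line.partition(":")
        names.append(rest.split("(UUID", 1)[0].strip())
    return names


def check_gpus(names, required=REQUIRED_GPUS, model=REQUIRED_MODEL):
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    if len(names) != required:
        problems.append(f"expected {required} GPUs, found {len(names)}")
    wrong = [n for n in names if model not in n]
    if wrong:
        problems.append(f"non-{model} devices: {sorted(set(wrong))}")
    return problems


def list_gpus():
    """Query the local machine; returns [] when nvidia-smi is not installed."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=False
    )
    return parse_gpu_names(result.stdout)


if __name__ == "__main__":
    issues = check_gpus(list_gpus())
    if issues:
        print("GPU check failed: " + "; ".join(issues))
    else:
        print("GPU check passed")
```

Running this in a notebook cell attached to the cluster, before the training cells, surfaces a wrong node type early. Parsing is kept separate from the `nvidia-smi` call so the check can be exercised on a sample listing without a GPU.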

Running Unit Tests Locally

pyenv local 3.8.13
python -m venv .venv
. .venv/bin/activate
pip install -r requirements_dev.txt
./run_pytest.sh
