Analyze musical scores with language models!
First, create a new virtualenv or conda environment with python & pip. Then:

```
pip install transformers datasets accelerate music21 deepspeed
```
Finally, note that the scripts generate some very large files, like datasets or model weights. Please DO NOT commit these to git! The default names for these files are in the gitignore, but please be careful.
Run `generate_data.py` to process scores from composers in music21's database into the model's language. This script produces `dataset/<composername>.jsonl` for each composer. To create a training dataset, use `cat` to merge the composers you want into one file called `data.jsonl`:

```
cat bach.jsonl mozart.jsonl > data.jsonl
```

Filenames can be repeated to include them multiple times. This can be used to influence the composition of the dataset, or to balance it. For instance, `palestrina.jsonl` is the largest by a wide margin, so other composers might need to be repeated to get more diverse and interesting generations.
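For reference, this is roughly how music21 exposes those bundled scores. It is a minimal sketch only, not the actual logic or output format of `generate_data.py`:

```python
# Hedged sketch: enumerate a composer's scores in music21's built-in corpus.
# generate_data.py's real processing and JSONL output are not shown here.
from music21 import corpus

for path in corpus.getComposer('bach')[:3]:  # first few Bach scores
    score = corpus.parse(path)
    print(path, "->", len(score.parts), "parts")
```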
Use `train.py` to train the model. This script finetunes EleutherAI's Pythia model on the score text format. Right now we use Pythia for a few reasons:

- Easy to access from Hugging Face: no downloading delta weights and merging like the official LLaMA.
- Offers many scaling options for different hardware configurations. 70m is very manageable and can be trained on average hardware without `deepspeed`.
- Pythia is trained on The Pile, which contains code text. Pretraining on code (or other formal languages like mathematics) is useful if the score text format resembles code.

Check out one of the Pythia model cards for more info.
Parameters are selected in `train_cfg.json`. In particular, check out these parameters:

- `model_name`: Set by default to `EleutherAI/pythia-70m-deduped`, the smallest Pythia variant. If you have enough VRAM, you can select 160m, 410m, etc.
- `max_length`: The sequence length the model can process, in tokens. Reduce this if you're running out of VRAM.
- `batchsize`: How many items to process in a batch. Use the largest batch size your VRAM allows, and reduce it if you run out of VRAM. When reducing `batchsize`, you may also need to reduce `lr`.
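The following is only a hedged illustration of how those fields fit together; `train.py`'s actual config handling and training loop may differ:

```python
# Hypothetical sketch: read the documented train_cfg.json fields and load the model.
# The real train.py may consume the config differently.
import json

from transformers import AutoModelForCausalLM, AutoTokenizer

with open("train_cfg.json") as f:
    cfg = json.load(f)

tokenizer = AutoTokenizer.from_pretrained(cfg["model_name"])
model = AutoModelForCausalLM.from_pretrained(cfg["model_name"])

# max_length caps tokenized sequence length, batchsize sets items per step,
# and lr is the learning rate; the training loop itself is omitted here.
print(cfg["max_length"], cfg["batchsize"], cfg["lr"])
```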
TODO:
- Support checkpointing and checkpoint loading.
DeepSpeed implements optimizations for training that can significantly reduce the VRAM requirement, at the cost of additional main RAM use. To use deepspeed:

```
accelerate launch --config_file accel_config.yaml train.py
```

If you receive an error about CPU Adam like this:

```
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
```

then try reinstalling deepspeed with these arguments:

```
DS_BUILD_CPU_ADAM=1 BUILD_UTILS=1 pip install deepspeed -U
```

I had this problem on ROCm setups but not CUDA.
Use `infer.py` to output text with a trained model. This script loads the model from a directory named `score-lm`, which is created by the training script when it finishes.

```
python infer.py --help
python infer.py -p "|"
```

The `-p` argument is the text to start generating from. Since a `|` is the start of a new measure, it is the usual way to start generation of a new score. You could also manually enter your own score and let the model continue from it.
`temperature` & `topk` are the key parameters to modify during generation. Both will result in more randomness when set higher. At low temperatures (<0.5) you'll likely get the same chord repeated over and over, and at higher temperatures you'll get scores that Nancarrow would love.
TODO:
- expose these on the CLI
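To make those knobs concrete, here is a hedged sketch of `infer.py`-style generation, assuming the training script saved both the model and tokenizer into `score-lm`; the real script's flags and defaults may differ:

```python
# Hedged sketch of prompt-seeded sampling; not infer.py's actual implementation.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("score-lm")
model = AutoModelForCausalLM.from_pretrained("score-lm")

prompt = "|"  # "|" starts a new measure, so it seeds a fresh score
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.8,   # higher = more randomness
    top_k=50,          # sample from only the 50 most likely next tokens
    max_new_tokens=256,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```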
`infer.py` will take the generated output from the model and try to reconstruct a musical score out of it. Naturally, the model output is imperfect, so the decoder will ignore or fix certain errors (a rough sketch of this kind of cleanup follows the list):

- Invalid note names (like H) or octaves (like -6) are ignored. This is done on a note-by-note basis within a chord, so a single bad note doesn't drop the entire chord.
- Invalid durations (anything too small for MuseScore to display) are changed to 0.5.
- Nothing is done about duplicate notes, but it's probably worth fixing. MuseScore throws a warning but renders them anyway.
- Ties are simplified to only `start` or `None`, ignoring the `continue` and `stop` codes in music21.
- Currently, we do not follow the measure tokens generated by the model; instead we append everything into one long stream and let music21 figure out where the measure boundaries should be.
  - It's a future goal to benchmark the model's ability to generate notes that properly sum to a measure.
- Handling of non-power-of-two durations (like triplets) seems bugged and needs more investigation, but there aren't many triplets in the training data at the moment. They render, but MuseScore produces warnings.
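Here is a minimal sketch of the per-note and per-duration cleanup described above, assuming chords arrive as lists of note-name tokens; the function name, threshold, and structure are hypothetical, and `infer.py`'s real decoder may differ:

```python
# Hedged sketch of decoder cleanup; MIN_QL is a hypothetical "too small" cutoff.
from music21 import chord, pitch

MIN_QL = 0.125  # hypothetical threshold for durations MuseScore can't display

def clean_chord(note_tokens, quarter_length):
    """Build a music21 Chord, dropping unparseable notes and fixing tiny durations."""
    pitches = []
    for token in note_tokens:
        try:
            pitches.append(pitch.Pitch(token))  # invalid names like "H" raise here
        except Exception:
            continue  # drop only this note, keep the rest of the chord
    if not pitches:
        return None
    if quarter_length < MIN_QL:
        quarter_length = 0.5  # replace unrenderable durations
    c = chord.Chord(pitches)
    c.quarterLength = quarter_length
    return c

print(clean_chord(["C4", "E4", "H2"], 0.03125))  # "H2" dropped, duration fixed to 0.5
```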
DeepSpeed hits an out-of-memory issue on WSL2, which is discussed in this thread. Apply the workaround suggested there, removing the pin memory call.