
Commit 77536b9

Merge pull request #4 from Stability-AI/os_release_2

Rename to stable-audio-tools and remove unsupported features for open-source release

2 parents: 56ebc55 + 228a579

File tree: 72 files changed, +30 −1885 lines


README.md

Lines changed: 5 additions & 5 deletions
@@ -1,11 +1,11 @@
-# harmonai-tools
+# stable-audio-tools

Training and inference code for audio generation models

# Install

The library can be installed from PyPI with:
```bash
-$ pip install harmonai-tools
+$ pip install stable-audio-tools
```

To run the training scripts or inference code, you'll want to clone this repository, navigate to the root, and run:
@@ -37,7 +37,7 @@ $ python3 ./train.py --dataset-config /path/to/dataset/config --model-config /pa
The `--name` parameter will set the project name for your Weights and Biases run.

## Training wrappers and model unwrapping
-`harmonai-tools` uses PyTorch Lightning to facilitate multi-GPU and multi-node training.
+`stable-audio-tools` uses PyTorch Lightning to facilitate multi-GPU and multi-node training.

When a model is being trained, it is wrapped in a "training wrapper", which is a `pl.LightningModule` that contains all of the relevant objects needed only for training. That includes things like discriminators for autoencoders, EMA copies of models, and all of the optimizer states.
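To make the wrapper idea concrete, here is a minimal sketch of what such a training wrapper can look like. This is an illustration only: the class name, constructor arguments, and loss are assumptions, not the library's actual API.

```python
import copy

import pytorch_lightning as pl
import torch
import torch.nn.functional as F

class AutoencoderTrainingWrapper(pl.LightningModule):  # hypothetical name
    """Holds the model plus training-only objects (discriminator, EMA copy)."""

    def __init__(self, autoencoder, discriminator, lr=1e-4):
        super().__init__()
        self.autoencoder = autoencoder              # the model you actually want to keep
        self.discriminator = discriminator          # training-only: adversarial loss
        self.autoencoder_ema = copy.deepcopy(autoencoder)  # training-only: EMA weights
        self.lr = lr

    def training_step(self, batch, batch_idx):
        audio = batch
        recon = self.autoencoder(audio)
        # A real wrapper would combine reconstruction and adversarial terms.
        return F.mse_loss(recon, audio)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.lr)

# "Unwrapping" then just means pulling the inner model back out, discarding
# the discriminator and optimizer state:
#   model = wrapper.autoencoder
```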

@@ -97,7 +97,7 @@ Additional optional flags for `train.py` include:
- RNG seed for PyTorch, helps with deterministic training

# Configurations
-Training and inference code for `harmonai-tools` is based around JSON configuration files that define model hyperparameters, training settings, and information about your training dataset.
+Training and inference code for `stable-audio-tools` is based around JSON configuration files that define model hyperparameters, training settings, and information about your training dataset.

## Model config
The model config file defines all of the information needed to load a model for training or inference. It also contains the training configuration needed to fine-tune a model or train from scratch.
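As an illustration of the shape this implies, a sketch follows; only `model_type` and the `training` block are named in this README, and every other key below is an assumption.

```python
import json

# Hypothetical model config with the top-level pieces the README describes:
# `model_type` selects the kind of model, `training` configures the wrapper.
# The remaining keys are assumptions for illustration only.
model_config = {
    "model_type": "autoencoder",
    "model": {},        # architecture hyperparameters would go here
    "training": {
        "learning_rate": 1e-4,
    },
}

with open("model_config.json", "w") as f:
    json.dump(model_config, f, indent=2)

# train.py is then pointed at the file (flags shown in the hunk above):
#   $ python3 ./train.py --dataset-config /path/to/dataset/config --model-config model_config.json
```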
@@ -118,7 +118,7 @@ The following properties are defined in the top level of the model configuration
- The training configuration for the model, varies based on `model_type`. Provides parameters for training as well as demos.

## Dataset config
-`harmonai-tools` currently supports two kinds of data sources: local directories of audio files, and WebDataset datasets stored in Amazon S3.
+`stable-audio-tools` currently supports two kinds of data sources: local directories of audio files, and WebDataset datasets stored in Amazon S3.
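Dataset configs are not documented in this commit (see the Todo below), so purely as a hypothetical sketch of the local-directory case, with every key an assumption:

```python
import json

# Entirely hypothetical dataset config for the local-directory source
# mentioned above; key names are illustrative assumptions, not the
# library's documented schema.
dataset_config = {
    "dataset_type": "audio_dir",               # assumed: local files vs. WebDataset/S3
    "datasets": [{"path": "/path/to/audio"}],  # assumed: one or more audio directories
}

with open("dataset_config.json", "w") as f:
    json.dump(dataset_config, f, indent=2)
```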

# Todo
- [ ] Add documentation for dataset configs

defaults.ini

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
[DEFAULTS]

#name of the run
-name = harmonai_tools
+name = stable_audio_tools

# the batch size
batch_size = 8

docs/autoencoders.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ The *decoder* takes in a d-channel latent sequence and upsamples it back to the

Autoencoders are trained with a combination of reconstruction and adversarial losses in order to create a compact and invertible representation of raw audio data that allows downstream models to work in a data-compressed "latent space", with various desirable and controllable properties such as reduced sequence length, noise resistance, and discretization.

-The autoencoder architectures defined in `harmonai-tools` are largely fully-convolutional, which allows autoencoders trained on small lengths to be applied to arbitrary-length sequences. For example, an autoencoder trained on 1-second samples could be used to encode 45-second inputs to a latent diffusion model.
+The autoencoder architectures defined in `stable-audio-tools` are largely fully-convolutional, which allows autoencoders trained on small lengths to be applied to arbitrary-length sequences. For example, an autoencoder trained on 1-second samples could be used to encode 45-second inputs to a latent diffusion model.
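The arbitrary-length property comes down to the encoder's overall striding: the latent sequence length scales linearly with the input length. A quick sketch of the arithmetic, where the downsampling ratio of 1024 is an assumed example value:

```python
# latent_len ≈ input_len / downsampling_ratio for a fully-convolutional encoder.
sample_rate = 44100
downsampling_ratio = 1024  # assumed: product of the encoder's conv strides

for seconds in (1, 45):
    samples = seconds * sample_rate
    latents = samples // downsampling_ratio
    print(f"{seconds:>2}s -> {samples} samples -> ~{latents} latent frames")

# Output:
#  1s -> 44100 samples -> ~43 latent frames
# 45s -> 1984500 samples -> ~1937 latent frames
```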

# Model configs
The model config file for an autoencoder should set the `model_type` to `autoencoder`, and the `model` object should have the following properties:

docs/pretransforms.md

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
# Pretransforms
Many models require some fixed transform to be applied to the input audio before the audio is passed in to the trainable layers of the model, as well as a corresponding inverse transform to be applied to the outputs of the model. We refer to these as "pretransforms".

-At the moment, `harmonai-tools` supports two pretransforms: frozen autoencoders for latent diffusion models, and wavelet decompositions.
+At the moment, `stable-audio-tools` supports two pretransforms: frozen autoencoders for latent diffusion models, and wavelet decompositions.

Pretransforms have a similar interface to autoencoders with "encode" and "decode" functions defined for each pretransform.
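A minimal sketch of that shared interface, assuming PyTorch modules; the class names and the frozen-autoencoder details are illustrative, not the library's actual classes.

```python
import torch
import torch.nn as nn

class Pretransform(nn.Module):
    """Illustrative base class: a fixed transform/inverse-transform pair."""

    def encode(self, audio: torch.Tensor) -> torch.Tensor:
        raise NotImplementedError

    def decode(self, latents: torch.Tensor) -> torch.Tensor:
        raise NotImplementedError

class AutoencoderPretransform(Pretransform):
    """Wraps a frozen, pretrained autoencoder, as described above."""

    def __init__(self, autoencoder):
        super().__init__()
        self.autoencoder = autoencoder.eval().requires_grad_(False)  # fixed, not trained

    def encode(self, audio):
        return self.autoencoder.encode(audio)

    def decode(self, latents):
        return self.autoencoder.decode(latents)
```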

@@ -28,7 +28,7 @@ Example:
The original [Latent Diffusion paper](https://arxiv.org/abs/2112.10752) found that rescaling the latent series to unit variance before performing diffusion improved quality. To this end, we expose a `scale` property on autoencoder pretransforms that will take care of this rescaling. The scale should be set to the original standard deviation of the latents, which can be determined experimentally, or by looking at the `latent_std` value during training. The pretransform code will divide by this scale factor in the `encode` function and multiply by this scale in the `decode` function.
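In code, the scale handling amounts to one division and one multiplication; a sketch building on the hypothetical `AutoencoderPretransform` above:

```python
class ScaledAutoencoderPretransform(AutoencoderPretransform):  # hypothetical subclass
    """Rescales latents toward unit variance, per the Latent Diffusion trick."""

    def __init__(self, autoencoder, scale: float):
        super().__init__(autoencoder)
        self.scale = scale  # set to the measured latent standard deviation (latent_std)

    def encode(self, audio):
        return super().encode(audio) / self.scale    # divide on the way in

    def decode(self, latents):
        return super().decode(latents * self.scale)  # multiply on the way out
```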

## Wavelet pretransform
-`harmonai-tools` also exposes wavelet decomposition as a pretransform. Wavelet decomposition is a quick way to trade off sequence length for channels in autoencoders, while maintaining a multi-band implicit bias.
+`stable-audio-tools` also exposes wavelet decomposition as a pretransform. Wavelet decomposition is a quick way to trade off sequence length for channels in autoencoders, while maintaining a multi-band implicit bias.

Wavelet pretransforms take the following properties:
fit_pca.py

Lines changed: 0 additions & 94 deletions
This file was deleted.

harmonai_tools/configs/model_configs/autoencoders/dac_1024_64_stereo_vae_44k.json

Lines changed: 0 additions & 79 deletions
This file was deleted.

harmonai_tools/configs/model_configs/autoencoders/dac_1024_64_stereo_vae_44k_distilled.json

Lines changed: 0 additions & 106 deletions
This file was deleted.
