[![Multi-Modality](agorabanner.png)](https://discord.com/servers/agora-999382051935506503)

# M1: Music Generation via Diffusion Transformers 🎵🔬

[![Join our Discord](https://img.shields.io/badge/Discord-Join%20our%20server-5865F2?style=for-the-badge&logo=discord&logoColor=white)](https://discord.gg/agora-999382051935506503) [![Subscribe on YouTube](https://img.shields.io/badge/YouTube-Subscribe-red?style=for-the-badge&logo=youtube&logoColor=white)](https://www.youtube.com/@kyegomez3242) [![Connect on LinkedIn](https://img.shields.io/badge/LinkedIn-Connect-blue?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/kye-g-38759a207/) [![Follow on X.com](https://img.shields.io/badge/X.com-Follow-1DA1F2?style=for-the-badge&logo=x&logoColor=white)](https://x.com/kyegomezb)

M1 is a research project exploring large-scale music generation using diffusion transformers. This repository contains the implementation of our proposed architecture, combining recent advances in diffusion models, transformer architectures, and music processing.

## 🔬 Research Overview

We propose a novel approach to music generation that combines:
- Diffusion-based generative modeling (see the forward-process sketch below)
- Multi-query attention mechanisms
- Hierarchical audio encoding
- Text-conditional generation
- Scalable training methodology
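
For orientation, the sketch below shows the standard DDPM-style forward (noising) process that diffusion-based generative modeling builds on. It is a generic PyTorch illustration under assumed defaults (cosine schedule, epsilon-prediction target), not necessarily the exact formulation used in M1:

```python
import torch

def cosine_beta_schedule(timesteps: int, s: float = 0.008) -> torch.Tensor:
    """Cosine noise schedule (as in Nichol & Dhariwal, 2021)."""
    steps = torch.arange(timesteps + 1, dtype=torch.float64)
    alphas_cumprod = torch.cos(((steps / timesteps) + s) / (1 + s) * torch.pi * 0.5) ** 2
    alphas_cumprod = alphas_cumprod / alphas_cumprod[0]
    betas = 1 - (alphas_cumprod[1:] / alphas_cumprod[:-1])
    return betas.clamp(0, 0.999).float()

def q_sample(x0: torch.Tensor, t: torch.Tensor, alphas_cumprod: torch.Tensor):
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(a_bar_t) * x_0, (1 - a_bar_t) * I)."""
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(-1, *([1] * (x0.dim() - 1)))
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise
    return x_t, noise  # the denoiser is trained to recover `noise` from (x_t, t, conditioning)

betas = cosine_beta_schedule(1000)
alphas_cumprod = torch.cumprod(1 - betas, dim=0)
mel = torch.randn(4, 80, 256)            # dummy batch of (n_mels, frames) spectrograms
t = torch.randint(0, 1000, (4,))
x_t, noise = q_sample(mel, t, alphas_cumprod)
```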

### Key Hypotheses

1. Diffusion transformers can capture long-range musical structure better than traditional autoregressive models
2. Multi-query attention mechanisms can improve training efficiency without sacrificing quality (sketched below)
3. Hierarchical audio encoding preserves both local and global musical features
4. Text conditioning enables semantic control over generation
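
Hypothesis 2 refers to multi-query attention, where every query head shares a single key/value head, shrinking the key/value projections and cache. A minimal, self-contained PyTorch sketch of the idea; the class name and dimensions below are illustrative (they mirror `MODEL_CONFIG`) and are not taken from this repository:

```python
import torch
from torch import nn

class MultiQueryAttention(nn.Module):
    """Self-attention with per-head queries but one shared key/value head."""

    def __init__(self, dim: int = 512, heads: int = 8, dim_head: int = 64):
        super().__init__()
        self.heads, self.dim_head = heads, dim_head
        self.to_q = nn.Linear(dim, heads * dim_head, bias=False)
        self.to_kv = nn.Linear(dim, 2 * dim_head, bias=False)  # single K/V head shared by all queries
        self.to_out = nn.Linear(heads * dim_head, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, _ = x.shape
        q = self.to_q(x).view(b, n, self.heads, self.dim_head).transpose(1, 2)   # (b, h, n, d)
        k, v = self.to_kv(x).chunk(2, dim=-1)                                    # (b, n, d) each
        scores = q @ k.transpose(-1, -2).unsqueeze(1) * self.dim_head ** -0.5    # (b, h, n, n)
        out = torch.softmax(scores, dim=-1) @ v.unsqueeze(1)                     # (b, h, n, d)
        return self.to_out(out.transpose(1, 2).reshape(b, n, -1))

x = torch.randn(2, 128, 512)
print(MultiQueryAttention()(x).shape)  # torch.Size([2, 128, 512])
```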

## 🏗️ Architecture

```
                                      ┌─────────────────┐
                                      │  Time Encoding  │
                                      └────────┬────────┘
                                               │
┌──────────────┐                      ┌────────▼────────┐
│ Audio Input  ├──► mel spectrogram ─►│                 │
└──────────────┘                      │    Diffusion    │
                                      │   Transformer   │──► Generated Audio
┌──────────────┐   ┌────────────┐     │      Block      │
│  Text Input  ├──►│ T5 Encoder ├────►│                 │
└──────────────┘   └────────────┘     └─────────────────┘
```
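
The `Time Encoding` box in the diagram is commonly a sinusoidal timestep embedding passed through a small MLP before conditioning the transformer blocks. The formulation below is a standard one, assumed for illustration rather than taken from this codebase:

```python
import math
import torch
from torch import nn

class TimestepEmbedding(nn.Module):
    """Sinusoidal diffusion-timestep embedding followed by a small MLP."""

    def __init__(self, dim: int = 512):
        super().__init__()
        self.dim = dim
        self.mlp = nn.Sequential(nn.Linear(dim, dim * 4), nn.SiLU(), nn.Linear(dim * 4, dim))

    def forward(self, t: torch.Tensor) -> torch.Tensor:  # t: (batch,) integer timesteps
        half = self.dim // 2
        freqs = torch.exp(-math.log(10000.0) * torch.arange(half, device=t.device) / half)
        angles = t.float()[:, None] * freqs[None, :]           # (batch, dim // 2)
        emb = torch.cat([angles.sin(), angles.cos()], dim=-1)  # (batch, dim)
        return self.mlp(emb)

print(TimestepEmbedding(512)(torch.randint(0, 1000, (4,))).shape)  # torch.Size([4, 512])
```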

### Implementation Details

```python
# Key architectural dimensions
MODEL_CONFIG = {
    'dim': 512,        # Base dimension
    'depth': 12,       # Number of transformer layers
    'heads': 8,        # Attention heads
    'dim_head': 64,    # Dimension per head
    'mlp_dim': 2048,   # FFN dimension
    'dropout': 0.1     # Dropout rate
}

# Audio processing parameters
AUDIO_CONFIG = {
    'sample_rate': 16000,
    'n_mels': 80,
    'n_fft': 1024,
    'hop_length': 256
}
```
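
As an illustration of how `AUDIO_CONFIG` maps onto a spectrogram front end, the snippet below computes log-mel features with `torchaudio`. Using `torchaudio` here is an assumption for demonstration; the repository's actual preprocessing pipeline may differ:

```python
import torch
import torchaudio

AUDIO_CONFIG = {'sample_rate': 16000, 'n_mels': 80, 'n_fft': 1024, 'hop_length': 256}

mel_transform = torchaudio.transforms.MelSpectrogram(
    sample_rate=AUDIO_CONFIG['sample_rate'],
    n_fft=AUDIO_CONFIG['n_fft'],
    hop_length=AUDIO_CONFIG['hop_length'],
    n_mels=AUDIO_CONFIG['n_mels'],
)

waveform = torch.randn(1, AUDIO_CONFIG['sample_rate'] * 4)  # 4 seconds of dummy mono audio
log_mel = torch.log(mel_transform(waveform) + 1e-5)         # (1, 80, ~251) log-mel frames
print(log_mel.shape)
```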

## 📊 Proposed Experiments

### Phase 1: Architecture Validation
- [ ] Baseline model training on synthetic data
- [ ] Ablation studies on attention mechanisms
- [ ] Time embedding comparison study
- [ ] Audio encoding architecture experiments

### Phase 2: Dataset Construction
We plan to build a research dataset from multiple sources:

1. **Initial Development Dataset**
   - 10k Creative Commons music samples
   - Focused on single-instrument recordings
   - Clear genre categorization

2. **Scaled Dataset** (Future Work)
   - Spotify API integration
   - SoundCloud API integration
   - Public domain music archives

### Phase 3: Training & Evaluation

Planned training configurations:

```yaml
initial_training:
  batch_size: 32
  gradient_accumulation: 4
  learning_rate: 1e-4
  warmup_steps: 1000
  max_steps: 100000

evaluation_metrics:
  - spectral_convergence
  - magnitude_error
  - musical_consistency
  - genre_accuracy
```
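
Of the metrics listed above, spectral convergence and magnitude error have standard definitions on STFT magnitudes, while musical consistency and genre accuracy would require learned evaluators. A sketch of the first two, with hypothetical helper names and the STFT settings from `AUDIO_CONFIG` assumed:

```python
import torch

def stft_mag(wav: torch.Tensor, n_fft: int = 1024, hop_length: int = 256) -> torch.Tensor:
    """Magnitude STFT of a mono waveform, shape (n_fft // 2 + 1, frames)."""
    window = torch.hann_window(n_fft)
    return torch.stft(wav, n_fft, hop_length, window=window, return_complex=True).abs()

def spectral_convergence(ref_mag: torch.Tensor, gen_mag: torch.Tensor) -> torch.Tensor:
    """|| |S_ref| - |S_gen| ||_F / || |S_ref| ||_F."""
    return torch.norm(ref_mag - gen_mag, p='fro') / torch.norm(ref_mag, p='fro')

def log_magnitude_error(ref_mag: torch.Tensor, gen_mag: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Mean L1 distance between log STFT magnitudes."""
    return (torch.log(ref_mag + eps) - torch.log(gen_mag + eps)).abs().mean()

ref, gen = torch.randn(16000 * 4), torch.randn(16000 * 4)  # dummy reference / generated audio
print(spectral_convergence(stft_mag(ref), stft_mag(gen)).item())
print(log_magnitude_error(stft_mag(ref), stft_mag(gen)).item())
```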

## 🛠️ Development Setup

```bash
# Clone repository
git clone https://github.com/m1-research/m1-music.git
cd m1-music

# Create environment
conda create -n m1 python=3.10
conda activate m1

# Install dependencies
pip install -r requirements.txt

# Run tests
pytest tests/
```

## 📝 Project Structure

```
m1/
├── configs/            # Training configurations
├── m1/
│   ├── models/         # Model architectures
│   ├── diffusion/      # Diffusion scheduling
│   ├── data/           # Data loading/processing
│   └── training/       # Training loops
├── notebooks/          # Research notebooks
├── scripts/            # Training scripts
└── tests/              # Unit tests
```

## 🧪 Current Status

This is an active research project in its early stages. Current focus:
- [ ] Implementing and testing base architecture
- [ ] Setting up data processing pipeline
- [ ] Designing initial experiments
- [ ] Building evaluation framework

## 📚 References

Key papers informing this work:
- "Diffusion Models Beat GANs on Image Synthesis" (Dhariwal & Nichol, 2021)
- "Deep Unsupervised Learning using Nonequilibrium Thermodynamics" (Sohl-Dickstein et al., 2015)
- "High-Resolution Image Synthesis with Latent Diffusion Models" (Rombach et al., 2022)

## 🤝 Contributing

We welcome research collaborations! Areas where we're looking for contributions:
- Novel architectural improvements
- Efficient training methodologies
- Evaluation metrics
- Dataset curation tools

## 📬 Contact

For research collaboration inquiries:
- Submit an issue
- Start a discussion
- Email: [email protected]

## ⚖️ License

This research code is released under the MIT License.

## 🔍 Citation

If you use this code in your research, please cite:

```bibtex
@misc{m1music2024,
  title={M1: Experimental Music Generation via Diffusion Transformers},
  author={M1 Research Team},
  year={2024},
  publisher={GitHub},
  journal={GitHub repository},
  howpublished={\url{https://github.com/m1-research/m1-music}}
}
```

## 🚧 Disclaimer

This is experimental research code:
- Architecture and training procedures may change significantly
- Not yet optimized for production use
- Results and capabilities are being actively researched
- Breaking changes should be expected

We're sharing this code to foster collaboration and advance the field of AI music generation research.