[![Multi-Modality](agorabanner.png)](https://discord.com/servers/agora-999382051935506503)

# M1: Music Generation via Diffusion Transformers 🎵🔬

[![Join our Discord](https://img.shields.io/badge/Discord-Join%20our%20server-5865F2?style=for-the-badge&logo=discord&logoColor=white)](https://discord.gg/agora-999382051935506503) [![Subscribe on YouTube](https://img.shields.io/badge/YouTube-Subscribe-red?style=for-the-badge&logo=youtube&logoColor=white)](https://www.youtube.com/@kyegomez3242) [![Connect on LinkedIn](https://img.shields.io/badge/LinkedIn-Connect-blue?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/kye-g-38759a207/) [![Follow on X.com](https://img.shields.io/badge/X.com-Follow-1DA1F2?style=for-the-badge&logo=x&logoColor=white)](https://x.com/kyegomezb)

M1 is a research project exploring large-scale music generation using diffusion transformers. This repository contains the implementation of our proposed architecture combining recent advances in diffusion models, transformer architectures, and music processing.

## 🔬 Research Overview

We propose a novel approach to music generation that combines:
- Diffusion-based generative modeling
- Multi-query attention mechanisms
- Hierarchical audio encoding
- Text-conditional generation
- Scalable training methodology

### Key Hypotheses

1. Diffusion transformers can capture long-range musical structure better than traditional autoregressive models
2. Multi-query attention mechanisms can improve training efficiency without sacrificing quality (see the sketch after this list)
3. Hierarchical audio encoding preserves both local and global musical features
4. Text conditioning enables semantic control over generation
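
To make hypothesis 2 concrete, here is a minimal multi-query attention sketch in PyTorch. Dimensions follow `MODEL_CONFIG` in the implementation details below; the module name and exact layout are illustrative, not the repository's final code:

```python
import torch
import torch.nn as nn

class MultiQueryAttention(nn.Module):
    """Multi-query attention sketch: per-head queries share a single
    key/value head, shrinking the KV projections and KV memory relative
    to standard multi-head attention."""

    def __init__(self, dim=512, heads=8, dim_head=64):
        super().__init__()
        self.heads, self.dim_head = heads, dim_head
        self.to_q = nn.Linear(dim, heads * dim_head, bias=False)
        self.to_kv = nn.Linear(dim, 2 * dim_head, bias=False)  # shared by all heads
        self.to_out = nn.Linear(heads * dim_head, dim, bias=False)

    def forward(self, x):
        b, n, _ = x.shape
        q = self.to_q(x).view(b, n, self.heads, self.dim_head).transpose(1, 2)
        k, v = self.to_kv(x).chunk(2, dim=-1)            # (b, n, dim_head) each
        scores = q @ k.unsqueeze(1).transpose(-2, -1)    # one KV head, broadcast over query heads
        attn = (scores * self.dim_head ** -0.5).softmax(dim=-1)
        out = attn @ v.unsqueeze(1)                      # (b, heads, n, dim_head)
        return self.to_out(out.transpose(1, 2).reshape(b, n, -1))

x = torch.randn(2, 256, 512)
print(MultiQueryAttention()(x).shape)  # torch.Size([2, 256, 512])
```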

## 🏗️ Architecture

```
                                         ┌─────────────────┐
                                         │  Time Encoding  │
                                         └────────┬────────┘
                                                  │
┌──────────────┐   ┌─────────────────┐   ┌────────▼────────┐
│ Audio Input  ├──►│ Mel Spectrogram ├──►│    Diffusion    │
└──────────────┘   └─────────────────┘   │   Transformer   ├──► Generated Audio
                                         │      Block      │
┌──────────────┐   ┌─────────────────┐   │                 │
│  Text Input  ├──►│   T5 Encoder    ├──►└─────────────────┘
└──────────────┘   └─────────────────┘
```
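
The audio path in the diagram starts by converting waveforms to mel spectrograms. A minimal front-end using `torchaudio`, with parameters matching `AUDIO_CONFIG` from the implementation details below (a sketch of the intended pipeline, not the repository's exact code):

```python
import torch
import torchaudio

# Parameters mirror AUDIO_CONFIG in "Implementation Details" below.
mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000, n_fft=1024, hop_length=256, n_mels=80
)

waveform = torch.randn(1, 16000)             # 1 second of (dummy) audio
spec = mel(waveform)                         # (channels, n_mels, frames)
log_spec = torch.log(spec.clamp(min=1e-5))   # log-compress, standard for TTS/music
print(log_spec.shape)                        # torch.Size([1, 80, 63])
```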

### Implementation Details

```python
# Key architectural dimensions
MODEL_CONFIG = {
    'dim': 512,        # Base dimension
    'depth': 12,       # Number of transformer layers
    'heads': 8,        # Attention heads
    'dim_head': 64,    # Dimension per head
    'mlp_dim': 2048,   # FFN dimension
    'dropout': 0.1     # Dropout rate
}

# Audio processing parameters
AUDIO_CONFIG = {
    'sample_rate': 16000,
    'n_mels': 80,
    'n_fft': 1024,
    'hop_length': 256
}
```
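
As a rough illustration of how these dimensions fit together, here is a single transformer block with additive timestep conditioning. This is a simplified sketch; the actual block may instead use adaLN modulation or cross-attention to the T5 states:

```python
import torch
import torch.nn as nn

class DiTBlock(nn.Module):
    """Illustrative diffusion-transformer block: self-attention + FFN,
    with the timestep embedding projected and added to the token stream."""

    def __init__(self, dim=512, heads=8, mlp_dim=2048, dropout=0.1):
        super().__init__()
        self.time_proj = nn.Linear(dim, dim)
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, dropout=dropout, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_dim),
            nn.GELU(),
            nn.Linear(mlp_dim, dim),
            nn.Dropout(dropout),
        )

    def forward(self, x, t_emb):
        # Broadcast the timestep embedding over the sequence dimension.
        x = x + self.time_proj(t_emb).unsqueeze(1)
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.norm2(x))

block = DiTBlock(dim=512, heads=8, mlp_dim=2048, dropout=0.1)  # values from MODEL_CONFIG
x = torch.randn(2, 128, 512)   # (batch, mel frames, dim)
t = torch.randn(2, 512)        # timestep embedding, e.g. sinusoidal
print(block(x, t).shape)       # torch.Size([2, 128, 512])
```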

## 📊 Proposed Experiments

### Phase 1: Architecture Validation
- [ ] Baseline model training on synthetic data
- [ ] Ablation studies on attention mechanisms
- [ ] Time embedding comparison study (baseline sinusoidal embedding sketched after this list)
- [ ] Audio encoding architecture experiments
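
For the time-embedding study, the natural baseline is the standard sinusoidal timestep embedding used in DDPM-style models (a sketch; the function name is ours):

```python
import math
import torch

def timestep_embedding(t: torch.Tensor, dim: int = 512) -> torch.Tensor:
    """Standard sinusoidal embedding of diffusion timesteps (DDPM-style)."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half) / half)
    args = t[:, None].float() * freqs[None, :]
    return torch.cat([torch.cos(args), torch.sin(args)], dim=-1)

emb = timestep_embedding(torch.randint(0, 1000, (4,)))
print(emb.shape)  # torch.Size([4, 512])
```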

### Phase 2: Dataset Construction
We plan to build a research dataset from multiple sources:

1. **Initial Development Dataset**
   - 10k Creative Commons music samples
   - Focused on single-instrument recordings
   - Clear genre categorization

2. **Scaled Dataset** (Future Work)
   - Spotify API integration
   - SoundCloud API integration
   - Public domain music archives

### Phase 3: Training & Evaluation
Planned training configurations:
```yaml
initial_training:
  batch_size: 32
  gradient_accumulation: 4
  learning_rate: 1e-4
  warmup_steps: 1000
  max_steps: 100000

evaluation_metrics:
  - spectral_convergence
  - magnitude_error
  - musical_consistency
  - genre_accuracy
```
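
Of the metrics above, the first two have standard spectral definitions that can be sketched directly; `musical_consistency` and `genre_accuracy` require learned evaluators and are not shown. Function names are ours:

```python
import torch

def spectral_convergence(pred_mag: torch.Tensor, target_mag: torch.Tensor) -> torch.Tensor:
    """||S_target - S_pred||_F / ||S_target||_F on magnitude spectrograms."""
    return torch.linalg.norm(target_mag - pred_mag) / torch.linalg.norm(target_mag)

def magnitude_error(pred_mag: torch.Tensor, target_mag: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Mean absolute error between log-magnitude spectrograms."""
    return (target_mag.clamp(min=eps).log() - pred_mag.clamp(min=eps).log()).abs().mean()
```
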
## 🛠️ Development Setup
```bash
# Clone repository
git clone https://github.com/m1-research/m1-music.git
cd m1-music

# Create environment
conda create -n m1 python=3.10
conda activate m1

# Install dependencies
pip install -r requirements.txt

# Run tests
pytest tests/
```

## 📝 Project Structure

```
m1/
├── configs/          # Training configurations
├── m1/
│   ├── models/       # Model architectures
│   ├── diffusion/    # Diffusion scheduling
│   ├── data/         # Data loading/processing
│   └── training/     # Training loops
├── notebooks/        # Research notebooks
├── scripts/          # Training scripts
└── tests/            # Unit tests
```

## 🧪 Current Status

This is an active research project in its early stages. Current focus:
- [ ] Implementing and testing base architecture
- [ ] Setting up data processing pipeline
- [ ] Designing initial experiments
- [ ] Building evaluation framework

## 📚 References

Key papers informing this work:
- "Diffusion Models Beat GANs on Image Synthesis" (Dhariwal & Nichol, 2021)
- "Structured Denoising Diffusion Models" (Sohl-Dickstein et al., 2015)
- "High-Resolution Image Synthesis with Latent Diffusion Models" (Rombach et al., 2022)

## 🤝 Contributing

We welcome research collaborations! Areas where we're looking for contributions:
- Novel architectural improvements
- Efficient training methodologies
- Evaluation metrics
- Dataset curation tools

## 📬 Contact

For research collaboration inquiries:
- Submit an issue
- Start a discussion
- Email: [email protected]

## ⚖️ License

This research code is released under the MIT License.

## 🔍 Citation

If you use this code in your research, please cite:
```bibtex
@misc{m1music2024,
  title={M1: Experimental Music Generation via Diffusion Transformers},
  author={M1 Research Team},
  year={2024},
  publisher={GitHub},
  journal={GitHub repository},
  howpublished={\url{https://github.com/m1-research/m1-music}}
}
```

## 🚧 Disclaimer

This is experimental research code:
- Architecture and training procedures may change significantly
- Not yet optimized for production use
- Results and capabilities are being actively researched
- Breaking changes should be expected

We're sharing this code to foster collaboration and advance the field of AI music generation research.
