[![Multi-Modality](agorabanner.png)](https://discord.com/servers/agora-999382051935506503)

# M1: Music Generation via Diffusion Transformers 🎵🔬

[![Join our Discord](https://img.shields.io/badge/Discord-Join%20our%20server-5865F2?style=for-the-badge&logo=discord&logoColor=white)](https://discord.gg/agora-999382051935506503) [![Subscribe on YouTube](https://img.shields.io/badge/YouTube-Subscribe-red?style=for-the-badge&logo=youtube&logoColor=white)](https://www.youtube.com/@kyegomez3242) [![Connect on LinkedIn](https://img.shields.io/badge/LinkedIn-Connect-blue?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/kye-g-38759a207/) [![Follow on X.com](https://img.shields.io/badge/X.com-Follow-1DA1F2?style=for-the-badge&logo=x&logoColor=white)](https://x.com/kyegomezb)

M1 is a research project exploring large-scale music generation using diffusion transformers. This repository contains the implementation of our proposed architecture, combining recent advances in diffusion models, transformer architectures, and music processing.

## 🔬 Research Overview

We propose a novel approach to music generation that combines:
- Diffusion-based generative modeling (see the forward-process sketch below)
- Multi-query attention mechanisms
- Hierarchical audio encoding
- Text-conditional generation
- Scalable training methodology
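
For orientation, the sketch below shows the standard DDPM-style forward (noising) process that diffusion-based generative modeling builds on. It is a generic PyTorch illustration under assumed defaults (cosine schedule, epsilon-prediction target), not necessarily the exact formulation used in M1:

```python
import torch

def cosine_beta_schedule(timesteps: int, s: float = 0.008) -> torch.Tensor:
    """Cosine noise schedule (as in Nichol & Dhariwal, 2021)."""
    steps = torch.arange(timesteps + 1, dtype=torch.float64)
    alphas_cumprod = torch.cos(((steps / timesteps) + s) / (1 + s) * torch.pi * 0.5) ** 2
    alphas_cumprod = alphas_cumprod / alphas_cumprod[0]
    betas = 1 - (alphas_cumprod[1:] / alphas_cumprod[:-1])
    return betas.clamp(0, 0.999).float()

def q_sample(x0: torch.Tensor, t: torch.Tensor, alphas_cumprod: torch.Tensor):
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(a_bar_t) * x_0, (1 - a_bar_t) * I)."""
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(-1, *([1] * (x0.dim() - 1)))
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise
    return x_t, noise  # the denoiser is trained to recover `noise` from (x_t, t, conditioning)

betas = cosine_beta_schedule(1000)
alphas_cumprod = torch.cumprod(1 - betas, dim=0)
mel = torch.randn(4, 80, 256)            # dummy batch of (n_mels, frames) spectrograms
t = torch.randint(0, 1000, (4,))
x_t, noise = q_sample(mel, t, alphas_cumprod)
```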

### Key Hypotheses

1. Diffusion transformers can capture long-range musical structure better than traditional autoregressive models
2. Multi-query attention mechanisms can improve training efficiency without sacrificing quality (sketched below)
3. Hierarchical audio encoding preserves both local and global musical features
4. Text conditioning enables semantic control over generation
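
Hypothesis 2 refers to multi-query attention, where every query head shares a single key/value head, shrinking the key/value projections and cache. A minimal, self-contained PyTorch sketch of the idea; the class name and dimensions below are illustrative (they mirror `MODEL_CONFIG`) and are not taken from this repository:

```python
import torch
from torch import nn

class MultiQueryAttention(nn.Module):
    """Self-attention with per-head queries but one shared key/value head."""

    def __init__(self, dim: int = 512, heads: int = 8, dim_head: int = 64):
        super().__init__()
        self.heads, self.dim_head = heads, dim_head
        self.to_q = nn.Linear(dim, heads * dim_head, bias=False)
        self.to_kv = nn.Linear(dim, 2 * dim_head, bias=False)  # single K/V head shared by all queries
        self.to_out = nn.Linear(heads * dim_head, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, _ = x.shape
        q = self.to_q(x).view(b, n, self.heads, self.dim_head).transpose(1, 2)   # (b, h, n, d)
        k, v = self.to_kv(x).chunk(2, dim=-1)                                    # (b, n, d) each
        scores = q @ k.transpose(-1, -2).unsqueeze(1) * self.dim_head ** -0.5    # (b, h, n, n)
        out = torch.softmax(scores, dim=-1) @ v.unsqueeze(1)                     # (b, h, n, d)
        return self.to_out(out.transpose(1, 2).reshape(b, n, -1))

x = torch.randn(2, 128, 512)
print(MultiQueryAttention()(x).shape)  # torch.Size([2, 128, 512])
```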

## 🏗️ Architecture

```
                                      ┌─────────────────┐
                                      │  Time Encoding  │
                                      └────────┬────────┘
                                               │
┌──────────────┐                      ┌────────▼────────┐
│ Audio Input  ├──► mel spectrogram ─►│                 │
└──────────────┘                      │    Diffusion    │
                                      │   Transformer   │──► Generated Audio
┌──────────────┐   ┌────────────┐     │      Block      │
│  Text Input  ├──►│ T5 Encoder ├────►│                 │
└──────────────┘   └────────────┘     └─────────────────┘
```
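
The `Time Encoding` box in the diagram is commonly a sinusoidal timestep embedding passed through a small MLP before conditioning the transformer blocks. The formulation below is a standard one, assumed for illustration rather than taken from this codebase:

```python
import math
import torch
from torch import nn

class TimestepEmbedding(nn.Module):
    """Sinusoidal diffusion-timestep embedding followed by a small MLP."""

    def __init__(self, dim: int = 512):
        super().__init__()
        self.dim = dim
        self.mlp = nn.Sequential(nn.Linear(dim, dim * 4), nn.SiLU(), nn.Linear(dim * 4, dim))

    def forward(self, t: torch.Tensor) -> torch.Tensor:  # t: (batch,) integer timesteps
        half = self.dim // 2
        freqs = torch.exp(-math.log(10000.0) * torch.arange(half, device=t.device) / half)
        angles = t.float()[:, None] * freqs[None, :]           # (batch, dim // 2)
        emb = torch.cat([angles.sin(), angles.cos()], dim=-1)  # (batch, dim)
        return self.mlp(emb)

print(TimestepEmbedding(512)(torch.randint(0, 1000, (4,))).shape)  # torch.Size([4, 512])
```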

### Implementation Details

```python
# Key architectural dimensions
MODEL_CONFIG = {
    'dim': 512,        # Base dimension
    'depth': 12,       # Number of transformer layers
    'heads': 8,        # Attention heads
    'dim_head': 64,    # Dimension per head
    'mlp_dim': 2048,   # FFN dimension
    'dropout': 0.1     # Dropout rate
}

# Audio processing parameters
AUDIO_CONFIG = {
    'sample_rate': 16000,
    'n_mels': 80,
    'n_fft': 1024,
    'hop_length': 256
}
```
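
As an illustration of how `AUDIO_CONFIG` maps onto a spectrogram front end, the snippet below computes log-mel features with `torchaudio`. Using `torchaudio` here is an assumption for demonstration; the repository's actual preprocessing pipeline may differ:

```python
import torch
import torchaudio

AUDIO_CONFIG = {'sample_rate': 16000, 'n_mels': 80, 'n_fft': 1024, 'hop_length': 256}

mel_transform = torchaudio.transforms.MelSpectrogram(
    sample_rate=AUDIO_CONFIG['sample_rate'],
    n_fft=AUDIO_CONFIG['n_fft'],
    hop_length=AUDIO_CONFIG['hop_length'],
    n_mels=AUDIO_CONFIG['n_mels'],
)

waveform = torch.randn(1, AUDIO_CONFIG['sample_rate'] * 4)  # 4 seconds of dummy mono audio
log_mel = torch.log(mel_transform(waveform) + 1e-5)         # (1, 80, ~251) log-mel frames
print(log_mel.shape)
```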

## 📊 Proposed Experiments

### Phase 1: Architecture Validation
- [ ] Baseline model training on synthetic data
- [ ] Ablation studies on attention mechanisms
- [ ] Time embedding comparison study
- [ ] Audio encoding architecture experiments

### Phase 2: Dataset Construction
We plan to build a research dataset from multiple sources:

1. **Initial Development Dataset**
   - 10k Creative Commons music samples
   - Focused on single-instrument recordings
   - Clear genre categorization

2. **Scaled Dataset** (Future Work)
   - Spotify API integration
   - SoundCloud API integration
   - Public domain music archives

### Phase 3: Training & Evaluation

Planned training configurations:

```yaml
initial_training:
  batch_size: 32
  gradient_accumulation: 4
  learning_rate: 1e-4
  warmup_steps: 1000
  max_steps: 100000

evaluation_metrics:
  - spectral_convergence
  - magnitude_error
  - musical_consistency
  - genre_accuracy
```
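
Of the metrics listed above, spectral convergence and magnitude error have standard definitions on STFT magnitudes, while musical consistency and genre accuracy would require learned evaluators. A sketch of the first two, with hypothetical helper names and the STFT settings from `AUDIO_CONFIG` assumed:

```python
import torch

def stft_mag(wav: torch.Tensor, n_fft: int = 1024, hop_length: int = 256) -> torch.Tensor:
    """Magnitude STFT of a mono waveform, shape (n_fft // 2 + 1, frames)."""
    window = torch.hann_window(n_fft)
    return torch.stft(wav, n_fft, hop_length, window=window, return_complex=True).abs()

def spectral_convergence(ref_mag: torch.Tensor, gen_mag: torch.Tensor) -> torch.Tensor:
    """|| |S_ref| - |S_gen| ||_F / || |S_ref| ||_F."""
    return torch.norm(ref_mag - gen_mag, p='fro') / torch.norm(ref_mag, p='fro')

def log_magnitude_error(ref_mag: torch.Tensor, gen_mag: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Mean L1 distance between log STFT magnitudes."""
    return (torch.log(ref_mag + eps) - torch.log(gen_mag + eps)).abs().mean()

ref, gen = torch.randn(16000 * 4), torch.randn(16000 * 4)  # dummy reference / generated audio
print(spectral_convergence(stft_mag(ref), stft_mag(gen)).item())
print(log_magnitude_error(stft_mag(ref), stft_mag(gen)).item())
```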

## 🛠️ Development Setup

```bash
# Clone repository
git clone https://github.com/m1-research/m1-music.git
cd m1-music

# Create environment
conda create -n m1 python=3.10
conda activate m1

# Install dependencies
pip install -r requirements.txt

# Run tests
pytest tests/
```

## 📝 Project Structure

```
m1/
├── configs/            # Training configurations
├── m1/
│   ├── models/         # Model architectures
│   ├── diffusion/      # Diffusion scheduling
│   ├── data/           # Data loading/processing
│   └── training/       # Training loops
├── notebooks/          # Research notebooks
├── scripts/            # Training scripts
└── tests/              # Unit tests
```

## 🧪 Current Status

This is an active research project in its early stages. Current focus:
- [ ] Implementing and testing base architecture
- [ ] Setting up data processing pipeline
- [ ] Designing initial experiments
- [ ] Building evaluation framework

## 📚 References

Key papers informing this work:
- "Diffusion Models Beat GANs on Image Synthesis" (Dhariwal & Nichol, 2021)
- "Deep Unsupervised Learning using Nonequilibrium Thermodynamics" (Sohl-Dickstein et al., 2015)
- "High-Resolution Image Synthesis with Latent Diffusion Models" (Rombach et al., 2022)

## 🤝 Contributing

We welcome research collaborations! Areas where we're looking for contributions:
- Novel architectural improvements
- Efficient training methodologies
- Evaluation metrics
- Dataset curation tools

## 📬 Contact

For research collaboration inquiries:
- Submit an issue
- Start a discussion
- Email: [email protected]

## ⚖️ License

This research code is released under the MIT License.

## 🔍 Citation

If you use this code in your research, please cite:

```bibtex
@misc{m1music2024,
  title={M1: Experimental Music Generation via Diffusion Transformers},
  author={M1 Research Team},
  year={2024},
  publisher={GitHub},
  journal={GitHub repository},
  howpublished={\url{https://github.com/m1-research/m1-music}}
}
```

## 🚧 Disclaimer

This is experimental research code:
- Architecture and training procedures may change significantly
- Not yet optimized for production use
- Results and capabilities are being actively researched
- Breaking changes should be expected

We're sharing this code to foster collaboration and advance the field of AI music generation research.