Pure Rust implementation for training modern diffusion models with GPU acceleration.
Image Models:
- SDXL - Stable Diffusion XL 1.0
- SD 3.5 - Stable Diffusion 3.5 (Medium/Large/Large-Turbo)
- Flux - Black Forest Labs Flux (Dev/Schnell)
- Flex - Next-gen architecture
- OmniGen 2 - Multi-modal generation
- HiDream - High-resolution synthesis
- Chroma - Advanced color model
- Sana - Efficient transformer
- Kolors - Bilingual diffusion model
Video Models:
- Wan VACE 2.1 - Video generation
- LTX - Long-form video synthesis
- Hunyuan - Multi-modal video model
- ✅ LoRA Training: Low-rank adaptation for all supported models (see the sketch after this list)
- ✅ FLAME Framework: Pure Rust tensor framework with automatic differentiation
- ✅ GPU-Only: Training requires a CUDA GPU (no CPU fallback)
- ✅ ComfyUI Compatible: Saves LoRA weights in ComfyUI format
- ✅ Memory Optimized: Designed for 24GB VRAM with gradient checkpointing
- ✅ Integrated Sampling: Generate samples during training to monitor progress
- ✅ 8-bit Adam: Memory-efficient optimizer
- ✅ Mixed Precision: BF16/FP16 training support
- 🚧 Full Finetune: Complete model fine-tuning (not just LoRA)
- 🚧 DoRA: Weight-Decomposed Low-Rank Adaptation
- 🚧 LoKr: Low-rank Kronecker product adaptation
- 🚧 Multi-GPU: Distributed training support
- 🚧 FSDP: Fully Sharded Data Parallel training
- 🚧 Flash Attention 3: Latest attention optimizations
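For background on the LoRA item above: a LoRA adapter freezes the base weight matrix `W` (d×k) and trains two small matrices `A` (r×k) and `B` (d×r), so the adapted layer computes `y = W·x + (α/r)·B·(A·x)`, where `r` is the rank (`linear` in the configs below) and `α` is `linear_alpha`. A minimal, framework-free sketch of that forward pass (plain Rust, illustrative only, not EriDiffusion's internal code):

```rust
/// Multiply a (rows x cols) row-major matrix by a vector of length `cols`.
fn matvec(m: &[f32], rows: usize, cols: usize, x: &[f32]) -> Vec<f32> {
    (0..rows)
        .map(|i| (0..cols).map(|j| m[i * cols + j] * x[j]).sum())
        .collect()
}

/// LoRA forward pass: y = W*x + (alpha / rank) * B * (A * x).
/// W is the frozen (d x k) base weight; A is (rank x k); B is (d x rank).
fn lora_forward(
    w: &[f32], a: &[f32], b: &[f32],
    d: usize, k: usize, rank: usize, alpha: f32,
    x: &[f32],
) -> Vec<f32> {
    let base = matvec(w, d, k, x);     // frozen path: W * x
    let ax = matvec(a, rank, k, x);    // down-projection to rank r
    let bax = matvec(b, d, rank, &ax); // up-projection back to d
    let scale = alpha / rank as f32;   // the linear_alpha / rank scaling
    base.iter().zip(&bax).map(|(y, z)| y + scale * z).collect()
}
```

Only `A` and `B` receive gradients during training, which is why LoRA fits comfortably in 24GB of VRAM.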
- CUDA-capable GPU (required - no CPU training support)
- 24GB+ VRAM recommended
- CUDA 11.0 or higher
- Rust 1.70+
- FLAME Framework (included as local dependency)
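Before building, you can sanity-check these requirements from the shell:

```bash
nvcc --version     # CUDA toolkit (11.0 or higher)
nvidia-smi         # confirms the driver sees a CUDA-capable GPU and shows VRAM
rustc --version    # Rust (1.70 or higher)
```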
This project uses FLAME (Flexible Learning and Adaptive Memory Engine), a pure Rust tensor framework with:
- GPU-accelerated automatic differentiation (a toy illustration follows this list)
- Conv2D forward and backward passes with CUDA kernels
- Advanced gradient modifications (clipping, normalization, noise injection)
- Native Rust implementation without Python dependencies
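To make the automatic-differentiation item concrete, here is a toy reverse-mode tape for scalar values. It is purely illustrative of the concept FLAME applies to GPU tensors, not FLAME's actual API:

```rust
// Toy reverse-mode autodiff tape for scalars (illustrative only).
#[derive(Clone, Copy)]
enum Op { Leaf, Add(usize, usize), Mul(usize, usize) }

struct Tape { vals: Vec<f32>, ops: Vec<Op> }

impl Tape {
    fn new() -> Self { Tape { vals: vec![], ops: vec![] } }
    fn leaf(&mut self, v: f32) -> usize { self.push(v, Op::Leaf) }
    fn add(&mut self, a: usize, b: usize) -> usize {
        self.push(self.vals[a] + self.vals[b], Op::Add(a, b))
    }
    fn mul(&mut self, a: usize, b: usize) -> usize {
        self.push(self.vals[a] * self.vals[b], Op::Mul(a, b))
    }
    fn push(&mut self, v: f32, op: Op) -> usize {
        self.vals.push(v); self.ops.push(op); self.vals.len() - 1
    }
    /// Backpropagate from `out`, returning d(out)/d(node) for every node.
    fn backward(&self, out: usize) -> Vec<f32> {
        let mut grads = vec![0.0; self.vals.len()];
        grads[out] = 1.0;
        for i in (0..self.ops.len()).rev() {
            let g = grads[i];
            match self.ops[i] {
                Op::Leaf => {}
                Op::Add(a, b) => { grads[a] += g; grads[b] += g; }
                Op::Mul(a, b) => {
                    grads[a] += g * self.vals[b]; // product rule
                    grads[b] += g * self.vals[a];
                }
            }
        }
        grads
    }
}

fn main() {
    // f(x, y) = x*y + x  =>  df/dx = y + 1, df/dy = x
    let mut t = Tape::new();
    let (x, y) = (t.leaf(2.0), t.leaf(3.0));
    let xy = t.mul(x, y);
    let f = t.add(xy, x);
    let g = t.backward(f);
    assert_eq!((g[x], g[y]), (4.0, 2.0));
}
```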
- Clone the repository:

  ```bash
  git clone https://github.com/CodeAlexx/EriDiffusion.git
  cd EriDiffusion
  ```

- The project includes FLAME as a local dependency in the `flame/` directory.

- Build the project:

  ```bash
  cargo build --release
  ```

- The executable will be at `target/release/trainer`. Copy it to your PATH or project root:

  ```bash
  # Option 1: Copy to project root
  cp target/release/trainer .

  # Option 2: Install to system (requires sudo)
  sudo cp target/release/trainer /usr/local/bin/
  ```
EriDiffusion uses a single `trainer` binary that automatically detects the model type from your YAML configuration:

```bash
# After building, run from project root:
./trainer config/sdxl_lora_24gb_optimized.yaml
./trainer config/sd35_lora_training.yaml
./trainer config/flux_lora_24gb.yaml

# Or with full path:
trainer /path/to/config/sdxl_lora_24gb_optimized.yaml
```
The trainer reads the model architecture from the YAML and automatically routes to the correct training pipeline.
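Conceptually, that dispatch looks something like the sketch below. The types, field names, and pipeline functions are hypothetical (it assumes the `serde` and `serde_yaml` crates), not the actual trainer source:

```rust
// Illustrative config-based routing (hypothetical types and field names).
use serde::Deserialize;

#[derive(Deserialize)]
struct ModelSection {
    name_or_path: String,
    #[serde(default)]
    is_sdxl: bool,
    // Hypothetical field for non-SDXL architectures, e.g. "sd35" or "flux".
    arch: Option<String>,
}

#[derive(Deserialize)]
struct Config {
    model: ModelSection,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let path = std::env::args().nth(1).expect("usage: trainer <config.yaml>");
    let cfg: Config = serde_yaml::from_str(&std::fs::read_to_string(&path)?)?;

    // Route to the matching training pipeline based on the parsed config.
    if cfg.model.is_sdxl {
        println!("-> SDXL LoRA pipeline ({})", cfg.model.name_or_path);
    } else {
        match cfg.model.arch.as_deref() {
            Some("sd35") => println!("-> SD 3.5 pipeline"),
            Some("flux") => println!("-> Flux pipeline"),
            other => eprintln!("unrecognized architecture: {:?}", other),
        }
    }
    Ok(())
}
```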
Each model has its own config file with model-specific settings:
- Model paths (must be local .safetensors files)
- Dataset location
- Training parameters
- LoRA rank and alpha
- Sampling settings
```yaml
model:
  name_or_path: "/path/to/sdxl_model.safetensors"
  is_sdxl: true

network:
  type: "lora"
  linear: 16        # LoRA rank
  linear_alpha: 16

train:
  batch_size: 1
  steps: 2000
  gradient_accumulation: 4
  lr: 1e-4
  optimizer: "adamw8bit"
  gradient_checkpointing: true
```
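With these example values the effective batch size is `batch_size × gradient_accumulation = 1 × 4 = 4`: gradients from four forward/backward passes are accumulated before each optimizer step, trading wall-clock time for lower peak VRAM.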
- LoRA weights saved to `output/[model_name]/checkpoints/`
- Sample images saved to `output/[model_name]/samples/`
- All outputs are ComfyUI-compatible
With default settings on a 24GB GPU:
- Batch size 1: ~18-20GB
- With gradient checkpointing: ~16-18GB
- Higher resolutions (1024x1024) may require VAE tiling
- SDXL: U-Net based with dual text encoders (CLIP-L + CLIP-G)
- SD 3.5: MMDiT (Multimodal Diffusion Transformer) with triple text encoding
- Flux: Hybrid architecture with double/single stream blocks
All models use the FLAME framework which provides:
- Automatic differentiation with gradient tracking
- GPU-accelerated tensor operations with CUDA kernels
- Advanced gradient modifications (clipping, normalization, noise; clipping is sketched below)
- Direct safetensors loading and weight management
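As an example of those gradient modifications, global-norm clipping rescales every gradient whenever the combined L2 norm exceeds a threshold, preserving the update direction. A plain-Rust sketch of the math (not FLAME's actual API):

```rust
/// Clip a set of gradients to a maximum global L2 norm, in place.
/// If the combined norm exceeds `max_norm`, every gradient is scaled
/// by `max_norm / norm`; returns the pre-clip norm for logging.
fn clip_grad_norm(grads: &mut [Vec<f32>], max_norm: f32) -> f32 {
    let norm = grads
        .iter()
        .flat_map(|g| g.iter())
        .map(|v| v * v)
        .sum::<f32>()
        .sqrt();
    if norm > max_norm {
        let scale = max_norm / norm;
        for g in grads.iter_mut() {
            for v in g.iter_mut() {
                *v *= scale;
            }
        }
    }
    norm
}
```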
FLAME (Flexible Learning and Adaptive Memory Engine) is our pure Rust tensor framework that replaces Candle entirely. It provides native automatic differentiation and GPU acceleration without any Python dependencies.
- ✅ LoRA training for SDXL, SD 3.5, Flux
- ✅ Basic sampling during training
- ✅ Memory optimizations for 24GB GPUs
- 🚧 Full model fine-tuning support
- 🚧 Complete sampling for all models
- 🚧 Additional model architectures
- 📋 Video model support (Wan VACE 2.1, LTX, Hunyuan)
- 📋 Multi-GPU distributed training
- 📋 Advanced adaptation methods (DoRA, LoKr)
See the documentation for detailed development guidelines and model specifications.
MIT OR Apache-2.0