@vukrosic (Contributor)

This comprehensive experiment explores how routing temperature affects MoE training:

New Components:
- Temperature-aware router with dynamic temperature control
- Temperature-aware MoE layer with detailed routing statistics
- Custom trainer that tracks routing dynamics over training
- Temperature scheduling support (linear, cosine, exponential, step)
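The core routing idea can be sketched as a gate whose softmax is scaled by a temperature before top-k selection. This is a minimal illustration; the class name, arguments, and shapes are assumptions, not the PR's actual API:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemperatureRouter(nn.Module):
    """Top-k MoE router whose softmax is scaled by a routing temperature.

    Higher temperature flattens the routing distribution (more exploration);
    lower temperature sharpens it (more exploitation).
    """

    def __init__(self, d_model: int, num_experts: int, top_k: int = 2,
                 temperature: float = 1.0):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts, bias=False)
        self.top_k = top_k
        self.temperature = temperature  # mutable, so a schedule can update it

    def forward(self, x: torch.Tensor):
        logits = self.gate(x)                               # (tokens, num_experts)
        probs = F.softmax(logits / self.temperature, dim=-1)
        top_p, top_idx = probs.topk(self.top_k, dim=-1)     # weights and expert ids
        top_p = top_p / top_p.sum(dim=-1, keepdim=True)     # renormalize over selected
        return top_p, top_idx, probs
```

Keeping `temperature` as a plain attribute (rather than baking it into the logits) lets a scheduler overwrite it between steps without touching the gate weights.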

Experiments (13 total):
- Temperature ablation: 8 temperatures from 0.5 to 10.0
- Temperature schedules: 4 different scheduling strategies
- Extended training: one longer run with the best-performing temperature
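The four scheduling strategies can be sketched as a single function of training progress. The endpoint temperatures and the step breakpoint below are placeholder values, not the experiment's actual settings:

```python
import math

def temperature_at(step, total_steps, t_start=2.0, t_end=0.5,
                   schedule="cosine", step_point=0.5):
    """Routing temperature at a given training step for four schedule types."""
    frac = min(step / max(total_steps, 1), 1.0)  # progress in [0, 1]
    if schedule == "linear":
        return t_start + (t_end - t_start) * frac
    if schedule == "cosine":
        # cosine anneal from t_start down to t_end
        return t_end + 0.5 * (t_start - t_end) * (1 + math.cos(math.pi * frac))
    if schedule == "exponential":
        # geometric interpolation between the endpoints
        return t_start * (t_end / t_start) ** frac
    if schedule == "step":
        # simple two-phase drop at step_point; real schedules may use more phases
        return t_start if frac < step_point else t_end
    raise ValueError(f"unknown schedule: {schedule}")
```

All four variants agree at the endpoints (`t_start` at step 0, `t_end` at the final step), which makes the ablation a fair comparison of the annealing path rather than of the range.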

Metrics Tracked:
- Performance: loss, accuracy, perplexity
- Routing: entropy, selection confidence, expert utilization
- Specialization: Gini coefficient, utilization variance
- Load balancing: auxiliary loss
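Given a batch of routing probabilities, the routing and specialization metrics above can be computed roughly as follows. This is a NumPy sketch of the math only; the trainer's actual bookkeeping may differ:

```python
import numpy as np

def routing_stats(probs):
    """Summary statistics for a batch of routing distributions (tokens x experts)."""
    eps = 1e-9
    # mean routing entropy: high = exploratory routing, low = confident routing
    entropy = -(probs * np.log(probs + eps)).sum(axis=-1).mean()
    # mean top-1 probability as a simple selection-confidence measure
    confidence = probs.max(axis=-1).mean()
    # per-expert utilization: average probability mass each expert receives
    util = probs.mean(axis=0)
    sorted_u = np.sort(util)
    n = len(sorted_u)
    # Gini coefficient of utilization: 0 = perfectly balanced experts
    gini = (2 * np.arange(1, n + 1) - n - 1).dot(sorted_u) / (n * sorted_u.sum() + eps)
    return {"entropy": float(entropy), "confidence": float(confidence),
            "utilization": util, "gini": float(gini),
            "util_var": float(util.var())}
```

A uniform router gives entropy `log(num_experts)` and Gini 0; a collapsed router that sends every token to one expert gives entropy near 0 and Gini near `(n - 1) / n`.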

Visualization Suite:
- Temperature comparison plots (loss, accuracy, entropy vs temp)
- Routing dynamics analysis (entropy evolution, confidence trends)
- Expert utilization patterns (per-expert bars, heatmaps)
- Schedule comparison (loss curves, temperature evolution)
- Specialization analysis (Gini coefficients, variance)

Analysis Tools:
- Comprehensive plotting (plot_results.py)
- Expert specialization analysis (analyze_specialization.py)
- Summary report generation

Features:
- Uses optimal Muon settings from exp9
- Comprehensive documentation (README, EXPERIMENT_CARD, EXPERIMENT_SUMMARY)
- Quick demo script for rapid testing
- Modular design for easy extension

Expected Insights:
- Optimal routing temperature for MoE training
- Trade-offs between exploration and exploitation
- Expert specialization patterns under different temperatures
- Effectiveness of temperature scheduling strategies

This experiment will generate significant new knowledge about MoE routing dynamics.

Additional Changes:
- Changed references from 'final_metrics.loss' to 'final_metrics.val_loss' and from 'final_metrics.accuracy' to 'final_metrics.val_accuracy' across multiple scripts for consistent reporting.
- Added 'seaborn' to requirements.txt for enhanced visualizations.
- Updated the optimizer setup in tracking_trainer.py to support multiple optimizers, each with its own learning rate scheduler.
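The multi-optimizer setup in tracking_trainer.py might look roughly like the sketch below. AdamW stands in for Muon here (Muon lives in the repo, not in torch), and the parameter split and scheduler choice are assumptions for illustration:

```python
import torch

def build_optimizers(model, matrix_lr=0.01, other_lr=3e-4, total_steps=1000):
    """Split parameters into two groups, each with its own optimizer and scheduler."""
    # Hypothetical split: 2-D weight matrices in one group, everything else in the other.
    matrix_params = [p for p in model.parameters() if p.ndim >= 2]
    other_params = [p for p in model.parameters() if p.ndim < 2]
    optimizers = [
        torch.optim.AdamW(matrix_params, lr=matrix_lr),  # Muon would go here
        torch.optim.AdamW(other_params, lr=other_lr),
    ]
    # one cosine LR scheduler per optimizer
    schedulers = [
        torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=total_steps)
        for opt in optimizers
    ]
    return optimizers, schedulers

# In the training loop, step each optimizer/scheduler pair together:
# for opt, sched in zip(optimizers, schedulers):
#     opt.step()
#     sched.step()
```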