
Phi-3.5-MoE instruct (128k)

A new mixture-of-experts model

Context: 131k input · 4k output
Training date: Aug 2024

Microsoft

Phi-3.5-MoE is a lightweight, state-of-the-art open model built upon datasets used for Phi-3 - synthetic data and filtered publicly available documents - with a focus on very high-quality, reasoning-dense data. The model supports multilingual use and comes with a 128K context length (in tokens). It underwent a rigorous enhancement process, incorporating supervised fine-tuning, proximal policy optimization, and direct preference optimization to ensure precise instruction adherence and robust safety measures.
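As a quick orientation for using a hosted deployment of this instruct model, here is a minimal sketch of a chat-completions call with the OpenAI Python SDK. The base URL, environment variable, and model identifier below are assumptions that depend on where the model is hosted for you (e.g. GitHub Models or Azure AI); substitute the values from your own deployment.

```python
import os
from openai import OpenAI

# Placeholder endpoint, token variable, and model ID - adjust to your deployment.
client = OpenAI(
    base_url="https://models.inference.ai.azure.com",  # assumed endpoint
    api_key=os.environ["API_TOKEN"],                   # assumed env var
)

response = client.chat.completions.create(
    model="Phi-3.5-MoE-instruct",  # assumed model identifier
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What are some of the most famous works of Shakespeare?"},
    ],
    max_tokens=1024,   # well under the model's 4k output cap
    temperature=0.7,
)

print(response.choices[0].message.content)
```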

Resources

🏡 Phi-3 Portal

📰 Phi-3 Microsoft Blog

📖 Phi-3 Technical Report

👩‍🍳 Phi-3 Cookbook

Model Architecture

Phi-3.5-MoE has 16x3.8B parameters, with 6.6B active parameters when using 2 experts. The model is a mixture-of-experts, decoder-only Transformer using a tokenizer with a vocabulary size of 32,064.
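To make the "active parameters" figure concrete, the sketch below shows schematic top-2 expert routing in a mixture-of-experts feed-forward layer using plain NumPy. The shapes and gating scheme are illustrative assumptions, not the actual Phi-3.5-MoE implementation; the point is that only the selected experts are evaluated per token, so the per-token active parameter count (6.6B) is a fraction of the total.

```python
import numpy as np

def moe_layer(x, gate_w, expert_ws, top_k=2):
    """Schematic top-k mixture-of-experts feed-forward layer.

    x:         (d_model,) input token representation
    gate_w:    (d_model, n_experts) router/gating weights
    expert_ws: list of n_experts expert weight matrices, each (d_model, d_model)
    Only the top_k experts chosen by the router are evaluated for this token.
    """
    logits = x @ gate_w                    # one router score per expert
    top = np.argsort(logits)[-top_k:]      # indices of the top_k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()               # softmax over the selected experts
    return sum(w * (x @ expert_ws[i]) for w, i in zip(weights, top))

# Toy dimensions: 16 experts (as in Phi-3.5-MoE), 2 active per token.
rng = np.random.default_rng(0)
d_model, n_experts = 64, 16
x = rng.normal(size=d_model)
gate_w = rng.normal(size=(d_model, n_experts))
expert_ws = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]
y = moe_layer(x, gate_w, expert_ws, top_k=2)
print(y.shape)  # (64,)
```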

Training Data

This is a static model trained on an offline dataset of 4.9T tokens, with a cutoff date of October 2023 for publicly available data. Future versions of the tuned models may be released as the models are improved.

Languages

23 languages: English, Arabic, Chinese, Czech, Danish, Dutch, Finnish, French, German, Hebrew, Hungarian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, Turkish, Ukrainian
