Mixtral is a state-of-the-art AI model developed by Mistral AI, utilizing a sparse mixture-of-experts (MoE) architecture.
To get started, follow the instructions in the [mistral-inference](https://github.com/mistralai/mistral-inference) repository to download the model. Once downloaded, run `llama_or_mistral_ckpt.py` to convert the checkpoint into a MaxText-compatible format. You can then proceed with decoding, pretraining, and finetuning. A complete Mixtral 8x7B example is available in the `end_to_end/tpu/mixtral/8x7b` test scripts.
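As a minimal sketch, the conversion step typically looks like the command below; the flag names and paths are illustrative assumptions, so check the script's argument parser and the `end_to_end/tpu/mixtral/8x7b` scripts for the exact invocation.

```sh
# Convert the downloaded Mixtral checkpoint into a MaxText-compatible checkpoint.
# Flag names and paths are placeholders; see llama_or_mistral_ckpt.py and the
# end_to_end/tpu/mixtral/8x7b test scripts for the exact arguments.
python3 MaxText/llama_or_mistral_ckpt.py \
  --base-model-path /path/to/downloaded/mixtral-8x7B \
  --maxtext-model-path gs://your-bucket/mixtral-8x7b-maxtext \
  --model-size mixtral-8x7b
```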
Additionally, Mixtral integrates with MegaBlocks, an efficient dropless MoE strategy, which is enabled by setting the `megablox` flag to `True` (the default).
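For example, a training run can set the flag as a command-line override along with the model name. This is a sketch assuming MaxText's standard `key=value` overrides; the bucket paths and run name are placeholders.

```sh
# Sketch of a Mixtral training run with MegaBlocks enabled.
# megablox=True is already the default and is shown here only for explicitness.
python3 MaxText/train.py MaxText/configs/base.yml \
  run_name=mixtral_8x7b_test \
  model_name=mixtral-8x7b \
  megablox=True \
  base_output_directory=gs://your-bucket/output \
  dataset_type=synthetic
```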
Model flops utilization (MFU) for training on TPU v5p:
| Model size | Accelerator type | TFLOP/chip/sec | Model flops utilization (MFU) |
| --- | --- | --- | --- |
| Mixtral 8x7B | v5p-128 | 251.94 | 54.89% |
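For context, MFU is the achieved throughput divided by the chip's peak throughput: a TPU v5p chip peaks at roughly 459 bf16 TFLOP/s, so 251.94 / 459 ≈ 54.89%.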