This tool calculates Model FLOPs Utilization (MFU) for common Large Language Models (LLMs). It analyzes training logs generated by DLLogger to provide insight into training efficiency.
- MFU Calculation: Computes MFU from the average step time, model FLOPs per sample, global batch size, and the accelerator's theoretical peak TFLOPS (see the sketch after this list).
- DLLogger Integration: Parses DLLogger log files to extract the relevant training data.
- Model Support: Includes predefined FLOPs per sample for popular LLMs such as GPT-3, Llama 2, and Mixtral.
- Accelerator Awareness: Supports several GPU/TPU types with default theoretical peak TFLOPS values.
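For reference, the MFU computation reduces to a few lines. Here is a minimal sketch; the function and variable names are illustrative, not the script's actual API:

```python
def compute_mfu(model_flops_per_sample, batch_size, avg_step_time_s,
                num_accelerators, peak_tflops):
    """Model FLOPs Utilization: achieved throughput over theoretical peak.

    model_flops_per_sample: forward + backward FLOPs for one training sample.
    peak_tflops: theoretical max TFLOPS (1 TFLOPS = 1e12 FLOPS) per accelerator.
    """
    # Total FLOPs executed per second across the whole job.
    achieved_flops_per_s = model_flops_per_sample * batch_size / avg_step_time_s
    # Achieved TFLOPS on a single accelerator.
    achieved_tflops = achieved_flops_per_s / num_accelerators / 1e12
    return achieved_tflops / peak_tflops
```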
python process_training_results.py --file <path_to_dllogger_file> \
--batch_size <batch_size> \
--num_accelerators <num_accelerators> \
[--model_type <model_type> | --model_flops <model_flops>] \
[--accelerator_type <accelerator_type> | --max_flops <max_flops>] \
[--start_step <start_step>] \
[--end_step <end_step>]
--file
: Path to the DLLogger log file.

--batch_size
: Global batch size used during training.

--num_accelerators
: Number of GPUs/TPUs used for training.

--model_type
: Type of LLM used. Choose from the predefined options (e.g., "gpt3-5b", "llama2-7b"). Currently supported models:
- gpt3-5b
- gpt3-175b
- llama2-7b
- llama2-70b
- llama3-70b
- mixtral-7b

--model_flops
: Manually specify the model FLOPs (forward + backward) per sample. Use this if your model is not listed under --model_type.

--accelerator_type
: Type of accelerator used. Choose from the predefined options (e.g., "h100", "a100", "v5e", "v5p"). Currently supported accelerators:
- h100
- a100
- v5e
- v5p

--max_flops
: Manually specify the maximum theoretical TFLOPS of one accelerator. Use this if your accelerator is not currently supported.

--start_step
: First step of the range used to compute the average training step time. Defaults to 10.

--end_step
: Last step of the range used to compute the average training step time. Defaults to 30 (see the averaging sketch after the note below).
Note: You must provide either --model_type or --model_flops, and either --accelerator_type or --max_flops.
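To illustrate what --start_step and --end_step do, here is a minimal sketch of averaging the step time from a DLLogger-style JSON-lines file. The `train_step_timing` key and the record layout are assumptions; the actual field names depend on what the training script logged:

```python
import json

def average_step_time(path, start_step=10, end_step=30,
                      time_key="train_step_timing"):
    """Average the per-step time over the window [start_step, end_step].

    The first steps are skipped so that warm-up overhead (compilation,
    data-pipeline spin-up) does not skew the average.
    """
    times = []
    with open(path) as f:
        for line in f:
            brace = line.find("{")  # DLLogger prefixes each JSON record
            if brace < 0:
                continue
            try:
                record = json.loads(line[brace:])
            except json.JSONDecodeError:
                continue
            step = record.get("step")
            if isinstance(step, (list, tuple)):  # steps may be logged as lists
                step = step[0] if step else None
            if not isinstance(step, int):
                continue
            data = record.get("data", {})
            if start_step <= step <= end_step and time_key in data:
                times.append(float(data[time_key]))
    return sum(times) / len(times)
```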
python3 process_training_results.py --file examples/dllogger.json \
--batch_size 2048 \
--num_accelerators 256 \
--model_type gpt3-175b \
--accelerator_type h100
This command analyzes the examples/dllogger.json file for a GPT3-175B model trained with a batch size of 2048 on 256 H100 accelerators.
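For intuition, plugging this example's predefined constants into the MFU formula sketched above (the 40 s step time is a made-up value; the script derives the real one from the log):

```python
model_flops = 2.2e15   # gpt3-175b FLOPs per sample (from the table below)
peak_tflops = 989      # h100 bf16 peak TFLOPS (from the table below)
batch_size, num_accelerators = 2048, 256
avg_step_time_s = 40.0  # assumed for illustration only

achieved = model_flops * batch_size / avg_step_time_s / num_accelerators / 1e12
print(f"TFLOPS per accelerator: {achieved:.1f}")  # 440.0
print(f"MFU: {achieved / peak_tflops:.1%}")       # 44.5%
```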
# Theoretical FLOPs per forward + backward pass: 1.6e15 per sample for my unsupported model
# Theoretical peak for the hardware used: 1000 TFLOPS
python3 process_training_results.py --file examples/dllogger.json \
--batch_size 2048 \
--num_accelerators 256 \
--model_flops 1.6E15 \
--max_flops 1000
This command analyzes the examples/dllogger.json file for an unsupported model with 1.6e15 FLOPs per sample (forward + backward), trained with a batch size of 2048 on 256 accelerators, each with a theoretical peak of 1000 TFLOPS.
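Note the asymmetric units: --model_flops is given in raw FLOPs per sample, while --max_flops is in TFLOPS. With an assumed 30 s average step time, the numbers above would combine as:

```python
# --model_flops is raw FLOPs per sample; --max_flops is TFLOPS (1e12 FLOPS).
achieved = 1.6e15 * 2048 / 30.0 / 256 / 1e12  # ~426.7 TFLOPS per accelerator
print(f"MFU: {achieved / 1000:.1%}")          # ~42.7% against a 1000 TFLOPS peak
```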
The script prints the following information to the console:
- Average step time
- TFLOPS per accelerator
- MFU
| Accelerator | Max TFLOPS (bf16) |
|---|---|
| h100 | 989 |
| v5e | 197 |
| v5p | 459 |
| a100 | 312 |
| Model | FLOPs per sample |
|---|---|
| gpt3-5b | 6.69e13 |
| gpt3-175b | 2.2e15 |
| llama2-7b | 1.89e14 |
| llama2-70b | 1.82e15 |
| llama3-70b | 3.94e15 |
| mixtral-7b | 3.4e14 |