
MFU Calculator for LLMs

This tool calculates Model FLOPs Utilization (MFU) for common Large Language Models (LLMs). It analyzes training logs generated by DLLogger to report training efficiency.

Features

  • MFU Calculation: Computes MFU from the average step time, the model's FLOPs per sample, the global batch size, and the accelerator's peak throughput (see the formula sketch after this list).
  • DLLogger Integration: Parses DLLogger log files to extract per-step training data.
  • Model Support: Includes predefined FLOPs per sample for popular LLMs such as GPT-3, Llama 2, and Mixtral.
  • Accelerator Awareness: Supports several GPU/TPU types with default peak theoretical TFLOPS values.
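
The MFU computation follows the standard definition: achieved FLOPS divided by the accelerator's peak FLOPS. A minimal sketch of that arithmetic in Python (a hypothetical helper, not the tool's actual code):

def compute_mfu(flops_per_sample, batch_size, avg_step_time_s,
                num_accelerators, peak_tflops):
    # Achieved TFLOPS per accelerator: FLOPs per step, divided by the step
    # time, spread across accelerators, scaled to tera-FLOPS.
    achieved_tflops = (flops_per_sample * batch_size
                       / avg_step_time_s
                       / num_accelerators
                       / 1e12)
    return achieved_tflops / peak_tflops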

Usage

python process_training_results.py --file <path_to_dllogger_file> \
  --batch_size <batch_size> \
  --num_accelerators <num_accelerators> \
  [--model_type <model_type> | --model_flops <model_flops>] \
  [--accelerator_type <accelerator_type> | --max_flops <max_flops>] \
  [--start_step <start_step>] \
  [--end_step <end_step>] 
  

Required Arguments:

  • --file: Path to the DLLogger log file (see the parsing sketch after this list).
  • --batch_size: Global batch size used during training.
  • --num_accelerators: Number of GPUs/TPUs used for training.
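
For reference, DLLogger's JSON backend writes one record per line, each prefixed with "DLLL " followed by a JSON object. A minimal sketch of reducing such a file to an average step time (the "train_step_timing in s" key is an assumption based on NeMo-style logs; adjust it to match your logger configuration):

import json

def average_step_time(path, start_step=10, end_step=30):
    times = []
    with open(path) as f:
        for line in f:
            # DLLogger's JSONStreamBackend prefixes each record with "DLLL ".
            if not line.startswith("DLLL "):
                continue
            record = json.loads(line[len("DLLL "):])
            step = record.get("step")
            data = record.get("data", {})
            # The key name is an assumption; your logs may use a different one.
            if isinstance(step, int) and start_step <= step <= end_step \
                    and "train_step_timing in s" in data:
                times.append(data["train_step_timing in s"])
    return sum(times) / len(times)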

Optional Arguments:

  • --model_type: Type of LLM used. Choose from the predefined options (e.g., "gpt3-5b", "llama2-7b"). Currently supported models:
    • gpt3-5b
    • gpt3-175b
    • llama2-7b
    • llama2-70b
    • llama3-70b
    • mixtral-7b
  • --model_flops: Manually specify model FLOPs (forward + backward) per sample. Use this if your model is not listed in --model_type.
  • --accelerator_type: Type of accelerator used. Choose from the predefined options (e.g., "h100", "a100", "v5e", "v5p"). Currently supported accelerators:
    • h100
    • a100
    • v5e
    • v5p
  • --max_flops: Manually specify the maximum theoretical TFLOPS of the accelerator. Use this if your accelerator is not listed in --accelerator_type.
  • --start_step: First step of the range used to compute the average training step time. Defaults to 10.
  • --end_step: Last step of the range used to compute the average training step time. Defaults to 30.

Note: You must provide either --model_type or --model_flops, and either --accelerator_type or --max_flops.

Example for a known model and accelerator

python3 process_training_results.py --file examples/dllogger.json \
--batch_size 2048 \
--num_accelerators 256 \
--model_type gpt3-175b \
--accelerator_type h100

This command analyzes the examples/dllogger.json file for a GPT3-175B model trained with a batch size of 2048 on 256 H100 accelerators.

Example for an unsupported model and accelerator

# Theoretical FLOPs per forward + backward pass per sample: 1.6e15 for my unsupported model
# Theoretical max TFLOPS for the hardware used: 1000
python3 process_training_results.py --file examples/dllogger.json \
    --batch_size 2048 \
    --num_accelerators 256 \
    --model_flops 1.6E15 \
    --max_flops 1000

This command analyzes the examples/dllogger.json file for an unknown model with 1.6e15 FLOPs per sample, trained with a batch size of 2048 on 256 accelerators whose peak throughput is 1000 TFLOPS each.
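
As a sanity check of the arithmetic, with a hypothetical average step time of 16 s these inputs would work out as follows:

model_flops = 1.6e15          # FLOPs per sample (forward + backward)
batch_size = 2048
num_accelerators = 256
avg_step_time_s = 16.0        # hypothetical; the script measures this from the log

tflops_per_accel = model_flops * batch_size / avg_step_time_s / num_accelerators / 1e12
mfu = tflops_per_accel / 1000.0   # --max_flops 1000
print(tflops_per_accel, mfu)      # 800.0 TFLOPS per accelerator, 0.8 (80% MFU)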

Output

The script prints the following information to the console:

  • Average step time
  • TFLOPS per accelerator
  • MFU

MAX TFLOPS for Known Accelerators

Accelerator   Max TFLOPS (bf16)
h100          989
v5e           197
v5p           459
a100          312

Model FLOPs per Sample

Model         FLOPs per sample
gpt3-5b       6.69e13
gpt3-175b     2.2e15
llama2-7b     1.89e14
llama2-70b    1.82e15
llama3-70b    3.94e15
mixtral-7b    3.4e14
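
If your model is not in this table, a common back-of-the-envelope estimate for --model_flops is 6 FLOPs per parameter per token (a rough approximation that ignores attention FLOPs, so it slightly undercounts the values above):

def approx_flops_per_sample(num_params, seq_len):
    # 2 FLOPs per parameter per token for the forward pass,
    # plus roughly 4 for the backward pass; attention FLOPs are ignored.
    return 6 * num_params * seq_len

print(approx_flops_per_sample(175e9, 2048))   # ~2.15e15, close to gpt3-175b's 2.2e15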