[Bug] large CUDA memory usage in the evaluation phase #284

ChenMnZ · 2024-01-02T11:25:30Z

I train llama-7b with the following batch size settings:

    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 4 \

Training consumes about 9 GB of GPU memory. However, during evaluation (MMLU evaluation), memory consumption increases to 27 GB. Is there a bug in the evaluation process?


tianshu-zhu · 2024-01-29T23:18:35Z

Set --eval_accumulation_steps to a small value, e.g. --eval_accumulation_steps 1.
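For context, the Hugging Face `Trainer` documents `eval_accumulation_steps` as the number of prediction steps whose output tensors are accumulated before being moved to the CPU. If it is left unset, all predictions stay on the accelerator until evaluation ends. The sketch below is only a back-of-the-envelope estimate of that effect. It assumes the accumulated outputs are fp32 logits of shape (batch, sequence length, vocabulary). The sequence length (512) and number of evaluation steps (50) are illustrative guesses rather than values from this issue, and the helper names are hypothetical.

```python
from typing import Optional

GIB = 1024 ** 3


def logits_bytes(batch_size: int, seq_len: int, vocab_size: int,
                 bytes_per_value: int = 4) -> int:
    """Bytes for one batch of logits shaped (batch_size, seq_len, vocab_size)."""
    return batch_size * seq_len * vocab_size * bytes_per_value


def peak_accumulated_bytes(num_eval_steps: int,
                           eval_accumulation_steps: Optional[int],
                           per_step_bytes: int) -> int:
    """Peak bytes of logits held on the GPU before they are offloaded.

    None models the default: nothing is moved to the CPU until evaluation ends.
    """
    if eval_accumulation_steps is None:
        held_steps = num_eval_steps
    else:
        held_steps = min(eval_accumulation_steps, num_eval_steps)
    return held_steps * per_step_bytes


# per_device_eval_batch_size 4 comes from the issue; LLaMA-7B has a 32000-token
# vocabulary; seq_len and num_eval_steps are assumptions for illustration only.
per_step = logits_bytes(batch_size=4, seq_len=512, vocab_size=32000)
num_eval_steps = 50

unset_peak = peak_accumulated_bytes(num_eval_steps, None, per_step)
offload_peak = peak_accumulated_bytes(num_eval_steps, 1, per_step)

print(f"per-step logits:            {per_step / GIB:.2f} GiB")
print(f"peak, option unset:         {unset_peak / GIB:.2f} GiB")
print(f"peak, accumulation steps=1: {offload_peak / GIB:.2f} GiB")
```

Under these assumptions the accumulated logits alone grow linearly with the number of evaluation steps, which would be consistent with the jump reported above. Offloading more often trades some evaluation speed for a lower GPU peak, since each offload is a device-to-host copy.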
