I tried to run text generation with prompts using generate.py. I provided a large list of prompts (approximately 20K) and ran the generation on 10 RTX 8000 GPUs. However, nvidia-smi shows that GPU utilization during generation averages only about 50-60%, which is not ideal. Thank you!
My configuration is:
{
  # Text gen type: `input-file`, `unconditional` or `interactive`
  "text-gen-type": "input-file",
  # Params for all
  "maximum_tokens": 256,
  "temperature": 0.2,
  "top_p": 0.95,
  "top_k": 0,
  "recompute": false,
  # `unconditional`/`input-file`: samples
  "num-samples": 100,
  # input/output file
  "sample-input-file": "0",
  "data-path": "data/code/code_text_document",
  # or for weighted datasets:
  # "train-data-paths": ["data/enron/enron_text_document", "data/enron/enron_text_document"],
  # "test-data-paths": ["data/enron/enron_text_document", "data/enron/enron_text_document"],
  # "valid-data-paths": ["data/enron/enron_text_document", "data/enron/enron_text_document"],
  # "train-data-weights": [1., 2.],
  # "test-data-weights": [2., 1.],
  # "valid-data-weights": [0.5, 0.4],
  # If weight_by_num_documents is true, builds dataset weights from a multinomial
  # distribution over groups of data according to the number of documents in each group.
  # WARNING: setting this to true will override any user-provided weights
  # "weight_by_num_documents": false,
  # "weighted_sampler_alpha": 0.3,
  "vocab-file": "data/code-vocab.json",
  "merge-file": "data/code-merges.txt",
  "save": "checkpoints",
  "load": "checkpoints",
  "checkpoint_validation_with_forward_pass": false,
  "tensorboard-dir": "tensorboard",
  "log-dir": "logs",
  "use_wandb": true,
  "wandb_host": "https://api.wandb.ai",
  "wandb_project": "neox",
}
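For reference, the file passed via "sample-input-file" is, as far as I can tell, plain text with one prompt per line. The prompts below are made up just to illustrate the format:

def fibonacci(n):
# implement binary search over a sorted array
class LinkedList: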
And the model config:
# GPT-2 pretraining setup
{
  # parallelism settings (you will want to change these based on your cluster setup,
  # ideally scheduling pipeline stages across the node boundaries)
  "pipe-parallel-size": 1,
  "model-parallel-size": 1,
  # model settings
  "num-layers": 32,
  "hidden-size": 2560,
  "num-attention-heads": 32,
  "seq-length": 2048,
  "max-position-embeddings": 2048,
  "norm": "layernorm",
  "pos-emb": "rotary",
  "no-weight-tying": true,
  # these should provide some speedup, but they take a while to build; set to true if desired
  "scaled-upper-triang-masked-softmax-fusion": true,
  "bias-gelu-fusion": true,
  # optimizer settings
  "zero_allow_untested_optimizer": true,
  "optimizer": {
    "type": "adam",
    "params": {
      "lr": 0.00016,
      "betas": [0.9, 0.999],
      "eps": 1.0e-8,
    }
  },
  "zero_optimization": {
    "stage": 1,
    "allgather_partitions": true,
    "allgather_bucket_size": 500000000,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 500000000,
    "contiguous_gradients": true,
    "cpu_offload": false
  },
  # batch / data settings
  "train_micro_batch_size_per_gpu": 16,
  "gradient_accumulation_steps": 1,
  "data-impl": "mmap",
  "split": "989,10,1",
  # activation checkpointing
  "checkpoint-activations": true,
  "checkpoint-num-layers": 1,
  "partition-activations": true,
  "synchronize-each-layer": true,
  # regularization
  "gradient_clipping": 1.0,
  "weight-decay": 0,
  "hidden-dropout": 0,
  "attention-dropout": 0,
  # precision settings
  "fp16": {
    "fp16": true,
    "enabled": true,
    "loss_scale": 0,
    "initial_scale_power": 16,
    "loss_scale_window": 1000,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  # misc. training settings
  "train-iters": 160000,
  "lr-decay-iters": 160000,
  "distributed-backend": "nccl",
  "lr-decay-style": "cosine",
  "warmup": 0.01,
  "save-interval": 1000,
  "eval-interval": 1000,
  "eval-iters": 10,
  # logging
  "log-interval": 100,
  "steps_per_print": 10,
  "keep-last-n-checkpoints": 1,
  "wall_clock_breakdown": true,
}
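For context on the parallel layout: with "pipe-parallel-size" and "model-parallel-size" both set to 1, the 10 GPUs run as 10 data-parallel replicas, so the effective training batch size is 16 × 10 × 1 = 160 sequences per step (micro batch per GPU × replicas × gradient accumulation steps).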
Thanks for the reply. How should I increase the batch size during generation? The configuration file only specifies a batch size for training, e.g. "train_micro_batch_size_per_gpu".
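Concretely, what I am hoping for is something along the lines of the batched decoding below. This is only a minimal sketch, assuming a generic PyTorch causal LM whose forward pass returns logits of shape [batch, seq_len, vocab]; `model` is a stand-in here, not the actual NeoX generation internals:

import torch
import torch.nn.functional as F

@torch.no_grad()
def generate_batch(model, input_ids, max_new_tokens=256, temperature=0.2, top_p=0.95):
    """Sample continuations for a whole batch of prompts at once.

    input_ids: LongTensor of shape [batch, prompt_len], prompts padded to a
    common length. No KV cache is used, so every step recomputes the full
    sequence; this illustrates the batching, not an optimized decoder.
    """
    for _ in range(max_new_tokens):
        # logits for the last position of every sequence: [batch, vocab]
        logits = model(input_ids)[:, -1, :] / max(temperature, 1e-5)
        probs = F.softmax(logits, dim=-1)
        # nucleus (top-p) filtering: drop the tail beyond cumulative mass top_p
        sorted_probs, sorted_idx = torch.sort(probs, descending=True, dim=-1)
        cumulative = torch.cumsum(sorted_probs, dim=-1)
        sorted_probs[cumulative - sorted_probs > top_p] = 0.0
        sorted_probs /= sorted_probs.sum(dim=-1, keepdim=True)
        # draw one token per sequence, then map back to vocabulary ids
        next_tok = sorted_idx.gather(-1, torch.multinomial(sorted_probs, 1))
        input_ids = torch.cat([input_ids, next_tok], dim=-1)
    return input_ids

With batching like this, the 20K prompts could be pushed through in chunks of, say, 32 per GPU instead of one prompt at a time, which should keep utilization much higher.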