Batch_input and elapsed time per iteration slow down during model training
Arguments
data_impl ....................... mmap........................updated
deepspeed_extra_args ............ {'bf16': {'enabled': True}}.updated
dynamic_loss_scale .............. True........................updated
eval_interval ................... 40000.......................updated
eval_iters ...................... 10..........................updated
fp32_allreduce .................. True........................updated
global_num_gpus ................. 4...........................updated
gpt_j_residual .................. True........................updated
hidden_size ..................... 768.........................updated
init_method ..................... small_init..................updated
is_pipe_parallel ................ True........................updated
launcher ........................ slurm.......................updated
log_interval .................... 10..........................updated
lr .............................. 0.0006......................updated
lr_decay_iters .................. 143000......................updated
lr_decay_style .................. cosine......................updated
max_position_embeddings ......... 2048........................updated
min_lr .......................... 6e-05.......................updated
no_weight_tying ................. True........................updated
num_attention_heads ............. 12..........................updated
num_layers ...................... 12..........................updated
num_workers ..................... 32..........................updated
optimizer ....................... {'type': 'Adam', 'params': {'lr': 0.0006, 'betas': [0.9, 0.95], 'eps': 1e-08}}updated
optimizer_type .................. Adam........................updated
output_layer_init_method ........ wang_init...................updated
partition_activations ........... True........................updated
pipe_parallel_size .............. 1...........................updated
pos_emb ......................... rotary......................updated
precision ....................... bfloat16....................updated
rotary_pct ...................... 0.25........................updated
save ............................ /pythia/checkpoints/test_1updated
save_iters ...................... [10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000, 110000, 120000, 130000, 140000]updated
seq_length ...................... 2048........................updated
sparsity_config ................. {}..........................updated
synchronize_each_layer .......... True........................updated
test_data_paths ................. ['/pile_0.87_deduped_text_document/pile_0.87_deduped_text_document']updated
test_data_weights ............... [1.0].......................updated
text_gen_type ................... unconditional...............updated
tokenizer_type .................. HFTokenizer.................updated
train_batch_size ................ 128.........................updated
train_data_paths ................ ['/pile_0.87_deduped_text_document/pile_0.87_deduped_text_document']updated
train_data_weights .............. [1.0].......................updated
train_iters ..................... 143000......................updated
train_micro_batch_size_per_gpu .. 32..........................updated
user_script ..................... train.py....................updated
valid_data_paths ................ ['pile_0.87_deduped_text_document/pile_0.87_deduped_text_document']updated
valid_data_weights .............. [1.0].......................updated
vocab_file ....................../pythia/utils/20B_tokenizer.jsonupdated
wall_clock_breakdown ............ True........................updated
zero_allgather_bucket_size ...... 500000000...................updated
zero_contiguous_gradients ....... True........................updated
zero_optimization ............... {'stage': 0, 'allgather_partitions': True, 'allgather_bucket_size': 500000000, 'overlap_comm': True, 'reduce_scatter': True, 'reduce_bucket_size': 500000000, 'contiguous_gradients': True, 'cpu_offload': False, 'load_from_fp32_weights': False}updated
zero_reduce_bucket_size ......... 500000000...................updated
zero_reduce_scatter ............. True........................updated
zero_stage ...................... 0...........................updated
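As a sanity check on the batch settings in this dump, here is a minimal sketch that recomputes the global batch size and the tokens processed per iteration. The numbers are copied from the argument list in this issue. The helper names are hypothetical and not part of GPT-NeoX or DeepSpeed. It assumes the usual DeepSpeed relation: train_batch_size = train_micro_batch_size_per_gpu * gradient_accumulation_steps * data-parallel size.

```python
# Values copied from the argument dump in this issue.
args = {
    "train_batch_size": 128,
    "train_micro_batch_size_per_gpu": 32,
    "gradient_accumulation_steps": 1,
    "global_num_gpus": 4,
    "model_parallel_size": 1,
    "pipe_parallel_size": 1,
    "seq_length": 2048,
}


def data_parallel_size(a):
    """GPUs left for data parallelism after model and pipeline parallelism."""
    return a["global_num_gpus"] // (a["model_parallel_size"] * a["pipe_parallel_size"])


def expected_global_batch(a):
    """Global batch implied by the micro batch, accumulation steps and DP size."""
    return (
        a["train_micro_batch_size_per_gpu"]
        * a["gradient_accumulation_steps"]
        * data_parallel_size(a)
    )


def tokens_per_iteration(a):
    """Tokens consumed by one optimizer step at the configured sequence length."""
    return a["train_batch_size"] * a["seq_length"]


if __name__ == "__main__":
    print("data parallel size:", data_parallel_size(args))
    print("expected global batch:", expected_global_batch(args))
    print("matches train_batch_size:", expected_global_batch(args) == args["train_batch_size"])
    print("tokens per iteration:", tokens_per_iteration(args))
```

With these settings the batch configuration is self-consistent (32 * 1 * 4 = 128), so an inconsistent batch setup is unlikely to be what slows the iterations down. Dividing the tokens per iteration by the logged elapsed time per iteration gives a throughput figure that can be compared across the run.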
account ......................... None........................default
activation ...................... gelu........................default
activation_checkpointing ........ None........................default
adlr_autoresume ................. False.......................default
adlr_autoresume_interval ........ 1000........................default
amp ............................. None........................default
apply_query_key_layer_scaling ... False.......................default
attention_dropout ............... 0...........................default
attention_softmax_in_fp32 ....... False.......................default
autotuning ...................... None........................default
autotuning_run .................. None........................default
base_shapes_file ................ None........................default
bf16 ............................ None........................default
bias_dropout_fusion ............. False.......................default
bias_gelu_fusion ................ False.......................default
char_level_ppl .................. False.......................default
checkpoint ...................... None........................default
checkpoint_in_cpu ............... False.......................default
checkpoint_num_layers ........... 1...........................default
checkpoint_scale ................ linear......................default
checkpoint_validation_with_forward_pass False................default
clip_grad ....................... 1.0.........................default
comment ......................... None........................default
comms_logger .................... None........................default
communication_data_type ......... None........................default
compression_training ............ None........................default
contiguous_checkpointing ........ False.......................default
coord_check ..................... False.......................default
create_moe_param_group .......... True........................default
csv_monitor ..................... None........................default
curriculum_learning ............. None........................default
curriculum_seqlen ............... 0...........................default
data_efficiency ................. None........................default
data_path ....................... None........................default
data_types ...................... None........................default
deepscale ....................... False.......................default
deepscale_config ................ None........................default
deepspeed ....................... True........................default
deepspeed_activation_checkpointing True......................default
deepspeed_mpi ................... False.......................default
deepspeed_slurm ................. False.......................default
detect_nvlink_pairs ............. False.......................default
distributed_backend ............. nccl........................default
do_test ......................... None........................default
do_train ........................ None........................default
do_valid ........................ None........................default
dump_state ...................... False.......................default
elasticity ...................... None........................default
enable_expert_tensor_parallelism False.......................default
eod_mask_loss ................... False.......................default
eval_results_prefix ............. ............................default
eval_tasks ...................... None........................default
exclude ......................... None........................default
exit_interval ................... None........................default
expert_interval ................. 2...........................default
extra_save_iters ................ None........................default
finetune ........................ False.......................default
flops_profiler .................. None........................default
force_multi ..................... False.......................default
fp16 ............................ None........................default
fp16_lm_cross_entropy ........... False.......................default
git_hash ........................ 4c426da.....................default
gmlp_attn_dim ................... 64..........................default
gpt_j_tied ...................... False.......................default
gradient_accumulation_steps ..... 1...........................default
gradient_clipping ............... 1.0.........................default
gradient_noise_scale_cpu_offload False.......................default
gradient_noise_scale_n_batches .. 5...........................default
gradient_predivide_factor ....... 1.0.........................default
hidden_dropout .................. 0...........................default
hostfile ........................ None........................default
hysteresis ...................... 2...........................default
include ......................... None........................default
init_method_std ................. 0.02........................default
intermediate_size ............... None........................default
iteration ....................... None........................default
keep_last_n_checkpoints ......... None........................default
label_data_paths ................ None........................default
layernorm_epsilon ............... 1e-05.......................default
layernorm_fusion ................ False.......................default
lazy_mpu_init ................... False.......................default
load ............................ None........................default
local_rank ...................... None........................default
log_dir ......................... None........................default
log_grad_norm ................... False.......................default
log_grad_pct_zeros .............. False.......................default
log_gradient_noise_scale ........ False.......................default
log_optimizer_states ............ False.......................default
log_param_norm .................. False.......................default
loss_scale ...................... None........................default
loss_scale_window ............... 1000.0......................default
make_vocab_size_divisible_by .... 128.........................default
mamba_causal_conv_fusion ........ False.......................default
mamba_inner_func_fusion ......... False.......................default
mamba_selective_fp32_params ..... True........................default
mamba_selective_scan_fusion ..... False.......................default
mamba_use_bias_in_conv .......... True........................default
mamba_use_bias_in_linears ....... False.......................default
master_addr ..................... None........................default
master_port ..................... 29500.......................default
maximum_tokens .................. 64..........................default
memory_profiling ................ False.......................default
memory_profiling_path ........... None........................default
merge_file ...................... None........................default
min_scale ....................... 1.0.........................default
mlp_type ........................ regular.....................default
mmap_warmup ..................... False.......................default
model_parallel_size ............. 1...........................default
moe_eval_capacity_factor ........ 1.0.........................default
moe_expert_parallel_size ........ 1...........................default
moe_glu ......................... False.......................default
moe_jitter_eps .................. None........................default
moe_lbl_in_fp32 ................. False.......................default
moe_loss_coeff .................. 0.1.........................default
moe_min_capacity ................ 4...........................default
moe_num_experts ................. 1...........................default
moe_token_dropping .............. False.......................default
moe_top_k ....................... 1...........................default
moe_train_capacity_factor ....... 1.0.........................default
moe_type ........................ megablocks..................default
moe_use_residual ................ True........................default
mup_attn_temp ................... 1.0.........................default
mup_embedding_mult .............. 1.0.........................default
mup_init_scale .................. 1.0.........................default
mup_output_temp ................. 1.0.........................default
mup_rp_embedding_mult ........... 1.0.........................default
mup_width_scale ................. 2...........................default
no_load_optim ................... False.......................default
no_load_rng ..................... False.......................default
no_save_optim ................... False.......................default
no_save_rng ..................... False.......................default
no_ssh_check .................... False.......................default
norm ............................ layernorm...................default
num_gpus ........................ None........................default
num_kv_heads .................... None........................default
num_nodes ....................... -1..........................default
num_samples ..................... 1...........................default
num_unique_layers ............... None........................default
onnx_safe ....................... False.......................default
opt_pos_emb_offset .............. 0...........................default
output_layer_parallelism ........ column......................default
override_lr_scheduler ........... False.......................default
padded_vocab_size ............... None........................default
param_sharing_style ............. grouped.....................default
pipe_partition_method ........... type:transformer|mlp........default
prescale_gradients .............. False.......................default
profile ......................... False.......................default
profile_backward ................ False.......................default
profile_step_start .............. 10..........................default
profile_step_stop ............... 12..........................default
prompt_end ......................
...........................default
rank ............................ None........................default
recompute ....................... False.......................default
return_logits ................... False.......................default
rms_norm_epsilon ................ 1e-08.......................default
rope_fusion ..................... False.......................default
rotary_emb_base ................. 10000.......................default
rotary_save_freqs_buffer ........ False.......................default
rpe_max_distance ................ 128.........................default
rpe_num_buckets ................. 32..........................default
s3_chunk_size ................... 104857600...................default
s3_path ......................... None........................default
sample_input_file ............... None........................default
sample_output_file .............. samples.txt.................default
save_base_shapes ................ False.......................default
scaled_masked_softmax_fusion .... False.......................default
scaled_upper_triang_masked_softmax_fusion False..............default
scalenorm_epsilon ............... 1e-08.......................default
scheduler ....................... None........................default
seed ............................ 1234........................default
short_seq_prob .................. 0.1.........................default
sliding_window_width ............ None........................default
soft_prompt_tuning .............. None........................default
sparse_attention ................ None........................default
sparse_gradients ................ False.......................default
split ........................... 969, 30, 1..................default
steps_per_print ................. 10..........................default
temperature ..................... 0.0.........................default
tensorboard ..................... None........................default
tensorboard_dir ................. None........................default
top_k ........................... 0...........................default
top_p ........................... 0.0.........................default
use_bias_in_attn_linear ......... True........................default
use_bias_in_norms ............... True........................default
use_bnb_optimizer ............... False.......................default
use_checkpoint_lr_scheduler ..... False.......................default
use_cpu_initialization .......... False.......................default
use_mup ......................... False.......................default
use_qk_layernorm ................ False.......................default
use_shared_fs ................... True........................default
use_tutel ....................... False.......................default
use_wandb ....................... None........................default
wandb ........................... None........................default
wandb_group ..................... None........................default
wandb_host ...................... https://api.wandb.ai........default
wandb_init_all_ranks ............ False.......................default
wandb_project ................... neox........................default
wandb_team ...................... None........................default
warmup .......................... 0.01........................default
weight_by_num_documents ......... False.......................default
weight_decay .................... 0.1.........................default
weighted_sampler_alpha .......... 1.0.........................default
world_size ...................... None........................default
Environment:
Hardware: