Why was there only 1 epoch run when training the translator? Is "max_epoch": 1 enough?
(graphtranslator) user@k9:.../Translator/train $ python train.py --cfg-path ./pretrain_arxiv_stage2.yaml
Not using distributed mode
2024-09-09 20:10:07,222 [INFO]
===== Running Parameters =====
2024-09-09 20:10:07,222 [INFO] {
"accum_grad_iters": 32,
"amp": true,
"batch_size_eval": 64,
"batch_size_train": 1,
"device": "cuda:0",
"dist_url": "env://",
"distributed": false,
"evaluate": false,
"init_lr": 0.0001,
"log_freq": 50,
"lr_sched": "linear_warmup_cosine_lr",
"max_epoch": 1,
"min_lr": 1e-05,
"output_dir": "../model_output/pretrain_arxiv_stage2",
"resume_ckpt_path": null,
"seed": 42,
"task": "arxiv_text_pretrain",
"train_splits": [
"train"
],
"warmup_lr": 1e-06,
"warmup_steps": 5000,
"weight_decay": 0.05
}
2024-09-09 20:10:07,222 [INFO]
====== Dataset Attributes ======
2024-09-09 20:10:07,223 [INFO]
======== arxiv_caption =======
2024-09-09 20:10:07,223 [INFO] {
"arxiv_processor": {
"train": {
"max_length": 1024,
"name": "translator_arxiv_train",
"vocab_size": 100000
}
},
"datasets_dir": "../../data/arxiv/summary_embeddings.csv",
"text_processor": {
"train": {
"name": "translator_caption"
}
},
"type": "translator_train_stage2"
}
2024-09-09 20:10:07,223 [INFO]
====== Model Attributes ======
2024-09-09 20:10:07,223 [INFO] {
"arch": "translator_arxiv_chatglm",
"behavior_length": 768,
"behavior_precision": "fp16",
"bert_dir": "../models/bert-base-uncased",
"freeze_behavior": true,
"llm_dir": "../models/chatglm2-6b",
"load_finetuned": false,
"max_txt_len": 1024,
"model_type": "pretrain_arxiv",
"num_query_token": 32,
"pretrained": "../model_output/pretrain_arxiv_stage1/checkpoint_0.pth"
}
2024-09-09 20:10:07,223 [INFO] Building datasets...
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:05<00:00, 1.26it/s]
2024-09-09 20:10:16,751 [INFO] load checkpoint from ../model_output/pretrain_arxiv_stage1/checkpoint_0.pth
2024-09-09 20:10:16,752 [INFO] Start training
2024-09-09 20:10:22,868 [INFO] number of trainable parameters: 182936320
2024-09-09 20:10:23,003 [INFO] Start training epoch 0, 100 iters per inner epoch.
Time 2024-09-09 20:10:23.003839 Train: data epoch: [0] [ 0/100] eta: 0:03:50 lr: 0.00000100 loss: 2.84765625 time: 2.3050 data: 0.0253 max mem: 18714
Time 2024-09-09 20:10:23.003839 Train: data epoch: [0] [ 50/100] eta: 0:00:15 lr: 0.00000199 loss: 3.83593750 time: 0.2861 data: 0.0001 max mem: 20609
Time 2024-09-09 20:10:23.003839 Train: data epoch: [0] [ 99/100] eta: 0:00:00 lr: 0.00000296 loss: 3.27148438 time: 0.3000 data: 0.0001 max mem: 22493
Time 2024-09-09 20:10:23.003839 Train: data epoch: [0] Total time: 0:00:29 (0.2988 s / it)
2024-09-09 20:10:52,881 [INFO] Averaged stats: lr: 0.00000198 loss: 3.53283203
2024-09-09 20:10:52,883 [INFO] No validation splits found.
2024-09-09 20:10:52,890 [INFO] Saving checkpoint at epoch 0 to ../model_output/pretrain_arxiv_stage2/checkpoint_0.pth.
2024-09-09 20:10:55,732 [INFO] No validation splits found.
2024-09-09 20:10:55,732 [INFO] Training time 0:00:38
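The single epoch follows directly from the logged config: "max_epoch": 1. To train longer, the value to change appears to be max_epoch in pretrain_arxiv_stage2.yaml. A minimal sketch of the relevant section (the key nesting is an assumption; the values mirror the Running Parameters logged above):

```yaml
# Sketch of the run settings in pretrain_arxiv_stage2.yaml
# (nesting under "run" is assumed; values are from the logged Running Parameters)
run:
  max_epoch: 1        # raise this to train for more than one epoch
  init_lr: 1.0e-4
  min_lr: 1.0e-5
  warmup_lr: 1.0e-6
  warmup_steps: 5000  # with only 100 iters per epoch, 1 epoch stays inside warmup
  accum_grad_iters: 32
  batch_size_train: 1
```

Note from the log that the learning rate at iteration 99 is still 0.00000296, i.e. far below init_lr (0.0001), so a single 100-iteration epoch never leaves the 5000-step warmup phase.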