Only 1 epoch is run when training the translator #17

Open
zmce2018 opened this issue Sep 9, 2024 · 0 comments
zmce2018 commented Sep 9, 2024

Why was only 1 epoch run when training the translator? Is "max_epoch": 1 enough?
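
If more epochs are wanted, raising max_epoch in pretrain_arxiv_stage2.yaml should be the fix. A minimal sketch of bumping it with PyYAML; note the top-level "run" key is an assumption based on LAVIS-style configs (which these options resemble), so the exact nesting in this repo's yaml may differ:

import yaml  # PyYAML

cfg_path = "./pretrain_arxiv_stage2.yaml"
with open(cfg_path) as f:
    cfg = yaml.safe_load(f)

# Assumption: the run-time options sit under a top-level "run" key,
# as in LAVIS-style configs; adjust the key path if this yaml differs.
cfg["run"]["max_epoch"] = 10  # e.g. 10 epochs instead of 1
with open(cfg_path, "w") as f:
    yaml.safe_dump(cfg, f)

As the log below shows, the run did honor "max_epoch": 1 — exactly one epoch (epoch 0) was trained before a single checkpoint_0.pth was saved.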

(graphtranslator) user@k9:.../Translator/train $ python train.py --cfg-path ./pretrain_arxiv_stage2.yaml
Not using distributed mode
2024-09-09 20:10:07,222 [INFO]
===== Running Parameters =====
2024-09-09 20:10:07,222 [INFO] {
"accum_grad_iters": 32,
"amp": true,
"batch_size_eval": 64,
"batch_size_train": 1,
"device": "cuda:0",
"dist_url": "env://",
"distributed": false,
"evaluate": false,
"init_lr": 0.0001,
"log_freq": 50,
"lr_sched": "linear_warmup_cosine_lr",
"max_epoch": 1,
"min_lr": 1e-05,
"output_dir": "../model_output/pretrain_arxiv_stage2",
"resume_ckpt_path": null,
"seed": 42,
"task": "arxiv_text_pretrain",
"train_splits": [
"train"
],
"warmup_lr": 1e-06,
"warmup_steps": 5000,
"weight_decay": 0.05
}
2024-09-09 20:10:07,222 [INFO]
====== Dataset Attributes ======
2024-09-09 20:10:07,223 [INFO]
======== arxiv_caption =======
2024-09-09 20:10:07,223 [INFO] {
"arxiv_processor": {
"train": {
"max_length": 1024,
"name": "translator_arxiv_train",
"vocab_size": 100000
}
},
"datasets_dir": "../../data/arxiv/summary_embeddings.csv",
"text_processor": {
"train": {
"name": "translator_caption"
}
},
"type": "translator_train_stage2"
}
2024-09-09 20:10:07,223 [INFO]
====== Model Attributes ======
2024-09-09 20:10:07,223 [INFO] {
"arch": "translator_arxiv_chatglm",
"behavior_length": 768,
"behavior_precision": "fp16",
"bert_dir": "../models/bert-base-uncased",
"freeze_behavior": true,
"llm_dir": "../models/chatglm2-6b",
"load_finetuned": false,
"max_txt_len": 1024,
"model_type": "pretrain_arxiv",
"num_query_token": 32,
"pretrained": "../model_output/pretrain_arxiv_stage1/checkpoint_0.pth"
}
2024-09-09 20:10:07,223 [INFO] Building datasets...
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:05<00:00, 1.26it/s]
2024-09-09 20:10:16,751 [INFO] load checkpoint from ../model_output/pretrain_arxiv_stage1/checkpoint_0.pth
2024-09-09 20:10:16,752 [INFO] Start training
2024-09-09 20:10:22,868 [INFO] number of trainable parameters: 182936320
2024-09-09 20:10:23,003 [INFO] Start training epoch 0, 100 iters per inner epoch.
Time 2024-09-09 20:10:23.003839 Train: data epoch: [0] [ 0/100] eta: 0:03:50 lr: 0.00000100 loss: 2.84765625 time: 2.3050 data: 0.0253 max mem: 18714
Time 2024-09-09 20:10:23.003839 Train: data epoch: [0] [ 50/100] eta: 0:00:15 lr: 0.00000199 loss: 3.83593750 time: 0.2861 data: 0.0001 max mem: 20609
Time 2024-09-09 20:10:23.003839 Train: data epoch: [0] [ 99/100] eta: 0:00:00 lr: 0.00000296 loss: 3.27148438 time: 0.3000 data: 0.0001 max mem: 22493
Time 2024-09-09 20:10:23.003839 Train: data epoch: [0] Total time: 0:00:29 (0.2988 s / it)
2024-09-09 20:10:52,881 [INFO] Averaged stats: lr: 0.00000198 loss: 3.53283203
2024-09-09 20:10:52,883 [INFO] No validation splits found.
2024-09-09 20:10:52,890 [INFO] Saving checkpoint at epoch 0 to ../model_output/pretrain_arxiv_stage2/checkpoint_0.pth.
2024-09-09 20:10:55,732 [INFO] No validation splits found.
2024-09-09 20:10:55,732 [INFO] Training time 0:00:38
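
For context, the logged learning rates show the run ended deep inside warmup. Assuming linear_warmup_cosine_lr ramps linearly from warmup_lr to init_lr over warmup_steps (an inference from the logged values, not read from the repo's scheduler code), the numbers at iters 0, 50, and 99 are reproduced exactly, meaning only 99 of 5000 warmup steps ran:

# Sketch of the warmup phase implied by the log above; the linear-ramp
# formula is an assumption inferred from the logged lr values.
warmup_lr, init_lr, warmup_steps = 1e-6, 1e-4, 5000  # from the config above

def warmup(step):
    return warmup_lr + (init_lr - warmup_lr) * step / warmup_steps

for step in (0, 50, 99):
    print(f"iter {step:3d}: lr = {warmup(step):.8f}")
# iter   0: lr = 0.00000100   (log: 0.00000100)
# iter  50: lr = 0.00000199   (log: 0.00000199)
# iter  99: lr = 0.00000296   (log: 0.00000296)

Separately, with "batch_size_train": 1 and "accum_grad_iters": 32, each optimizer step accumulates 32 samples, so a 100-iter epoch amounts to roughly 3 parameter updates over 100 samples. If that schedule is intended, more epochs (or more iterations per epoch) would seem necessary for the cosine phase of the schedule to ever be reached.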
