[BUG] exits with return code = 1, AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam' (Running on a Linux server with 6 A100s, without sudo access) #470

Closed
tonymhl opened this issue Jun 5, 2023 · 7 comments
Labels: pending (Something isn't working)

Comments

tonymhl commented Jun 5, 2023

...
Loading extension module cpu_adam...
Traceback (most recent call last):
File "/home/mahongli/LMFlow/examples/finetune.py", line 61, in
main()
File "/home/mahongli/LMFlow/examples/finetune.py", line 57, in main
tuned_model = finetuner.tune(model=model, dataset=dataset)
File "/home/mahongli/LMFlow/src/lmflow/pipeline/finetuner.py", line 285, in tune
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/trainer.py", line 1639, in train
return inner_training_loop(
File "/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/trainer.py", line 1708, in _inner_training_loop
deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
File "/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/deepspeed.py", line 378, in deepspeed_init
deepspeed_engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
File "/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/init.py", line 125, in initialize
engine = DeepSpeedEngine(args=args,
File "/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 340, in init
self._configure_optimizer(optimizer, model_parameters)
File "/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1283, in _configure_optimizer
basic_optimizer = self._configure_basic_optimizer(model_parameters)
File "/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1354, in _configure_basic_optimizer
optimizer = DeepSpeedCPUAdam(model_parameters,
File "/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/adam/cpu_adam.py", line 96, in init
self.ds_opt_adam = CPUAdamBuilder().load()
File "/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/op_builder/builder.py", line 485, in load
return self.jit_load(verbose)
File "/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/op_builder/builder.py", line 520, in jit_load
op_module = load(
File "/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1284, in load
return _jit_compile(
File "/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1535, in _jit_compile
return _import_module_from_library(name, build_directory, is_python_module)
File "/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1929, in _import_module_from_library
module = importlib.util.module_from_spec(spec)
File "", line 565, in module_from_spec
File "", line 1173, in create_module
File "", line 228, in _call_with_frames_removed
ImportError: /home/mahongli/.cache/torch_extensions/py39_cu117/cpu_adam/cpu_adam.so: cannot open shared object file: No such file or directory
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7f915c061820>
Traceback (most recent call last):
File "/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/adam/cpu_adam.py", line 110, in __del__
self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7f2f180db820>
Traceback (most recent call last):
File "/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/adam/cpu_adam.py", line 110, in __del__
self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7f0548ddd820>
Traceback (most recent call last):
File "/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/adam/cpu_adam.py", line 110, in __del__
self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7fc24b130820>
Traceback (most recent call last):
File "/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/adam/cpu_adam.py", line 110, in __del__
self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7f4d96660820>
Traceback (most recent call last):
File "/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/adam/cpu_adam.py", line 110, in __del__
self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7f05c8043820>
Traceback (most recent call last):
File "/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/adam/cpu_adam.py", line 110, in __del__
self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
[2023-06-05 21:56:22,835] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 1368214
[2023-06-05 21:56:23,637] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 1368215
[2023-06-05 21:56:23,719] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 1368216
[2023-06-05 21:56:23,719] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 1368217
[2023-06-05 21:56:23,776] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 1368221
[2023-06-05 21:56:23,816] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 1368223
[2023-06-05 21:56:23,859] [ERROR] [launch.py:324:sigkill_handler] ['/home/mahongli/anaconda3/envs/lmflow/bin/python', '-u', 'examples/finetune.py', '--local_rank=5', '--model_name_or_path', 'bigscience/bloom-560m', '--dataset_path', '/home/mahongli/LMFlow/data/alpaca/train', '--output_dir', '/home/mahongli/LMFlow/output_models/finetune', '--overwrite_output_dir', '--num_train_epochs', '1', '--learning_rate', '2e-5', '--block_size', '256', '--per_device_train_batch_size', '1', '--deepspeed', 'configs/ds_config_zero3.json', '--bf16', '--run_name', 'finetune', '--validation_split_percentage', '0', '--logging_steps', '20', '--do_train', '--ddp_timeout', '72000', '--save_steps', '5000', '--dataloader_num_workers', '1'] exits with return code = 1
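
The ImportError above means the JIT build of the cpu_adam extension never produced /home/mahongli/.cache/torch_extensions/py39_cu117/cpu_adam/cpu_adam.so, so every later DeepSpeedCPUAdam.__del__ fails with the secondary AttributeError. One common cleanup step, sketched below under the assumption that the cache path in the error message is correct, is to delete the stale build directory so the next run recompiles the op from scratch:

    # Minimal sketch: remove the stale JIT-build directory so DeepSpeed
    # rebuilds cpu_adam on the next run. Path taken from the ImportError above.
    import pathlib
    import shutil

    cache_dir = pathlib.Path.home() / ".cache" / "torch_extensions" / "py39_cu117" / "cpu_adam"
    shutil.rmtree(cache_dir, ignore_errors=True)  # no-op if the directory is already gone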

Detail
I am running on a Linux server with 6 A100 GPUs, without sudo access.

Screenshots
[screenshot]

Additional context

(lmflow) mahongli@ops-NF5468M6:~/LMFlow$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
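
Note that nvcc here reports CUDA 11.5 while the PyTorch wheel is built against 11.7; a toolchain mismatch like this is a common reason the cpu_adam JIT compile fails, especially without sudo access to update the system CUDA. As a rough check (a sketch only; the `ds_report` command that ships with DeepSpeed gives a fuller picture), the same builder the traceback goes through can be probed directly to surface the underlying compiler error:

    # Rough diagnostic sketch: probe the builder the traceback goes through.
    import torch
    from deepspeed.ops.op_builder import CPUAdamBuilder

    print(torch.__version__, torch.version.cuda)  # 2.0.0+cu117 in this environment
    builder = CPUAdamBuilder()
    print(builder.is_compatible())  # reports whether the op can be built here
    builder.load()  # same call as cpu_adam.py line 96; reproduces the build failure, if any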

[screenshot]
tonymhl added the pending (Something isn't working) label on Jun 5, 2023
shizhediao (Contributor) commented

Hi,
Please refer to issue #446 and see whether it solves this problem. Thanks!

tonymhl (Author) commented Jun 7, 2023

Hi,
I tried all the methods described in #446 and ymcui/Chinese-LLaMA-Alpaca#368, but I am still unable to start SFT.

I removed the following section from configs/ds_config_zero3.json:

    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": "auto",
            "betas": "auto",
            "eps": "auto",
            "weight_decay": "auto"
        }
    },

By the way, I added these lines in /examples/finetune.py:

    import torch
    print(torch.__version__)
    print(torch.version.cuda)
    if torch.cuda.is_available():
        print('CUDA is available')
    else:
        print('CUDA is not available')

I got:

(lmflow) mahongli@ops-NF5468M6:~/LMFlow$ ./scripts/run_finetune.sh
[2023-06-07 10:58:46,697] [WARNING] [runner.py:186:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-06-07 10:58:46,864] [INFO] [runner.py:550:main] cmd = /home/mahongli/anaconda3/envs/lmflow/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgMywgNCwgNV19 --master_addr=127.0.0.1 --master_port=11000 --enable_each_rank_log=None examples/finetune.py --model_name_or_path bigscience/bloom-560m --dataset_path /home/mahongli/LMFlow/data/alpaca/train --output_dir /home/mahongli/LMFlow/output_models/finetune --overwrite_output_dir --num_train_epochs 1 --learning_rate 2e-5 --block_size 512 --per_device_train_batch_size 1 --deepspeed configs/ds_config_zero3.json --bf16 --run_name finetune --validation_split_percentage 0 --logging_steps 20 --do_train --ddp_timeout 72000 --save_steps 5000 --dataloader_num_workers 1
[2023-06-07 10:58:48,575] [INFO] [launch.py:142:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3, 4, 5]}
[2023-06-07 10:58:48,575] [INFO] [launch.py:148:main] nnodes=1, num_local_procs=6, node_rank=0
[2023-06-07 10:58:48,575] [INFO] [launch.py:161:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3, 4, 5]})
[2023-06-07 10:58:48,575] [INFO] [launch.py:162:main] dist_world_size=6
[2023-06-07 10:58:48,575] [INFO] [launch.py:164:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3,4,5
2.0.0+cu117
11.7
2.0.0+cu117
11.7
2.0.0+cu117
11.7
2.0.0+cu117
11.7
2.0.0+cu117
11.7
2.0.0+cu117
11.7
CUDA is available
CUDA is available
CUDA is available
CUDA is available
CUDA is available
CUDA is available

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda118.so
/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: /home/mahongli/anaconda3/envs/lmflow did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
 warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0'), PosixPath('/usr/local/cuda/lib64/libcudart.so')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
 warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda118.so...
[... the bitsandbytes BUG REPORT banner and CUDA SETUP messages above are repeated by each of the 6 ranks ...]
[2023-06-07 10:58:54,110] [INFO] [comm.py:652:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
06/07/2023 10:58:55 - WARNING - lmflow.pipeline.finetuner - Process rank: 5, device: cuda:5, n_gpu: 1distributed training: True, 16-bits training: False
06/07/2023 10:58:55 - WARNING - lmflow.pipeline.finetuner - Process rank: 3, device: cuda:3, n_gpu: 1distributed training: True, 16-bits training: False
06/07/2023 10:58:55 - WARNING - lmflow.pipeline.finetuner - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: False
06/07/2023 10:58:55 - WARNING - lmflow.pipeline.finetuner - Process rank: 1, device: cuda:1, n_gpu: 1distributed training: True, 16-bits training: False
06/07/2023 10:58:55 - WARNING - lmflow.pipeline.finetuner - Process rank: 2, device: cuda:2, n_gpu: 1distributed training: True, 16-bits training: False
06/07/2023 10:58:55 - WARNING - lmflow.pipeline.finetuner - Process rank: 4, device: cuda:4, n_gpu: 1distributed training: True, 16-bits training: False
06/07/2023 10:58:56 - WARNING - datasets.builder - Found cached dataset json (/home/mahongli/.cache/huggingface/datasets/json/default-bc0827494cb815bf/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
[... the message above is printed once per rank ...]
[2023-06-07 10:59:16,402] [INFO] [partition_parameters.py:415:__exit__] finished initializing model with 0.82B parameters
/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:2547: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead.
 warnings.warn(
[... the warning above is printed once per rank ...]
06/07/2023 10:59:24 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/mahongli/.cache/huggingface/datasets/json/default-bc0827494cb815bf/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-6e3bfba02f06b0a33db2edfd8d8b03f0.arrow
06/07/2023 10:59:24 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/mahongli/.cache/huggingface/datasets/json/default-bc0827494cb815bf/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-fa98ec4e8e23a487.arrow
[... each of the two messages above is printed once per rank ...]
/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
 warnings.warn(
[... the warning above is printed once per rank ...]
Traceback (most recent call last):
  File "/home/mahongli/LMFlow/examples/finetune.py", line 68, in <module>
    main()
  File "/home/mahongli/LMFlow/examples/finetune.py", line 64, in main
    tuned_model = finetuner.tune(model=model, dataset=dataset)
  File "/home/mahongli/LMFlow/src/lmflow/pipeline/finetuner.py", line 285, in tune
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/trainer.py", line 1639, in train
    return inner_training_loop(
  File "/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/trainer.py", line 1708, in _inner_training_loop
    deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
  File "/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/deepspeed.py", line 378, in deepspeed_init
    deepspeed_engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
  File "/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/__init__.py", line 125, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 340, in __init__
    self._configure_optimizer(optimizer, model_parameters)
  File "/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1281, in _configure_optimizer
    raise ZeRORuntimeException(msg)
deepspeed.runtime.zero.utils.ZeRORuntimeException: You are using ZeRO-Offload with a client provided optimizer (<class 'transformers.optimization.AdamW'>) which in most cases will yield poor performance. Please either use deepspeed.ops.adam.DeepSpeedCPUAdam or set an optimizer in your ds-config (https://www.deepspeed.ai/docs/config-json/#optimizer-parameters). If you really want to use a custom optimizer w. ZeRO-Offload and understand the performance impacts you can also set <"zero_force_ds_cpu_optimizer": false> in your configuration file.
[... the same traceback is raised by each of the 6 ranks; the interleaved duplicates are omitted ...]
[2023-06-07 10:59:28,703] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 498108
[2023-06-07 10:59:29,524] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 498111
[2023-06-07 10:59:29,524] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 498112
[2023-06-07 10:59:29,586] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 498113
[2023-06-07 10:59:29,643] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 498114
[2023-06-07 10:59:29,706] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 498115
[2023-06-07 10:59:29,765] [ERROR] [launch.py:324:sigkill_handler] ['/home/mahongli/anaconda3/envs/lmflow/bin/python', '-u', 'examples/finetune.py', '--local_rank=5', '--model_name_or_path', 'bigscience/bloom-560m', '--dataset_path', '/home/mahongli/LMFlow/data/alpaca/train', '--output_dir', '/home/mahongli/LMFlow/output_models/finetune', '--overwrite_output_dir', '--num_train_epochs', '1', '--learning_rate', '2e-5', '--block_size', '512', '--per_device_train_batch_size', '1', '--deepspeed', 'configs/ds_config_zero3.json', '--bf16', '--run_name', 'finetune', '--validation_split_percentage', '0', '--logging_steps', '20', '--do_train', '--ddp_timeout', '72000', '--save_steps', '5000', '--dataloader_num_workers', '1'] exits with return code = 1
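
The ZeRORuntimeException above names the two ways out: either keep an optimizer block in the DeepSpeed config so DeepSpeed constructs its own DeepSpeedCPUAdam (which only works once the cpu_adam build succeeds), or set "zero_force_ds_cpu_optimizer" to false and accept the slower client-side AdamW. A sketch of both fixes, with keys following the DeepSpeed JSON config schema and the file path used in this thread:

    # Sketch of the two fixes named by the ZeRORuntimeException above.
    import json

    with open("configs/ds_config_zero3.json") as f:
        cfg = json.load(f)

    # Option A: restore the optimizer block so DeepSpeed builds its own CPU Adam.
    cfg["optimizer"] = {
        "type": "AdamW",
        "params": {"lr": "auto", "betas": "auto", "eps": "auto", "weight_decay": "auto"},
    }
    # Option B: keep the client-side transformers AdamW despite the performance warning.
    # cfg["zero_force_ds_cpu_optimizer"] = False

    with open("configs/ds_config_zero3.json", "w") as f:
        json.dump(cfg, f, indent=4)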

Please allow me to post the complete output again, this time without removing anything from ds_config_zero3.json:

(lmflow) mahongli@ops-NF5468M6:~/LMFlow$ ./scripts/run_finetune.sh
[2023-06-07 11:05:19,885] [WARNING] [runner.py:186:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-06-07 11:05:19,987] [INFO] [runner.py:550:main] cmd = /home/mahongli/anaconda3/envs/lmflow/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgMywgNCwgNV19 --master_addr=127.0.0.1 --master_port=11000 --enable_each_rank_log=None examples/finetune.py --model_name_or_path gpt2 --dataset_path /home/mahongli/LMFlow/data/alpaca/train --output_dir /home/mahongli/LMFlow/output_models/finetune --overwrite_output_dir --num_train_epochs 1 --learning_rate 2e-5 --block_size 512 --per_device_train_batch_size 1 --deepspeed configs/ds_config_zero3.json --bf16 --run_name finetune --validation_split_percentage 0 --logging_steps 20 --do_train --ddp_timeout 72000 --save_steps 5000 --dataloader_num_workers 1
[2023-06-07 11:05:21,433] [INFO] [launch.py:142:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3, 4, 5]}
[2023-06-07 11:05:21,433] [INFO] [launch.py:148:main] nnodes=1, num_local_procs=6, node_rank=0
[2023-06-07 11:05:21,433] [INFO] [launch.py:161:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3, 4, 5]})
[2023-06-07 11:05:21,433] [INFO] [launch.py:162:main] dist_world_size=6
[2023-06-07 11:05:21,433] [INFO] [launch.py:164:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3,4,5
2.0.0+cu117
11.7
2.0.0+cu117
11.7
2.0.0+cu117
11.7
2.0.0+cu117
11.7
2.0.0+cu117
11.7
2.0.0+cu117
11.7
CUDA is available
CUDA is available
CUDA is available
CUDA is available
CUDA is available
CUDA is available

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda118.so
/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: /home/mahongli/anaconda3/envs/lmflow did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
  warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda118.so...
[... the bitsandbytes BUG REPORT banner and CUDA SETUP messages above are repeated by each of the 6 ranks; some ranks resolve /usr/local/cuda/lib64/libcudart.so.11.0 instead of libcudart.so ...]
[2023-06-07 11:05:27,205] [INFO] [comm.py:652:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
06/07/2023 11:05:28 - WARNING - lmflow.pipeline.finetuner - Process rank: 5, device: cuda:5, n_gpu: 1, distributed training: True, 16-bits training: False
06/07/2023 11:05:28 - WARNING - lmflow.pipeline.finetuner - Process rank: 2, device: cuda:2, n_gpu: 1, distributed training: True, 16-bits training: False
06/07/2023 11:05:28 - WARNING - lmflow.pipeline.finetuner - Process rank: 4, device: cuda:4, n_gpu: 1, distributed training: True, 16-bits training: False
06/07/2023 11:05:28 - WARNING - lmflow.pipeline.finetuner - Process rank: 1, device: cuda:1, n_gpu: 1, distributed training: True, 16-bits training: False
06/07/2023 11:05:28 - WARNING - lmflow.pipeline.finetuner - Process rank: 3, device: cuda:3, n_gpu: 1, distributed training: True, 16-bits training: False
06/07/2023 11:05:28 - WARNING - lmflow.pipeline.finetuner - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, 16-bits training: False
06/07/2023 11:05:29 - WARNING - datasets.builder - Found cached dataset json (/home/mahongli/.cache/huggingface/datasets/json/default-bc0827494cb815bf/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
(the same cached-dataset message was logged by each of the remaining five ranks)
[2023-06-07 11:05:45,666] [INFO] [partition_parameters.py:415:__exit__] finished initializing model with 0.16B parameters
/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:2547: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead.
  warnings.warn(
(the same deprecation warning was emitted by each of the remaining five ranks)
06/07/2023 11:05:47 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/mahongli/.cache/huggingface/datasets/json/default-bc0827494cb815bf/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-f8d64ba5c02ffb11236c060e9ac92214.arrow
06/07/2023 11:05:47 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/mahongli/.cache/huggingface/datasets/json/default-bc0827494cb815bf/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-31d521238be5ad4e.arrow
(each rank logged the same two cached-processed-dataset messages)
Installed CUDA version 11.5 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
(this message was printed once per rank)
Using /home/mahongli/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
(logged once per rank)
Detected CUDA files, patching ldflags
Emitting ninja build file /home/mahongli/.cache/torch_extensions/py39_cu117/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] /usr/bin/nvcc  -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -I/usr/include -isystem /home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include -isystem /home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/TH -isystem /home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/THC -isystem /home/mahongli/anaconda3/envs/lmflow/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -c /home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/csrc/common/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o 
FAILED: custom_cuda_kernel.cuda.o 
/usr/bin/nvcc  -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -I/usr/include -isystem /home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include -isystem /home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/TH -isystem /home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/THC -isystem /home/mahongli/anaconda3/envs/lmflow/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -c /home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/csrc/common/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o 
/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
  435 |         function(_Functor&& __f)
      |                                                                                                                                                 ^ 
/usr/include/c++/11/bits/std_function.h:435:145: note:         ‘_ArgTypes’
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
  530 |         operator=(_Functor&& __f)
      |                                                                                                                                                  ^ 
/usr/include/c++/11/bits/std_function.h:530:146: note:         ‘_ArgTypes’
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
    subprocess.run(
  File "/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/mahongli/LMFlow/examples/finetune.py", line 68, in <module>
    main()
  File "/home/mahongli/LMFlow/examples/finetune.py", line 64, in main
    tuned_model = finetuner.tune(model=model, dataset=dataset)
  File "/home/mahongli/LMFlow/src/lmflow/pipeline/finetuner.py", line 285, in tune
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/trainer.py", line 1639, in train
    return inner_training_loop(
  File "/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/trainer.py", line 1708, in _inner_training_loop
    deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
  File "/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/deepspeed.py", line 378, in deepspeed_init
    deepspeed_engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
  File "/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/__init__.py", line 125, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 340, in __init__
    self._configure_optimizer(optimizer, model_parameters)
  File "/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1283, in _configure_optimizer
    basic_optimizer = self._configure_basic_optimizer(model_parameters)
  File "/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1354, in _configure_basic_optimizer
    optimizer = DeepSpeedCPUAdam(model_parameters,
  File "/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/adam/cpu_adam.py", line 96, in __init__
    self.ds_opt_adam = CPUAdamBuilder().load()
  File "/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/op_builder/builder.py", line 485, in load
    return self.jit_load(verbose)
  File "/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/op_builder/builder.py", line 520, in jit_load
    op_module = load(
  File "/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1284, in load
    return _jit_compile(
  File "/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1509, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1624, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'cpu_adam'
Loading extension module cpu_adam...
(each of the remaining five ranks then crashed with the same ImportError traceback, interleaved in the original output and ending in:
ImportError: /home/mahongli/.cache/torch_extensions/py39_cu117/cpu_adam/cpu_adam.so: cannot open shared object file: No such file or directory)
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7f40b0151280>
Traceback (most recent call last):
  File "/home/mahongli/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/adam/cpu_adam.py", line 110, in __del__
    self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
(the same exception-ignored message was printed by each of the remaining five ranks)
[2023-06-07 11:06:00,566] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 555346
[2023-06-07 11:06:00,965] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 555349
[2023-06-07 11:06:00,965] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 555354
[2023-06-07 11:06:01,084] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 555358
[2023-06-07 11:06:01,132] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 555359
[2023-06-07 11:06:01,179] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 555360
[2023-06-07 11:06:01,223] [ERROR] [launch.py:324:sigkill_handler] ['/home/mahongli/anaconda3/envs/lmflow/bin/python', '-u', 'examples/finetune.py', '--local_rank=5', '--model_name_or_path', 'gpt2', '--dataset_path', '/home/mahongli/LMFlow/data/alpaca/train', '--output_dir', '/home/mahongli/LMFlow/output_models/finetune', '--overwrite_output_dir', '--num_train_epochs', '1', '--learning_rate', '2e-5', '--block_size', '512', '--per_device_train_batch_size', '1', '--deepspeed', 'configs/ds_config_zero3.json', '--bf16', '--run_name', 'finetune', '--validation_split_percentage', '0', '--logging_steps', '20', '--do_train', '--ddp_timeout', '72000', '--save_steps', '5000', '--dataloader_num_workers', '1'] exits with return code = 1
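
For context: the root failure in this log is the nvcc compile error above ("parameter packs not expanded" in /usr/include/c++/11/bits/std_function.h), a known incompatibility between older nvcc releases and the GCC 11 libstdc++ headers they compile against; the later ImportError and AttributeError messages are just downstream symptoms of the missing cpu_adam.so on the other ranks. One possible no-sudo workaround, sketched here and untested on this machine, is to give the conda env its own CUDA 11.7 toolchain so the JIT build stops using the system /usr/bin/nvcc (11.5):

# Sketch only: package names and channel label assume NVIDIA's conda channel layout.
conda install -c "nvidia/label/cuda-11.7.1" cuda-nvcc cuda-cudart-dev
export CUDA_HOME="$CONDA_PREFIX"             # torch's extension loader checks CUDA_HOME first when locating nvcc
rm -rf ~/.cache/torch_extensions/py39_cu117  # clear the failed cpu_adam build cache before retrying

Pointing the build at GCC 10 instead (e.g. via nvcc's -ccbin flag) is another commonly reported fix for this particular header error.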

@shizhediao
Copy link
Contributor

I found there is a warning saying: deepspeed.runtime.zero.utils.ZeRORuntimeException: You are using ZeRO-Offload with a client provided optimizer (<class 'transformers.optimization.AdamW'>) which in most cases will yield poor performance. Please either use deepspeed.ops.adam.DeepSpeedCPUAdam or set an optimizer in your ds-config (https://www.deepspeed.ai/docs/config-json/#optimizer-parameters). If you really want to use a custom optimizer w. ZeRO-Offload and understand the performance impacts you can also set <"zero_force_ds_cpu_optimizer": false> in your configuration file.

In addition, how much CPU memory (RAM) does your machine have?
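
For reference, the change the warning asks for looks roughly like this, an illustrative fragment for configs/ds_config_zero3.json (field names per the linked DeepSpeed docs; the lr value mirrors the 2e-5 passed on the command line):

"optimizer": {
  "type": "AdamW",
  "params": {
    "lr": 2e-5
  }
}

Alternatively, keeping the client-side optimizer means adding "zero_force_ds_cpu_optimizer": false at the top level of the config, with the performance caveat the warning describes.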

@shizhediao
Copy link
Contributor

What is the CUDA version? We suggest using 11.3-11.7.

@tonymhl
Copy link
Author

tonymhl commented Jun 14, 2023

Thank you for your attention!

The version string shown by torch is
2.0.0+cu117
(from print(torch.__version__); torch.version.cuda reports 11.7)

CPU memory (RAM) is 1 TB.
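
A quick way to see all three numbers that matter here (assuming nvcc is on PATH):

python -c "import torch; print(torch.__version__)"   # torch build: 2.0.0+cu117
python -c "import torch; print(torch.version.cuda)"  # CUDA torch was compiled with: 11.7
nvcc --version                                       # system toolkit; the build log above reports 11.5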

@seanxuu
Copy link
Contributor

seanxuu commented Jul 31, 2023

Let's try:

git clone -b v0.8.3 https://github.com/microsoft/DeepSpeed.git
cd DeepSpeed
DS_BUILD_CPU_ADAM=1 DS_BUILD_AIO=1 DS_BUILD_UTILS=1 pip install -e .
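
If that install succeeds, a quick check that the op was pre-built (so no JIT compile happens at startup) might look like:

ds_report                                                                                 # cpu_adam should show as installed and compatible
python -c "from deepspeed.ops.op_builder import CPUAdamBuilder; CPUAdamBuilder().load()" # should return immediately without rebuilding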

@shizhediao
Copy link
Contributor

shizhediao commented Sep 30, 2023

Try this solution #70
