Custom Dataset registration error #173

Open
rahuljoshi078 opened this issue Dec 30, 2024 · 0 comments

I need the steps for registering a custom dataset.
Command:
bash scripts/NVILA-Lite/sft.sh runs/train/NVILA-Lite-8B-stage2 "alias to data"

where "alias to data" is
/home/sample_ft/M3IT/data/captioning/coco/captioning_coco_train.pkl

Error:
2024-12-30 11:13:46.201 | INFO | llava.data.builder:register_datasets:39 - Registering datasets from environment: 'default'.
2024-12-30 11:13:46.202 | INFO | llava.data.builder:register_datasets:44 - Registering datasets from: '/home/user/VILA/llava/data/registry/datasets/default.yaml'.
Traceback (most recent call last):
  File "/home/user/VILA/llava/train/train_mem.py", line 22, in <module>
    from llava.train.train import train
  File "/home/user/VILA/llava/train/train.py", line 31, in <module>
    import llava.data.dataset as dataset
  File "/home/user/VILA/llava/data/__init__.py", line 1, in <module>
    from .builder import *
  File "/home/user/VILA/llava/data/builder.py", line 54, in <module>
    DATASETS = register_datasets()
  File "/home/user/VILA/llava/data/builder.py", line 46, in register_datasets
    dataset_meta.update(meta)
TypeError: 'NoneType' object is not iterable
E1230 11:13:47.318000 128108121974592 torch/distributed/elastic/multiprocessing/api.py:826] failed (exitcode: 1) local_rank: 0 (pid: 185298) of binary: /root/anaconda3/envs/vila_adv/bin/python
Traceback (most recent call last):
  File "/root/anaconda3/envs/vila_adv/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/root/anaconda3/envs/vila_adv/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
  File "/root/anaconda3/envs/vila_adv/lib/python3.10/site-packages/torch/distributed/run.py", line 879, in main
    run(args)
  File "/root/anaconda3/envs/vila_adv/lib/python3.10/site-packages/torch/distributed/run.py", line 870, in run
    elastic_launch(
  File "/root/anaconda3/envs/vila_adv/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/root/anaconda3/envs/vila_adv/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 263, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

llava/train/train_mem.py FAILED
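
For reference, the first traceback fails at dataset_meta.update(meta), which is what Python reports when dict.update receives None. One plausible cause, not confirmed here, is that default.yaml (or another registry file) is empty or fully commented out, since PyYAML parses an empty document to None. A minimal sketch that reproduces the message without any VILA code follows; merge_registry, the alias name my_coco_caption, and its keys are hypothetical and are not taken from the VILA registry format.

```python
# Minimal reproduction of the reported TypeError, independent of VILA.
# Assumption: `meta` was None because a registry YAML file parsed to an
# empty document. This is a guess from the traceback, not a confirmed cause.


def merge_registry(metas):
    """Merge per-file dataset metadata, skipping files that parsed to None."""
    dataset_meta = {}
    for meta in metas:
        if meta is None:  # e.g. an empty or fully commented-out YAML file
            continue
        dataset_meta.update(meta)
    return dataset_meta


# An unguarded update with None raises the same TypeError as in the log.
try:
    {}.update(None)
except TypeError as exc:
    error_message = str(exc)

# Hypothetical registry entry; the alias and keys are illustrative only.
merged = merge_registry([None, {"my_coco_caption": {"data_path": "/path/to/data"}}])

print(error_message)
print(merged)
```

If this is the cause, checking that the YAML file named in the log contains at least one dataset entry would be a reasonable first step before changing any code.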
