Guide to finetune on custom dataset #251

Description

@vishalk2999

I have created a dataset in the following format:

- Dataset_folder
    - videos
        - video1.mp4
        - video2.mp4
    - train.json

train.json is in the following format:

[
    {
        "video":"videos/calling.mp4",
        "QA":[{
            "i":"Go through the video and understand the all the actions performed in the video",
            "q":"Describe the video",
            "a":"The person is making phone call and talking on the phone"
        }]
    }
]
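
Before pointing the training config at the file, it can help to check that every entry really has the shape shown above. The following is a minimal sketch, assuming only the keys visible in this issue ("video", "QA", and "i"/"q"/"a" inside each QA item); `validate_annotations` and `REQUIRED_QA_KEYS` are hypothetical names, not part of Ask-Anything. Python's `json` module also rejects a trailing comma after the last list entry, so the file has to be strict JSON.

```python
import json
from pathlib import Path

# Keys used by each QA item in the train.json shown in this issue (assumed complete).
REQUIRED_QA_KEYS = {"i", "q", "a"}


def validate_annotations(entries, dataset_root=None):
    """Return a list of human-readable problems found in the annotation entries."""
    if not isinstance(entries, list):
        return ["top-level JSON value must be a list of entries"]
    problems = []
    for idx, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {idx}: expected an object")
            continue
        video = entry.get("video")
        if not isinstance(video, str) or not video:
            problems.append(f"entry {idx}: missing 'video' path")
        elif dataset_root is not None and not (Path(dataset_root) / video).is_file():
            # Only checked when a dataset root is given.
            problems.append(f"entry {idx}: video not found: {video}")
        qa_list = entry.get("QA")
        if not isinstance(qa_list, list) or not qa_list:
            problems.append(f"entry {idx}: 'QA' must be a non-empty list")
            continue
        for j, qa in enumerate(qa_list):
            missing = REQUIRED_QA_KEYS - set(qa) if isinstance(qa, dict) else REQUIRED_QA_KEYS
            if missing:
                problems.append(f"entry {idx}, QA {j}: missing keys {sorted(missing)}")
    return problems


# In-memory copy of the example entry; a real run would use json.load on train.json.
sample = json.loads("""
[
    {
        "video": "videos/calling.mp4",
        "QA": [{
            "i": "Go through the video and understand the all the actions performed in the video",
            "q": "Describe the video",
            "a": "The person is making phone call and talking on the phone"
        }]
    }
]
""")

print(validate_annotations(sample))
```

Passing `dataset_root` (for example the dataset folder path) additionally checks that each referenced video file exists on disk.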

How should I prepare a custom dataset, and what changes do I need to make in order to train on it for stage3 finetuning?

I have set the train_file variable in config_7b_stage3.py to the path of this train.json, and I get the following error:

2024-12-07T07:52:41 | __main__: train_file: /home/ubuntu/Custom_Data/train.json
2024-12-07T07:52:41 | __main__: Creating dataset for it
2024-12-07T07:52:41 | dataset.it_dataset: Load json file
Traceback (most recent call last):
  File "/home/ubuntu/Ask-Anything/video_chat2/tasks/train_it.py", line 221, in <module>
    main(cfg)
  File "/home/ubuntu/Ask-Anything/video_chat2/tasks/train_it.py", line 138, in main
    train_loaders, train_media_types = setup_dataloaders(
  File "/home/ubuntu/Ask-Anything/video_chat2/tasks/train_it.py", line 105, in setup_dataloaders
    train_datasets = create_dataset(f"{mode}_train", config)
  File "/home/ubuntu/Ask-Anything/video_chat2/dataset/__init__.py", line 174, in create_dataset
    datasets.append(dataset_cls(**dataset_kwargs))
  File "/home/ubuntu/Ask-Anything/video_chat2/dataset/it_dataset.py", line 37, in __init__
    with open(self.label_file, 'r') as f:
IsADirectoryError: [Errno 21] Is a directory: '/'

Could you please help me understand the steps and changes required to train on a custom dataset?
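
One possible reading of the traceback, offered as an assumption rather than a confirmed diagnosis: the path in the error is '/', which is also the first character of the train_file string. That would happen if the dataset class expects train_file to be a sequence such as [annotation_path, data_root, media_type] and unpacks its first two elements, because a bare string then yields its first two characters instead. The sketch below reproduces that effect in plain Python. `unpack_like_loader` is a hypothetical stand-in, not the repository's code, and the list layout and the "video" tag are assumptions to verify against the existing entries in the repository's config files.

```python
def unpack_like_loader(train_file):
    # Hypothetical stand-in for a loader that expects
    # (annotation_path, data_root, ...) and takes the first two items.
    label_file, data_root = train_file[:2]
    return label_file, data_root


# A bare string: slicing gives "/h", which unpacks into two single characters.
as_string = "/home/ubuntu/Custom_Data/train.json"
print(unpack_like_loader(as_string))  # ('/', 'h')

# A list (assumed layout): the annotation path and data root come out intact.
as_list = [
    "/home/ubuntu/Custom_Data/train.json",  # annotation file from the log above
    "/home/ubuntu/Custom_Data",             # assumed root containing videos/
    "video",                                # assumed media-type tag
]
print(unpack_like_loader(as_list))
```

If this is what is happening, opening the resulting label_file would try to open '/', which matches the IsADirectoryError shown in the traceback.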
