datasets.BuilderConfig does not work. #6868

Closed
jdm4pku opened this issue May 5, 2024 · 1 comment

jdm4pku commented May 5, 2024

Describe the bug

I wrote a custom BuilderConfig and a custom GeneratorBasedBuilder.
Here is the code for the BuilderConfig:

class UIEConfig(datasets.BuilderConfig):

    def __init__(
            self,
            *args,
            data_dir=None,
            instruction_file=None,
            instruction_strategy=None,
            task_config_dir=None,
            num_examples=None,
            max_num_instances_per_task=None,
            max_num_instances_per_eval_task=None,
            over_sampling=None,
            **kwargs
    ):
        super().__init__(*args, **kwargs)
        self.data_dir = data_dir
        self.num_examples = num_examples
        self.over_sampling = over_sampling
        self.instructions = self._parse_instruction(instruction_file)
        self.task_configs = self._parse_task_config(task_config_dir)
        self.instruction_strategy = instruction_strategy
        self.max_num_instances_per_task = max_num_instances_per_task
        self.max_num_instances_per_eval_task = max_num_instances_per_eval_task

Here is the code for the GeneratorBasedBuilder:

class UIEInstructions(datasets.GeneratorBasedBuilder):
    VERSION = datasets.Version("2.0.0")
    BUILDER_CONFIG_CLASS = UIEConfig
    BUILDER_CONFIGS = [
        UIEConfig(name="default", description="Default config for NaturalInstructions")
    ]
    DEFAULT_CONFIG_NAME = "default"

Here is the load_dataset call:

raw_datasets = load_dataset(
    os.path.join(CURRENT_DIR, "uie_dataset.py"),
    data_dir=data_args.data_dir,
    task_config_dir=data_args.task_config_dir,
    instruction_file=data_args.instruction_file,
    instruction_strategy=data_args.instruction_strategy,
    cache_dir=data_cache_dir,  # for debug, change dataset size, otherwise open it
    max_num_instances_per_task=data_args.max_num_instances_per_task,
    max_num_instances_per_eval_task=data_args.max_num_instances_per_eval_task,
    num_examples=data_args.num_examples,
    over_sampling=data_args.over_sampling,
)

Finally, I got this error:

BuilderConfig UIEConfig(name='default', version=0.0.0, data_dir=None, data_files=None, description='Default config for NaturalInstructions') doesn't have a 'task_config_dir' key.

I debugged the code and found that the parameters I added do not seem to take effect.
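
One possible reading of the error, which is not confirmed in this thread: the keyword arguments passed to load_dataset are applied onto the preconfigured "default" config, and a key is rejected when the config object has no attribute of that name. UIEConfig stores task_configs but never task_config_dir, which would explain why data_dir is accepted and task_config_dir is not. The sketch below uses only plain Python. SketchConfig, apply_overrides, and try_overrides are hypothetical stand-ins for illustration and are not part of the datasets API.

```python
class SketchConfig:
    """Hypothetical stand-in for UIEConfig; not a datasets class."""

    def __init__(self, name="default", data_dir=None, task_config_dir=None):
        self.name = name
        self.data_dir = data_dir
        # As in the issue, task_config_dir is parsed but never stored
        # under its own name, so the object has no such attribute.
        self.task_configs = self._parse_task_config(task_config_dir)

    @staticmethod
    def _parse_task_config(task_config_dir):
        # Placeholder parser; the real one would read files from the directory.
        return None if task_config_dir is None else {"dir": task_config_dir}


def apply_overrides(config, **overrides):
    """Assumed override logic: only attributes that already exist may be set."""
    for key, value in overrides.items():
        if value is None:
            continue
        if not hasattr(config, key):
            raise ValueError(
                f"{type(config).__name__} doesn't have a '{key}' key."
            )
        setattr(config, key, value)
    return config


def try_overrides(**overrides):
    """Apply overrides to a fresh default config and report the outcome."""
    config = SketchConfig(name="default")
    try:
        apply_overrides(config, **overrides)
    except ValueError as err:
        return str(err)
    return "ok"
```

Under this assumption, overriding data_dir alone succeeds, while adding task_config_dir fails on the first key that is not an attribute of the config.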

Steps to reproduce the bug

https://github.com/BeyonderXX/InstructUIE/blob/master/src/uie_dataset.py

Expected behavior

BuilderConfig UIEConfig(name='default', version=0.0.0, data_dir=None, data_files=None, description='Default config for NaturalInstructions') doesn't have a 'task_config_dir' key.

Environment info

torch 2.3.0+cu118
transformers 4.40.1
python 3.8

@albertvillanova (Member)

I guess the issue is caused by the customization of BuilderConfig that you use from the repo https://github.com/BeyonderXX/InstructUIE. You should report it to them.
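
For illustration only, here is a minimal sketch of how a customized config could keep every keyword argument as an attribute of the same name and derive parsed values on access, so that a later setattr on the raw argument is still honored. It assumes overrides are applied with setattr after construction. LazySketchConfig and its placeholder parsing are hypothetical and are not taken from the InstructUIE repo or the datasets library.

```python
class LazySketchConfig:
    """Hypothetical config that stores raw arguments and parses on access."""

    def __init__(self, name="default", task_config_dir=None, instruction_file=None):
        self.name = name
        # Store each raw argument under its own name so that hasattr() finds it.
        self.task_config_dir = task_config_dir
        self.instruction_file = instruction_file

    @property
    def task_configs(self):
        # Parse on access, so a task_config_dir assigned after construction
        # is picked up. Placeholder parsing; a real parser would read files.
        if self.task_config_dir is None:
            return None
        return {"dir": self.task_config_dir}


config = LazySketchConfig(name="default")
# Overriding after construction now works because the attribute exists.
setattr(config, "task_config_dir", "/tmp/cfg")
```

The trade-off is that parsing runs on each access; caching the result would be a reasonable refinement if the parse is expensive.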

I see you already opened an issue in their repo:

albertvillanova closed this as not planned on May 5, 2024