
Confirming eval and test sets #6

Open
bagustris opened this issue Sep 17, 2021 · 1 comment
bagustris commented Sep 17, 2021

Hi @m3hrdadfi,

Thank you for the great repository.
I just want to confirm: in the colab you shared, the evaluation and test sets are the same.
It is intended as a demo only, right? Since the test set is included in the training process (as eval_dataset),
it is not a big surprise that the performance was high.
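
A minimal sketch of keeping dev and test disjoint, assuming the data is already loaded as a Hugging Face datasets.Dataset named dataset (all variable names here are illustrative, not taken from the colab):

    # A minimal sketch, assuming `dataset` is a datasets.Dataset.
    splits = dataset.train_test_split(test_size=0.2, seed=42)
    train_dataset = splits["train"]                  # 80% for training
    rest = splits["test"].train_test_split(test_size=0.5, seed=42)
    dev_dataset = rest["train"]                      # 10%, used as eval_dataset during training
    test_dataset = rest["test"]                      # 10%, held out until the final evaluation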

jvel07 commented Oct 7, 2021

@bagustris I was thinking about the same. But it seems he splits the data and uses 20% for test.
In my case, I have train_dataset, validation (dev_dataset), and evaluation (test_dataset) sets. At the end of the colab there's this:

trainer = CTCTrainer(
    model=model,
    data_collator=data_collator,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=train_dataset,        # used for gradient updates
    eval_dataset=dev_dataset,           # evaluated periodically during training
    tokenizer=processor.feature_extractor,
)

Then, to validate the model on the test set, should I set up the CTCTrainer again, change the eval_dataset param to eval_dataset=test_dataset, and then hit trainer.train() again? @m3hrdadfi
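
A minimal sketch, assuming CTCTrainer subclasses transformers.Trainer: the held-out set can be scored after training with evaluate() or predict(), without calling trainer.train() a second time:

    # A minimal sketch, assuming CTCTrainer subclasses transformers.Trainer,
    # so the standard Trainer evaluation API applies.
    metrics = trainer.evaluate(eval_dataset=test_dataset)  # runs compute_metrics on the test set
    print(metrics)

    # Or, to also get the raw predictions:
    predictions = trainer.predict(test_dataset)
    print(predictions.metrics)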
