
Confirming eval and test sets #6

Open
bagustris opened this issue Sep 17, 2021 · 1 comment
bagustris commented Sep 17, 2021

Hi @m3hrdadfi,

Thank you for the great repository.
I just want to confirm: in the colab you shared, the evaluation and test sets are the same.
It is intended as a demo only, right? Since the test set is included in the training process (as eval_dataset),
it is not a big surprise that the performance was high.
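
A minimal sketch of keeping dev and test disjoint, assuming the data is already loaded as a Hugging Face datasets.Dataset named dataset (all variable names here are illustrative, not taken from the colab):

    # A minimal sketch, assuming `dataset` is a datasets.Dataset.
    splits = dataset.train_test_split(test_size=0.2, seed=42)
    train_dataset = splits["train"]                  # 80% for training
    rest = splits["test"].train_test_split(test_size=0.5, seed=42)
    dev_dataset = rest["train"]                      # 10%, used as eval_dataset during training
    test_dataset = rest["test"]                      # 10%, held out until the final evaluation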

jvel07 commented Oct 7, 2021

@bagustris I was thinking about the same. But it seems he splits the data and uses 20% for test.
In my case, I have train_dataset, validation (dev_dataset), and evaluation (test_dataset) sets. At the end of the colab there's this:

trainer = CTCTrainer(
    model=model,
    data_collator=data_collator,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=train_dataset,        # used for gradient updates
    eval_dataset=dev_dataset,           # evaluated periodically during training
    tokenizer=processor.feature_extractor,
)

Then, to validate the model on the test set, should I set up the CTCTrainer again, change the eval_dataset param to eval_dataset=test_dataset, and then hit trainer.train() again? @m3hrdadfi
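
A minimal sketch, assuming CTCTrainer subclasses transformers.Trainer: the held-out set can be scored after training with evaluate() or predict(), without calling trainer.train() a second time:

    # A minimal sketch, assuming CTCTrainer subclasses transformers.Trainer,
    # so the standard Trainer evaluation API applies.
    metrics = trainer.evaluate(eval_dataset=test_dataset)  # runs compute_metrics on the test set
    print(metrics)

    # Or, to also get the raw predictions:
    predictions = trainer.predict(test_dataset)
    print(predictions.metrics)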
