Update run_pseudo_labelling.py #158

Open
wants to merge 1 commit into base: main


peregilk
Contributor

Prevents an error in jiwer caused by empty predictions. For consistency both predictions and labels are replaced with <|nocaptions|> if empty, so that they are calculated as part of the wer.
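A minimal sketch of the replacement described above, assuming the normalised predictions and labels are plain Python lists of strings. The helper name `fill_empty` and the whitespace check are illustrative assumptions, not the code in this PR.

```python
NO_CAPTIONS = "<|nocaptions|>"  # placeholder token named in the PR description


def fill_empty(texts, placeholder=NO_CAPTIONS):
    """Return a copy of `texts` with empty or whitespace-only entries replaced."""
    return [t if t.strip() else placeholder for t in texts]


# Apply the same rule to both sides so every sample still counts towards the WER.
norm_pred_str = fill_empty(["hello world", ""])
norm_label_str = fill_empty(["hello world", "   "])
```

Applying one helper to both lists keeps them the same length, so the pairs stay aligned when they are passed to the WER computation.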
Comment on lines -806 to -808
# filtering step to only evaluate the samples that correspond to non-zero normalized references:
norm_pred_str = [norm_pred_str[i] for i in range(len(norm_pred_str)) if len(norm_label_str[i]) > 0]
norm_label_str = [norm_label_str[i] for i in range(len(norm_label_str)) if len(norm_label_str[i]) > 0]
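For comparison, the same filter can be sketched as a single pass over the paired lists, which removes the dependence on filtering `norm_pred_str` before `norm_label_str` is overwritten. The function name `keep_non_empty_references` is hypothetical and not part of the script.

```python
def keep_non_empty_references(preds, labels):
    """Keep only the (prediction, label) pairs whose label is non-empty."""
    pairs = [(p, l) for p, l in zip(preds, labels) if len(l) > 0]
    kept_preds = [p for p, _ in pairs]
    kept_labels = [l for _, l in pairs]
    return kept_preds, kept_labels


# The second pair is dropped (empty reference); the third is kept
# even though its prediction is empty.
preds, labels = keep_non_empty_references(["a b", "c", ""], ["a b", "", "d e"])
```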
Collaborator

sanchit-gandhi commented Jan 7, 2025

These lines keep only the pairs of norm_pred_str (hypothesis) and norm_label_str (reference) for which norm_label_str is non-empty.

The other edge-case is where we have an empty hypothesis. In this case, for a reference set of N words we have:

  • N deletions (one deletion for each word in the reference)
  • 0 substitutions
  • 0 insertions

So the WER is: (N + 0 + 0) / N = 1, and computed in an entirely valid way.

You can see this with a toy example:

from jiwer import wer

reference = "hello world"
hypothesis = ""

error = wer(reference, hypothesis)
print(error)

Print Output:

1.0

=> so there shouldn't be a need for an additional check for an empty normalised hypothesis! These should be valid in the WER calculation. Let me know if you have a minimal repro to rebut this!
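The arithmetic above can also be checked without jiwer. The following is a minimal sketch of a word-level edit-distance WER, assuming whitespace tokenisation; it is not jiwer's implementation, and the name `word_error_rate` is illustrative.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or a free match
            )
        prev = curr
    return prev[-1] / len(ref)


# An empty hypothesis costs one deletion per reference word: (2 + 0 + 0) / 2.
error = word_error_rate("hello world", "")
```

An empty reference is rejected here because the denominator N would be zero, which is the case the original filtering step guards against.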

Contributor Author

peregilk commented Jan 7, 2025

It might be fixed now. Read details here: jitsi/jiwer#98

Currently AFK so I have not tested. Not sure if my patch is valid for newest jiwer.
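One way to test this against whichever jiwer version is installed is a small probe like the sketch below. It takes the WER function as an argument so it makes no assumption about how a given jiwer release behaves; the helper name `raises_on_empty_hypothesis` is hypothetical.

```python
def raises_on_empty_hypothesis(wer_fn, reference="hello world"):
    """Return True if wer_fn(reference, "") raises, False if it returns a score."""
    try:
        wer_fn(reference, "")
    except Exception:
        return True
    return False


# Usage with jiwer installed (not run here):
#   from jiwer import wer
#   print(raises_on_empty_hypothesis(wer))
```

If the probe returns False, the placeholder replacement is not needed to avoid an exception, although it may still be wanted for other reasons.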

For reducing hallucinations, it is most effective to replace the empty text with <|nocaptions|> (pre-v3) or <|nospeech|>, but that is a slightly different issue.
