Inference metrics improvements #14923
Conversation
- Add UTMOSv2 MOS estimation - Track total generated duration per dataset, which we'll use as an indicator of speech rate. Signed-off-by: Fejgin, Roy <[email protected]>
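A minimal sketch of the duration-based speech-rate indicator described in the commit message above; the function names and the characters-per-second definition are illustrative assumptions, not the PR's actual code:

```python
# Illustrative sketch, not the PR's implementation: sum per-utterance
# durations for a dataset and derive a rough speech-rate indicator.

def total_generated_duration(durations_sec):
    """Total generated audio duration (seconds) for one dataset."""
    return sum(durations_sec)

def speech_rate_indicator(total_chars, total_duration_sec):
    """Characters per second of generated speech; higher suggests faster speech."""
    return total_chars / total_duration_sec
```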
Until we add this to our docker image, temporarily install it each time the (relevant) CI tests are run. Signed-off-by: Fejgin, Roy <[email protected]>
```yaml
- uses: actions/checkout@v4
# Temporarily install this manually until we add it to the docker image.
- name: Install UTMOSv2  # Needed by evaluate_generated_audio.py.
  run: pip install git+https://github.com/sarulab-speech/UTMOSv2.git
```
should we use a specific commit or are we comfortable with using their latest main branch?
Yeah, I was wondering the same. But I looked at their recent commits and it's basically infrequent maintenance updates (e.g. compatibility with new versions of torch), fewer than 1 commit a month on average. So I tend to think we benefit more from using the latest main branch, since it should pick up bugfixes etc. with minimal churn from updates.
Anyway: I thought about it some more and decided to update pip-requirements to pin UTMOSv2 to its latest release, v1.2.1.
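For reference, a pip requirements entry pinning a git dependency to a release tag could look like the line below; the exact entry used in the repo's pip-requirements file, and whether the tag is named `v1.2.1`, are assumptions:

```
# Hypothetical pip-requirements entry pinning UTMOSv2 to its v1.2.1 release tag
utmosv2 @ git+https://github.com/sarulab-speech/UTMOSv2.git@v1.2.1
```

Pinning to a tag keeps CI reproducible while still allowing a deliberate bump when a new UTMOSv2 release lands.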
And also pin to version 1.2.1. Signed-off-by: Fejgin, Roy <[email protected]>
```python
def compute_utmosv2_scores(audio_dir):
    print(f"\nComputing UTMOSv2 scores for files in {audio_dir}...")
    start_time = time.time()
    utmosv2_calculator = UTMOSv2Calculator()
```
Can we use the same `device` as on line 200 (`device = "cuda"`)?
Good catch! Fixed.
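A self-contained sketch of the device-consistency fix discussed above; `UTMOSv2Calculator` here is a stand-in class (its real constructor signature is an assumption), and the point is simply to thread one shared `device` value through instead of letting the calculator fall back to its own default:

```python
# Stand-in sketch of the fix: define `device` once and pass it explicitly,
# rather than constructing the calculator with an implicit default device.

class UTMOSv2Calculator:  # stand-in, not the real implementation
    def __init__(self, device="cpu"):
        self.device = device

device = "cpu"  # in the real script this is "cuda", defined once near line 200
utmosv2_calculator = UTMOSv2Calculator(device=device)
```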
Left one final comment, but otherwise good to go.
1. Bugfix: use correct device for UTMOSv2 model 2. Removed restriction of running UTMOSv2 only on datasets with fewer than 200 entries. Reason: after optimizing how UTMOSv2 is run it is now much faster; a 2000-utterance dataset only went from 25 minutes to 28 minutes by adding UTMOSv2. Getting the metric is worth the small additional runtime. Signed-off-by: Fejgin, Roy <[email protected]>
I removed the restriction limiting UTMOSv2 to datasets with under 200 utterances. That's because it's a lot faster to run now that it's batched and the inference workers and threads are tuned. For a 2000-utterance dataset, UTMOSv2 only increased test time from ~25 min to ~28 min, which seems worth it. Will merge once CI passes (excluding the unrelated known CI issue with coverage calculation).
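The batching-plus-workers idea mentioned above can be sketched as follows; `score_batch` is a hypothetical stand-in for the real UTMOSv2 inference call, and the batch size and worker count are illustrative, not the PR's tuned values:

```python
# Hedged sketch: split files into batches and score them with a thread pool.
# The real speedup comes from batched model inference; this only shows the shape.
from concurrent.futures import ThreadPoolExecutor

def score_batch(paths):
    # Stand-in: the real code would run UTMOSv2 inference on this batch.
    return [0.0 for _ in paths]

def score_all(paths, batch_size=16, workers=4):
    batches = [paths[i:i + batch_size] for i in range(0, len(paths), batch_size)]
    scores = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch_scores in pool.map(score_batch, batches):
            scores.extend(batch_scores)
    return scores
```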
Added these two metrics to `infer_and_evaluate.py`. `disable_fcd` --> `--no_fcd`