Inference metrics improvements #14923

rfejgin · 2025-10-13T20:38:59Z

Added two metrics to infer_and_evaluate.py and renamed one option:

- Add MOS estimation using UTMOSv2.
- Add a metric that tracks total generated audio duration per dataset. We can use this as a lightweight indicator of speaking rate.
- Rename a command-line option for consistency: disable_fcd --> --no_fcd
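For illustration, a boolean flag such as `--no_fcd` could be declared with `argparse` as shown here. This is a minimal sketch, not the script's actual parser: the help text and the assumption that the flag is a simple `store_true` switch are mine, and the real `infer_and_evaluate.py` defines many more options.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser used only to illustrate the renamed option.
    parser = argparse.ArgumentParser(description="Sketch of the renamed option")
    parser.add_argument(
        "--no_fcd",
        action="store_true",  # assumed: the flag takes no value
        help="Skip the FCD metric (assumed meaning of the flag).",
    )
    return parser


args = build_parser().parse_args(["--no_fcd"])
print(args.no_fcd)  # True when the flag is passed, False otherwise
```

The `--no_<metric>` spelling keeps the option name and the parsed attribute (`args.no_fcd`) aligned, which is presumably the consistency the rename is after.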

- Add UTMOSv2 MOS estimation
- Track total generated duration per dataset, which we'll use as an indicator of speech rate.

Signed-off-by: Fejgin, Roy <[email protected]>
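The duration metric amounts to summing the length of every generated file in a dataset's output directory. A minimal sketch of that idea, assuming the outputs are PCM `.wav` files in a single directory; the function names are hypothetical, and the actual script may read audio with a different library.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path: Path) -> float:
    # Duration = number of frames / sample rate (stdlib reader, PCM WAV only).
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def total_generated_duration(audio_dir: Path) -> float:
    # Sum the durations of all .wav files directly inside audio_dir.
    return sum(wav_duration_seconds(p) for p in sorted(Path(audio_dir).glob("*.wav")))
```

Because the input text for a given dataset is fixed, a change in the total duration between two checkpoints suggests a change in average speaking rate, without needing a forced aligner or phoneme counts.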

scripts/magpietts/infer_and_evaluate.py

Until we add this to our docker image, temporarily install it each time the (relevant) CI tests are run. Signed-off-by: Fejgin, Roy <[email protected]>

shehzeen · 2025-10-13T23:35:47Z

.github/workflows/cicd-main-speech.yml

        uses: actions/checkout@v4
+      # Temporarily install this manually until we add it to the docker image.
+      - name: Install UTMOSv2 # Needed by evaluate_generated_audio.py.
+        run: pip install git+https://github.com/sarulab-speech/UTMOSv2.git


Should we use a specific commit, or are we comfortable with using their latest main branch?

Yeah, I was wondering the same. But I looked at their recent commits and it's basically infrequent maintenance updates (e.g. compatibility with new versions of torch). Probably less than 1 commit a month on average. So I tend to think we benefit more from using latest on main branch since it should have bugfixes etc. but minimal churn from updates.

Anyways: thought about it some more and decided to update pip-requirements to pin UTMOSv2 to its latest release, v1.2.1.

scripts/magpietts/evaluate_generated_audio.py

Signed-off-by: Fejgin, Roy <[email protected]>

And also pin to version 1.2.1. Signed-off-by: Fejgin, Roy <[email protected]>

Signed-off-by: Fejgin, Roy <[email protected]>

blisc · 2025-10-20T20:06:19Z

scripts/magpietts/evaluate_generated_audio.py

+def compute_utmosv2_scores(audio_dir):
+    print(f"\nComputing UTMOSv2 scores for files in {audio_dir}...")
+    start_time = time.time()
+    utmosv2_calculator = UTMOSv2Calculator()


Can we use the same "device" as on line 200: device = "cuda"?

Good catch! Fixed.
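The fix discussed above boils down to passing the caller's device into the scorer instead of letting it fall back to a default. A minimal sketch of that pattern follows; `DummyMosCalculator` is a hypothetical stand-in, since the real `UTMOSv2Calculator` constructor is not shown in this thread.

```python
from dataclasses import dataclass


@dataclass
class DummyMosCalculator:
    """Hypothetical stand-in for UTMOSv2Calculator (real signature not shown here)."""

    device: str = "cpu"


def compute_scores(audio_dir: str, device: str) -> DummyMosCalculator:
    # Pass the caller's device through rather than relying on the default.
    calculator = DummyMosCalculator(device=device)
    print(f"Scoring files in {audio_dir} on {calculator.device}")
    return calculator


device = "cuda"  # chosen once by the caller, as in the review comment
calculator = compute_scores("generated_audio", device)
```

Choosing the device in one place and threading it through keeps all evaluation models on the same hardware.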

blisc

Left one final comment, but otherwise good to go.

1. Bugfix: use correct device for UTMOSv2 model.
2. Removed the restriction of running UTMOSv2 only on datasets with fewer than 200 entries. Reason: after optimizing how UTMOSv2 is run, it is now much faster; a 2000-utterance dataset only went from 25 minutes to 28 minutes by adding UTMOSv2. Getting the metric is worth the small additional runtime.

Signed-off-by: Fejgin, Roy <[email protected]>

rfejgin · 2025-10-20T21:05:41Z

I removed the restriction limiting UTMOSv2 to datasets with under 200 utterances. That's because it's a lot faster to run now that it is batched and the inference workers and threads are tuned. For a 2000-utterance dataset, UTMOSv2 only increased test time from ~25 min to ~28 min, which seems worth it.

Will merge once CI passes (excluding unrelated known CI issue with coverage calculation).
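The speedup described above comes from scoring many files per model call rather than one at a time. A minimal sketch of that batching pattern, assuming a hypothetical `score_batch` callable that maps a list of paths to a list of scores; it does not reproduce the UTMOSv2 API or the worker and thread tuning mentioned in the comment.

```python
from typing import Callable, Dict, Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    # Yield consecutive slices of at most batch_size items.
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def score_in_batches(
    paths: Sequence[str],
    score_batch: Callable[[List[str]], List[float]],  # hypothetical scorer
    batch_size: int = 16,
) -> Dict[str, float]:
    # One scorer call per batch; results are keyed by file path.
    scores: Dict[str, float] = {}
    for batch in batched(paths, batch_size):
        for path, score in zip(batch, score_batch(batch)):
            scores[path] = score
    return scores
```

With 2000 utterances and a batch size of 16, the scorer would be invoked 125 times instead of 2000, which is where most of the per-call overhead is saved.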

Inference metrics improvements

127818a

- Add UTMOSv2 MOS estimation
- Track total generated duration per dataset, which we'll use as an indicator of speech rate.

Signed-off-by: Fejgin, Roy <[email protected]>

github-actions bot added the TTS label Oct 13, 2025

rfejgin added the Run CICD label Oct 13, 2025

rfejgin had a problem deploying to test October 13, 2025 20:40 — with GitHub Actions Error

rfejgin changed the title ~~Inference metrics improvements: UTMOSv2 and total generated duration tracking~~ Inference metrics improvements Oct 13, 2025

rfejgin marked this pull request as ready for review October 13, 2025 21:27

rfejgin commented Oct 13, 2025

scripts/magpietts/infer_and_evaluate.py

Add UTMOSv2 installation to CI.

00c988f

Until we add this to our docker image, temporarily install it each time the (relevant) CI tests are run. Signed-off-by: Fejgin, Roy <[email protected]>

rfejgin requested review from chtruong814, ko3n1g, pablo-garay and thomasdhc as code owners October 13, 2025 22:00

github-actions bot added the CI label Oct 13, 2025

chtruong814 added Run CICD and removed Run CICD labels Oct 13, 2025

rfejgin requested a review from blisc October 13, 2025 22:00

chtruong814 had a problem deploying to test October 13, 2025 22:01 — with GitHub Actions Error

rfejgin requested a review from shehzeen October 13, 2025 22:04

shehzeen reviewed Oct 13, 2025

scripts/magpietts/evaluate_generated_audio.py

Add UTMOSv2 to violin plots

7c6720c

Signed-off-by: Fejgin, Roy <[email protected]>

chtruong814 added Run CICD and removed Run CICD labels Oct 14, 2025

chtruong814 temporarily deployed to test October 14, 2025 00:52 — with GitHub Actions Inactive

chtruong814 temporarily deployed to test October 14, 2025 20:39 — with GitHub Actions Inactive

Disable UTMOSv2 installation to debug container build issues

22f0b32

Signed-off-by: Fejgin, Roy <[email protected]>

chtruong814 added Run CICD and removed Run CICD labels Oct 15, 2025

chtruong814 temporarily deployed to test October 15, 2025 05:36 — with GitHub Actions Inactive

Update requirements_tts.txt install UTMOSv2 with proper syntax

1d74bc7

And also pin to version 1.2.1. Signed-off-by: Fejgin, Roy <[email protected]>

Compute batch of files in UTMOSv2

4ee8939

Signed-off-by: Fejgin, Roy <[email protected]>

chtruong814 added Run CICD and removed Run CICD labels Oct 18, 2025

Fix isort and flake8 errors

287d2f9

Signed-off-by: Fejgin, Roy <[email protected]>

chtruong814 added Run CICD and removed Run CICD labels Oct 18, 2025

chtruong814 temporarily deployed to test October 18, 2025 01:22 — with GitHub Actions Inactive

Add some logging to debug UTMOSv2 score not found error

d52b361

Signed-off-by: Fejgin, Roy <[email protected]>

chtruong814 added Run CICD and removed Run CICD labels Oct 19, 2025

chtruong814 temporarily deployed to test October 19, 2025 05:06 — with GitHub Actions Inactive

Normalize file paths for UTMOSv2 score lookup

cdf4b31

Signed-off-by: Fejgin, Roy <[email protected]>

chtruong814 added Run CICD and removed Run CICD labels Oct 20, 2025

chtruong814 temporarily deployed to test October 20, 2025 03:16 — with GitHub Actions Inactive

Remove debug code

cc132f9

Signed-off-by: Fejgin, Roy <[email protected]>

chtruong814 added Run CICD and removed Run CICD labels Oct 20, 2025

chtruong814 temporarily deployed to test October 20, 2025 03:45 — with GitHub Actions Inactive

blisc reviewed Oct 20, 2025

blisc approved these changes Oct 20, 2025

chtruong814 added Run CICD and removed Run CICD labels Oct 20, 2025

chtruong814 temporarily deployed to test October 20, 2025 21:05 — with GitHub Actions Inactive

rfejgin enabled auto-merge (squash) October 20, 2025 22:38

blisc disabled auto-merge October 20, 2025 23:00

blisc merged commit 066d622 into NVIDIA-NeMo:magpietts_2508 Oct 20, 2025
63 of 66 checks passed

XuesongYang mentioned this pull request Oct 30, 2025

[MagpieTTS][TTS] Streaming Algorithm for MagpieTTS to 2508 #14573

Open

8 tasks
