
Conversation

patelnav

What does this PR do?

Fixes two TDT beam search timestamp issues:

  1. Crash when using beam search with timestamps=True (NoneType iteration error)
  2. ~160ms offset in timestamps between beam search and greedy decoding

Collection: ASR

Changelog

  • Store token_durations in BatchedBeamHyps for TDT models during beam search
  • Mark timestamp semantics with _timestamp_semantics flag on Hypothesis objects
  • Update _compute_offsets_tdt to handle both START and END timestamp semantics
  • Add backward compatibility for computing durations from timestamp differences
  • Fix docstring: corrected "char" field documentation from List[str] to List[int] (pre-existing bug: y_sequence contains integer token IDs, not strings)

Problem

Issue 1: Crash with beam search + timestamps

When using TDT beam search with timestamps=True, the code crashes with:

TypeError: 'NoneType' object is not iterable
  at nemo/collections/asr/parts/submodules/rnnt_decoding.py:1153 in _compute_offsets_tdt

Root cause: Beam search (BatchedBeamHyps) doesn't populate the token_duration field that _compute_offsets_tdt requires.
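
To make the failure concrete, the crash reduces to iterating a field that beam search never fills in. The snippet below is a simplified illustration, not the actual NeMo code:

# Simplified illustration of the failure mode: beam-search hypotheses reach
# _compute_offsets_tdt with token_duration left as None, and iterating it fails.
token_duration = None  # never populated by BatchedBeamHyps before this fix

for duration in token_duration:  # TypeError: 'NoneType' object is not iterable
    pass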

Issue 2: ~160ms timestamp offset

After fixing the crash (by computing durations from timestamp diffs), beam search timestamps are still ~160ms late compared to greedy. This occurs because (see the toy arithmetic after this list):

  1. Beam search stores END timestamps: timestamp = timesteps + duration
  2. Greedy stores START timestamps: timestamp = timesteps
  3. _compute_offsets_tdt assumed all timestamps were START times
  4. Result: beam search offsets included leading blank frames (~160ms = 4 frames @ 40ms/frame)
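
As a toy illustration of that arithmetic (numbers chosen for clarity; the 40ms frame stride and 4-frame shift come from the description above):

# Toy example: a token starting at encoder frame 10 with a predicted
# duration of 4 frames, at 40 ms per frame.
frame_stride_ms = 40
start_frame, duration_frames = 10, 4

greedy_timestamp = start_frame                   # START semantics -> 10
beam_timestamp = start_frame + duration_frames   # END semantics   -> 14

# Reading the END timestamp as if it were a START time shifts the word later by
# duration_frames * frame_stride_ms = 4 * 40 ms = 160 ms.
offset_error_ms = (beam_timestamp - greedy_timestamp) * frame_stride_ms
print(offset_error_ms)  # 160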

Solution

Three-part approach (a minimal sketch of the offset logic follows this list):

  1. Store token durations in BatchedBeamHyps (the durations are already received during beam search; now they are also stored)
  2. Mark timestamp semantics with _timestamp_semantics attribute on Hypothesis objects
    • Beam search: "end" (timestamps are END times)
    • Greedy: "start" (timestamps are START times)
  3. Compute correct offsets in _compute_offsets_tdt based on semantics:
    • END semantics: start_offset = timestamp - duration, end_offset = timestamp
    • START semantics: start_offset = timestamp, end_offset = timestamp + duration
    • Fallback heuristic when flag missing (backward compatibility)
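
A minimal sketch of that logic, with placeholder names (the real code lives in _compute_offsets_tdt and differs in detail):

# Placeholder sketch, not the actual NeMo implementation.
def compute_offsets(timestamps, durations=None, semantics="start"):
    if durations is None:
        # Backward-compatible fallback: approximate durations from
        # consecutive timestamp differences when they were never stored.
        prev = [0] + list(timestamps[:-1])
        durations = [t - p for p, t in zip(prev, timestamps)]
    offsets = []
    for t, d in zip(timestamps, durations):
        if semantics == "end":
            # beam search: the timestamp marks the END of the token
            offsets.append({"start_offset": t - d, "end_offset": t})
        else:
            # greedy: the timestamp marks the START of the token
            offsets.append({"start_offset": t, "end_offset": t + d})
    return offsets

# Same token (frame 10 start, 4-frame duration) under both conventions:
print(compute_offsets([14], [4], semantics="end"))    # [{'start_offset': 10, 'end_offset': 14}]
print(compute_offsets([10], [4], semantics="start"))  # [{'start_offset': 10, 'end_offset': 14}]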

Usage

No API changes. The fix is transparent to users:

import nemo.collections.asr as nemo_asr
from omegaconf import OmegaConf

# Load TDT model
model = nemo_asr.models.ASRModel.from_pretrained("nvidia/parakeet-tdt-0.6b-v3")

# Configure beam search (e.g., for GPU phrase boosting)
decoding_cfg = OmegaConf.to_container(model.cfg.decoding, resolve=True)
decoding_cfg["strategy"] = "beam"
decoding_cfg["beam"]["beam_size"] = 4
decoding_cfg["beam"]["return_best_hypothesis"] = True
model.change_decoding_strategy(OmegaConf.create(decoding_cfg))

# Timestamps now work correctly with beam search (no crash, no 160ms offset)
result = model.transcribe(["audio.wav"], timestamps=True)
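
Word-level timestamps can then be read from the returned hypothesis; the keys below follow NeMo's timestamp output and may vary slightly between versions:

# Inspect word-level timestamps on the best hypothesis.
for word in result[0].timestamp["word"]:
    print(word["word"], word["start"], word["end"])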

Impact

  • Enables GPU phrase boosting with timestamps (Issue 1: previously crashed)
  • Eliminates ~160ms timestamp offset (Issue 2: beam search word-level timestamps now align with greedy)
  • Zero performance penalty (durations computed during beam search anyway)
  • Backward compatible (computes durations from diffs if missing)

GitHub Actions CI

Ready for CI. Please add "Run CICD" label.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests? (Can add if requested by reviewers)
  • Did you add or update any necessary documentation? (Updated docstrings)
  • Does the PR affect components that are optional to install? No

PR Type:

  • Bugfix

Who can review?

@andrusenkoau
Per contributor guidelines, requesting review from the ASR team:

* store token durations in BatchedBeamHyps for TDT models
* mark timestamp semantics with _timestamp_semantics flag
* update _compute_offsets_tdt to handle END timestamp semantics
* fix docstring: correct char field from List[str] to List[int]

Fixes beam search timestamp offset issue where timestamps were ~160ms
late compared to greedy decoding. Root cause: beam search stores END
timestamps (timesteps + duration) but offset computation expected
START timestamps. Solution stores actual token durations and correctly
interprets timestamps based on explicit semantics flag.

Signed-off-by: Nav Patel <[email protected]>
@github-actions github-actions bot added the ASR label Oct 10, 2025
@patelnav
Author

@andrusenkoau this might be of particular interest to you.
I tried the GPU-PB beam-search with timestamps turned on and it immediately crashed.

@artbataev
Collaborator

@patelnav Thank you for the PR!
We’re in the middle of broader work on refactoring and extending transducer beam search capabilities that overlaps with this change, so we’re unsure about merging it as-is. Let’s keep it open (or convert it to a Draft) while that work lands, and align on your use cases so we can either rebase/adapt this PR or integrate the ideas directly.
Appreciate your contribution!

@patelnav
Author

Sounds good. Best of luck with the refactor. Feel free to do as you wish with the PR 👍🏽
