
ASR: Use length recurrence for streaming pre-encode drop count #15689

Open

1fanwang wants to merge 1 commit into NVIDIA-NeMo:main from 1fanwang:fix/streaming-drop-extra-pre-encoded-recurrence

Conversation


@1fanwang 1fanwang commented May 12, 2026

Closes #15482.

What

The streaming encoder's `drop_extra_pre_encoded` count was computed as `1 + (cache_size - 1) // subsampling_factor`. For convolutional subsampling that formula is only accurate at the default `cache_size = subsampling_factor + 1`, because the actual `ConvSubsampling.forward` uses the convolutional length recurrence

L_next = floor((L + all_paddings - kernel_size) / stride) + 1

(or `ceil` under `_ceil_mode`), composed over `_sampling_num` layers.

For any other `pre_encode_cache_size`, the divisor approximation diverges from what `forward` produces, so streaming inference drops the wrong number of frames and the chunked output disagrees with a full pass — the mismatch reported in #15482.
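The divergence can be reproduced standalone. This is a minimal sketch assuming `kernel_size=3`, `stride=2`, `all_paddings=2` per layer and `log2(factor)` conv layers; those hyperparameters are typical of NeMo's striding subsamplers but are assumptions here, not values taken from the diff. Under them, the PR's `cache_size=11` case reproduces once ceil rounding enters the recurrence:

```python
import math

def legacy_drop(cache_size: int, factor: int) -> int:
    # Old divisor approximation used for drop_extra_pre_encoded.
    return 1 + (cache_size - 1) // factor

def recurrence_drop(cache_size: int, num_layers: int, kernel_size: int = 3,
                    stride: int = 2, all_paddings: int = 2,
                    ceil_mode: bool = False) -> int:
    # Compose the convolutional length recurrence over num_layers layers,
    # mirroring what ConvSubsampling.forward reports as the output length.
    length = cache_size
    for _ in range(num_layers):
        frac = (length + all_paddings - kernel_size) / stride
        length = int(math.ceil(frac) if ceil_mode else math.floor(frac)) + 1
    return length

# factor=8 -> 3 conv layers; at the default cache_size = 8 + 1 = 9 both agree:
assert legacy_drop(9, 8) == recurrence_drop(9, 3) == 2
# ...but at cache_size=11 the divisor approximation undercounts by one:
assert legacy_drop(11, 8) == 2
assert recurrence_drop(11, 3, ceil_mode=True) == 3
```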

This PR routes the drop count through a new `get_streaming_drop_size(cache_size)` method on each subsampler (a sketch of the dispatch follows this list):

  • `ConvSubsampling.get_streaming_drop_size` uses the same `calc_length` helper the encoder already uses for the forward pass, so the streaming drop count stays consistent with the encoder's own length bookkeeping.
  • `StackingSubsampling.get_streaming_drop_size` exposes the exact `cache_size // factor` relation.
  • `ConformerEncoder.setup_streaming_params` calls the new method when available; for custom `pre_encode` modules that predate it, it falls back to the legacy formula (which coincides with the new one only at the default `cache_size`).
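A minimal sketch of that dispatch, under the assumptions stated above; the free-function shape and the `hasattr` probe are illustrative, not the PR's exact diff:

```python
def compute_drop_extra_pre_encoded(pre_encode, cache_size: int,
                                   subsampling_factor: int) -> int:
    # Prefer the subsampler's own length bookkeeping when it exposes the
    # new hook (ConvSubsampling via calc_length, StackingSubsampling via
    # cache_size // factor)...
    if hasattr(pre_encode, "get_streaming_drop_size"):
        return pre_encode.get_streaming_drop_size(cache_size)
    # ...and fall back to the legacy divisor formula for custom pre_encode
    # modules that predate the method. The two agree only at the default
    # cache_size = subsampling_factor + 1.
    return 1 + (cache_size - 1) // subsampling_factor
```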

Tests

`tests/collections/asr/test_asr_subsampling.py::TestStreamingDropExtraPreEncoded` (a sketch of the main check follows this section):

  • `test_drop_size_matches_forward` — parametrized over 4 subsampler shapes (`striding`/`dw_striding` × `subsampling_factor=4/8`) and 7 cache sizes (1, 4, 8, 9, 11, 16, 32). Each case runs the actual `forward` on a `cache_size`-long input and asserts the returned `out_lengths[0]` equals `get_streaming_drop_size(cache_size)`.
  • `test_drop_size_legacy_formula_diverges_for_non_default_cache` — documents the bug: `subsampling_factor=8`, `cache_size=11` returns 2 under the old formula, but the convolutional recurrence (and the actual `forward`) returns 3.
  • `test_drop_size_zero_for_empty_cache` — `cache_size <= 0` → 0.
  • `test_stacking_drop_size` — exact `cache_size // factor` for `StackingSubsampling`.

The new tests fail on `main` and pass with this PR; the legacy-formula case demonstrates the divergence.
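The core consistency check could look roughly like the sketch below. The `ConvSubsampling` import path and constructor arguments follow NeMo's subsampling module, but the feature dimensions are arbitrary illustration values, and the body is a reconstruction from the description above, not the PR's test code:

```python
import pytest
import torch

from nemo.collections.asr.parts.submodules.subsampling import ConvSubsampling

@pytest.mark.parametrize("subsampling", ["striding", "dw_striding"])
@pytest.mark.parametrize("factor", [4, 8])
@pytest.mark.parametrize("cache_size", [1, 4, 8, 9, 11, 16, 32])
def test_drop_size_matches_forward(subsampling, factor, cache_size):
    layer = ConvSubsampling(
        subsampling=subsampling,
        subsampling_factor=factor,
        feat_in=80,        # arbitrary feature dims, chosen for the sketch
        feat_out=176,
        conv_channels=176,
    )
    # Run the real forward on a cache_size-long dummy input...
    x = torch.randn(1, cache_size, 80)
    _, out_lengths = layer(x, torch.tensor([cache_size]))
    # ...and require the streaming drop count to match its output length.
    assert int(out_lengths[0]) == layer.get_streaming_drop_size(cache_size)
```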

Commit message:

The streaming encoder's `drop_extra_pre_encoded` count was computed as
`1 + (cache_size - 1) // subsampling_factor`. For convolutional subsampling
that's only accurate at the default `cache_size = subsampling_factor + 1`
because the actual forward pass uses the convolutional length recurrence
`L_next = floor((L + paddings - kernel) / stride) + 1` (or `ceil` under
`_ceil_mode`) composed over `_sampling_num` layers.

For arbitrary `pre_encode_cache_size` the divisor approximation diverges
from what `forward` produces, so streaming inference drops the wrong
number of frames and the chunked output disagrees with a full pass — the
mismatch reported in NVIDIA-NeMo#15482.

Route the drop count through a new `get_streaming_drop_size(cache_size)`
on each subsampler. `ConvSubsampling` uses the same `calc_length` helper
the encoder already uses for the forward pass; `StackingSubsampling`
exposes the exact `cache_size // factor` relation. The encoder falls back
to the legacy formula only when `pre_encode` is a custom module that
predates this method; the two formulas agree only at the default cache size.

Tests parametrize 4 subsampler shapes × 7 cache sizes and assert
`get_streaming_drop_size` equals what `forward` actually returns. A
documented case (`subsampling_factor=8`, `cache_size=11`) shows the old
formula returning 2 while the recurrence returns 3.

Closes NVIDIA-NeMo#15482

Signed-off-by: 1fanwang <1fannnw@gmail.com>

copy-pr-bot Bot commented May 12, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


Successfully merging this pull request may close these issues:

Problem with computing `drop_extra_pre_encoded` when varying `pre_encode_cache_size` for SubSampling and VGG frontends (#15482)

2 participants