
[Bug] ONNX-exported model outputs audio of a fixed length. #315

Open
wetdog opened this issue Feb 24, 2025 · 1 comment
Labels
bug Something isn't working

Comments


wetdog commented Feb 24, 2025

Describe the bug

Models exported to ONNX always return audio of a fixed length, proportional to the value of dummy_input_length defined in the export code:

dummy_input_length = 100

If the text is longer than this fixed length, the sentence gets trimmed; when the output is shorter, it gets padded with a tone at the end.

To Reproduce

import numpy as np
from TTS.tts.configs.vits_config import VitsConfig
from TTS.tts.models.vits import Vits
from TTS.utils.audio.numpy_transforms import save_wav

config_path = "../coqui-TTS/coqui-ai-TTS/tts_models--en--ljspeech--vits/config.json"
model_path = "../coqui-TTS/coqui-ai-TTS/coqui_ljspeech.onnx"

# Build the model from its config and load the exported ONNX graph.
config = VitsConfig()
config.load_json(config_path)
vits = Vits.init_from_config(config)
vits.load_onnx(model_path)

text = "Hello world, this is the widely known voice of ljspeech dataset."

# Tokenize the text to ids and add a batch dimension.
text_inputs = np.asarray(
    vits.tokenizer.text_to_ids(text, language=None),
    dtype=np.int64,
)[None, :]

audio = vits.inference_onnx(text_inputs)
save_wav(wav=audio[0], path="test_onnx_coqui.wav", sample_rate=config.audio.sample_rate)
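
A quick way to observe the symptom, reusing the objects above: with the buggy export, inputs of very different lengths produce outputs with the same number of samples (a minimal sketch, not part of the original report):

# Compare output shapes for a short and a long input; with the
# fixed-length export, both print the same shape.
for t in ("Hi.", "A much longer sentence that should produce much longer audio."):
    ids = np.asarray(vits.tokenizer.text_to_ids(t, language=None), dtype=np.int64)[None, :]
    print(repr(t), vits.inference_onnx(ids).shape)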

Expected behavior

No response

Logs

Environment

{
    "CUDA": {
        "GPU": [],
        "available": false,
        "version": "12.4"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.6.0+cu124",
        "TTS": "0.25.3",
        "numpy": "1.26.4"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "x86_64",
        "python": "3.10.11",
        "version": "#53~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Jan 15 19:18:46 UTC 2"
    }
}

Additional context

No response

wetdog added the bug label Feb 24, 2025

wetdog commented Feb 25, 2025

I managed to solve it by changing this code in https://github.com/idiap/coqui-ai-TTS/blob/dev/TTS/tts/utils/helpers.py#L47:

# old
max_len = int(sequence_length.max())
# new
max_len = sequence_length.max()

The int() cast was introduced in 7330ad8#diff-05a17bda04945f3b4ff830e669b7b0c2e684aa462c7f91b7681fe0d53691ab6aR49.
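
For context, this is consistent with how torch.onnx.export tracing works: calling int() on a tensor evaluates it eagerly at trace time, so the length derived from the dummy input is baked into the graph as a constant, while a tensor value stays symbolic. A minimal, self-contained sketch of the effect (illustrative module and file names, not the actual Coqui code):

import torch

class Mask(torch.nn.Module):
    def forward(self, lengths):
        # int() pulls the value out of the traced graph, so the dummy
        # input's max length (5 below) is frozen as a constant.
        max_len = int(lengths.max())
        # max_len = lengths.max()  # the tensor form stays dynamic
        return torch.arange(max_len, device=lengths.device)

torch.onnx.export(
    Mask(),
    (torch.tensor([3, 5]),),  # dummy input; its max becomes the fixed length
    "mask.onnx",
    input_names=["lengths"],
    dynamic_axes={"lengths": {0: "batch"}},
)

Exporting the int() version emits a TracerWarning about converting a tensor to a Python integer, which is a good tell that a shape is about to be hardcoded.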
