
[Bug] ONNX-exported model outputs audio of a fixed length. #315

Open
wetdog opened this issue Feb 24, 2025 · 1 comment
Labels
bug Something isn't working

Comments


wetdog commented Feb 24, 2025

Describe the bug

Models exported to ONNX always return audio of a fixed length, proportional to the value of dummy_input_length defined in the export code:

dummy_input_length = 100

If the text is longer than this fixed length, the sentence gets trimmed; when the output is shorter, it gets padded with a tone at the end.

To Reproduce

import numpy as np
from TTS.tts.configs.vits_config import VitsConfig
from TTS.tts.models.vits import Vits
from TTS.utils.audio.numpy_transforms import save_wav

config_path = "../coqui-TTS/coqui-ai-TTS/tts_models--en--ljspeech--vits/config.json"
model_path = "../coqui-TTS/coqui-ai-TTS/coqui_ljspeech.onnx"

# Build the model from its config and load the exported ONNX graph.
config = VitsConfig()
config.load_json(config_path)
vits = Vits.init_from_config(config)
vits.load_onnx(model_path)

text = "Hello world, this is the widely known voice of ljspeech dataset."

# Tokenize the text to ids and add a batch dimension.
text_inputs = np.asarray(
    vits.tokenizer.text_to_ids(text, language=None),
    dtype=np.int64,
)[None, :]

audio = vits.inference_onnx(text_inputs)
save_wav(wav=audio[0], path="test_onnx_coqui.wav", sample_rate=config.audio.sample_rate)
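
A quick way to observe the symptom, reusing the objects above: with the buggy export, inputs of very different lengths produce outputs with the same number of samples (a minimal sketch, not part of the original report):

# Compare output shapes for a short and a long input; with the
# fixed-length export, both print the same shape.
for t in ("Hi.", "A much longer sentence that should produce much longer audio."):
    ids = np.asarray(vits.tokenizer.text_to_ids(t, language=None), dtype=np.int64)[None, :]
    print(repr(t), vits.inference_onnx(ids).shape)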

Expected behavior

No response

Logs

Environment

{
    "CUDA": {
        "GPU": [],
        "available": false,
        "version": "12.4"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.6.0+cu124",
        "TTS": "0.25.3",
        "numpy": "1.26.4"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "x86_64",
        "python": "3.10.11",
        "version": "#53~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Jan 15 19:18:46 UTC 2"
    }
}

Additional context

No response

wetdog added the bug label Feb 24, 2025

wetdog commented Feb 25, 2025

I managed to solve it by changing this code in https://github.com/idiap/coqui-ai-TTS/blob/dev/TTS/tts/utils/helpers.py#L47:

# old
max_len = int(sequence_length.max())
# new
max_len = sequence_length.max()

The int() cast was introduced in 7330ad8#diff-05a17bda04945f3b4ff830e669b7b0c2e684aa462c7f91b7681fe0d53691ab6aR49.
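
For context, this is consistent with how torch.onnx.export tracing works: calling int() on a tensor evaluates it eagerly at trace time, so the length derived from the dummy input is baked into the graph as a constant, while a tensor value stays symbolic. A minimal, self-contained sketch of the effect (illustrative module and file names, not the actual Coqui code):

import torch

class Mask(torch.nn.Module):
    def forward(self, lengths):
        # int() pulls the value out of the traced graph, so the dummy
        # input's max length (5 below) is frozen as a constant.
        max_len = int(lengths.max())
        # max_len = lengths.max()  # the tensor form stays dynamic
        return torch.arange(max_len, device=lengths.device)

torch.onnx.export(
    Mask(),
    (torch.tensor([3, 5]),),  # dummy input; its max becomes the fixed length
    "mask.onnx",
    input_names=["lengths"],
    dynamic_axes={"lengths": {0: "batch"}},
)

Exporting the int() version emits a TracerWarning about converting a tensor to a Python integer, which is a good tell that a shape is about to be hardcoded.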
