ValueError: The input waveform must be two dimensional, but has 102528 dimension(s) instead. #510

erkaink · 2024-09-06T23:10:44Z

Hello, I am getting the error below and can't find a solution. Does anyone know what I should do? I asked ChatGPT and tried converting the input audio file to stereo and to mono, but neither worked. Thanks in advance.

----@---- seamless_communication % m4t_predict input/speech.mp3 --task S2ST --tgt_lang FRA --output_path /Users/username/seamless_communication/output/compl.mp3
2024-09-07 01:34:47,221 INFO -- seamless_communication.cli.m4t.predict.predict: Running inference on device=device(type='cpu') with dtype=torch.float32.
Using the cached checkpoint of seamlessM4T_v2_large. Set force to True to download again.
Using the cached tokenizer of seamlessM4T_v2_large. Set force to True to download again.
Using the cached tokenizer of seamlessM4T_v2_large. Set force to True to download again.
Using the cached tokenizer of seamlessM4T_v2_large. Set force to True to download again.
Using the cached checkpoint of vocoder_v2. Set force to True to download again.
/opt/homebrew/lib/python3.11/site-packages/torch/nn/utils/weight_norm.py:134: FutureWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
WeightNorm.apply(module, name, dim)
2024-09-07 01:35:09,103 INFO -- seamless_communication.cli.m4t.predict.predict: text_generation_opts=SequenceGeneratorOptions(beam_size=5, soft_max_seq_len=(1, 200), hard_max_seq_len=1024, step_processor=None, unk_penalty=0.0, len_penalty=1.0)
2024-09-07 01:35:09,105 INFO -- seamless_communication.cli.m4t.predict.predict: unit_generation_opts=SequenceGeneratorOptions(beam_size=5, soft_max_seq_len=(25, 50), hard_max_seq_len=1024, step_processor=None, unk_penalty=0.0, len_penalty=1.0)
2024-09-07 01:35:09,105 INFO -- seamless_communication.cli.m4t.predict.predict: unit_generation_ngram_filtering=False
2024-09-07 01:35:09,141 WARNING -- seamless_communication.inference.translator: Transposing audio tensor from (bsz, seq_len) -> (seq_len, bsz).
Traceback (most recent call last):
  File "/opt/homebrew/bin/m4t_predict", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/seamless_communication/cli/m4t/predict/predict.py", line 235, in main
    text_output, speech_output = translator.predict(
                                 ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/seamless_communication/inference/translator.py", line 293, in predict
    src = self.collate(self.convert_to_fbank(decoded_audio))["fbank"]
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: The input waveform must be two dimensional, but has 102528 dimension(s) instead.
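
For reference, a minimal sketch of what a "two dimensional" waveform looks like, using plain nested lists instead of tensors. It assumes the layout is (seq_len, channels), as the "Transposing audio tensor" warning above suggests, that the clip has 102528 samples, and that the sample rate is 16 kHz. All three are assumptions and not confirmed by the log. The helpers waveform_shape and check_waveform are hypothetical names for illustration and are not part of seamless_communication or fairseq2.

```python
from typing import Tuple


def waveform_shape(waveform) -> Tuple[int, ...]:
    """Return the shape of a rectangular nested list, e.g. [[0.1], [0.2]] -> (2, 1)."""
    shape = []
    level = waveform
    # Walk down the first element of each nesting level and record its length.
    while isinstance(level, (list, tuple)):
        shape.append(len(level))
        if not level:
            break
        level = level[0]
    return tuple(shape)


def check_waveform(waveform) -> Tuple[int, ...]:
    """Hypothetical check: accept only 2-D input laid out as (seq_len, channels)."""
    shape = waveform_shape(waveform)
    if len(shape) != 2:
        raise ValueError(
            f"expected 2 dimensions, got {len(shape)} (shape={shape})"
        )
    return shape


# Assumption: a mono clip with as many samples as the number in the error message.
mono = [[0.0] for _ in range(102528)]
shape = check_waveform(mono)      # passes: two dimensions, (seq_len, channels)
num_dims = len(shape)             # number of dimensions, not the number of samples

# Assumption: 16 kHz sample rate, which would make the clip a few seconds long.
duration_s = shape[0] / 16_000

print(f"shape={shape}, dims={num_dims}, duration={duration_s:.3f}s")
```

Under these assumptions the input already has two dimensions, and 102528 reads more like a sample count than a dimension count. If so, the number in the message may be a size rather than the number of dimensions, which would explain why switching the file between stereo and mono made no difference. This is only a guess from the log.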
