Hello, I am getting the error below and can't find a solution. Does anyone know what I should do? I asked ChatGPT and tried converting the input audio file to stereo and to mono, but neither worked. Thanks in advance.
----@---- seamless_communication % m4t_predict input/speech.mp3 --task S2ST --tgt_lang FRA --output_path /Users/username/seamless_communication/output/compl.mp3
2024-09-07 01:34:47,221 INFO -- seamless_communication.cli.m4t.predict.predict: Running inference on device=device(type='cpu') with dtype=torch.float32.
Using the cached checkpoint of seamlessM4T_v2_large. Set force to True to download again.
Using the cached tokenizer of seamlessM4T_v2_large. Set force to True to download again.
Using the cached tokenizer of seamlessM4T_v2_large. Set force to True to download again.
Using the cached tokenizer of seamlessM4T_v2_large. Set force to True to download again.
Using the cached checkpoint of vocoder_v2. Set force to True to download again.
/opt/homebrew/lib/python3.11/site-packages/torch/nn/utils/weight_norm.py:134: FutureWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
WeightNorm.apply(module, name, dim)
2024-09-07 01:35:09,103 INFO -- seamless_communication.cli.m4t.predict.predict: text_generation_opts=SequenceGeneratorOptions(beam_size=5, soft_max_seq_len=(1, 200), hard_max_seq_len=1024, step_processor=None, unk_penalty=0.0, len_penalty=1.0)
2024-09-07 01:35:09,105 INFO -- seamless_communication.cli.m4t.predict.predict: unit_generation_opts=SequenceGeneratorOptions(beam_size=5, soft_max_seq_len=(25, 50), hard_max_seq_len=1024, step_processor=None, unk_penalty=0.0, len_penalty=1.0)
2024-09-07 01:35:09,105 INFO -- seamless_communication.cli.m4t.predict.predict: unit_generation_ngram_filtering=False
2024-09-07 01:35:09,141 WARNING -- seamless_communication.inference.translator: Transposing audio tensor from (bsz, seq_len) -> (seq_len, bsz).
Traceback (most recent call last):
File "/opt/homebrew/bin/m4t_predict", line 8, in <module>
sys.exit(main())
^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/seamless_communication/cli/m4t/predict/predict.py", line 235, in main
text_output, speech_output = translator.predict(
^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/seamless_communication/inference/translator.py", line 293, in predict
src = self.collate(self.convert_to_fbank(decoded_audio))["fbank"]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: The input waveform must be two dimensional, but has 102528 dimension(s) instead.
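One thing that may help narrow this down: 102528 looks more like a sample count than a number of dimensions, so it could be worth confirming what the input file actually contains before it reaches `m4t_predict`. Below is a minimal sketch using only Python's standard `wave` module. It assumes the audio has first been converted to WAV with some external tool (the `wave` module cannot read MP3), and the helper names `describe_wav` and `make_silent_wav` are made up for illustration; they are not part of seamless_communication. It is a diagnostic aid, not a confirmed fix for the error above.

```python
import io
import wave


def describe_wav(source):
    """Return (channels, sample_rate, frames) for a WAV path or file-like object."""
    with wave.open(source, "rb") as wav:
        return wav.getnchannels(), wav.getframerate(), wav.getnframes()


def make_silent_wav(channels, sample_rate, frames):
    """Build an in-memory 16-bit PCM WAV of silence (demo input only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * channels * frames)
    buffer.seek(0)  # rewind so the buffer can be read back
    return buffer


if __name__ == "__main__":
    # Demo values only: a mono clip with the same length as the number in the error.
    demo = make_silent_wav(channels=1, sample_rate=16000, frames=102528)
    channels, rate, frames = describe_wav(demo)
    print(f"channels={channels} sample_rate={rate} frames={frames}")
    # A decoded waveform of this file would hold frames x channels samples.
    print(f"samples per channel: {frames}, channels: {channels}")
```

To check a real file, pass its path instead of the demo buffer, for example `describe_wav("input/speech.wav")` (a hypothetical path). If the frame count matches the number in the error message, that would suggest the waveform is being read correctly and the shape problem arises later, inside the library.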