Local deployment on Windows fails at runtime with RuntimeError: Tensors must have same number of dimensions: got 2 and 1 #13

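The web page starts, and when I speak into the microphone a recording of my speech is written to the output/input directory. The later steps fail, however, and no reply audio is generated.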

Detailed log:
[INFO] Received audio data: 116 bytes
[INFO] Accumulating PCM: 960 samples (total: 476160, 19.84s)
[INFO] Received audio data: 152 bytes
[INFO] Accumulating PCM: 960 samples (total: 477120, 19.88s)
[INFO] Received audio data: 154 bytes
[INFO] Accumulating PCM: 960 samples (total: 478080, 19.92s)
[INFO] Received PAUSE signal
[INFO] Saved audio to ./output\input\Client-2944189797568_turn0_input.wav, length: 19.92s
[INFO] [Turn 0] Preparing model input: 2 messages, 1 audio items
[INFO] [Turn 0] Message history: ['system', 'user']
[INFO] [Turn 0] Queue status: audio_tokens=0, opus_bytes=0
[INFO] start inference of (19.92s audio)...
[INFO] [Turn 0] Generation thread started, input_ids shape: torch.Size([1, 219])
[INFO] Start monitoring the generation results loop (turn 0)
[INFO] Monitor loop #2200: thread_alive=True, steps=1, last_step=1, stuck_time=120.8s
[INFO] Skipping first audio batch (prompt): shape=torch.Size([1, 219]), tokens=219
[INFO] Monitor loop #100: thread_alive=True, steps=1, last_step=1, stuck_time=4.7s
[INFO] Monitor loop #2300: thread_alive=True, steps=1, last_step=1, stuck_time=125.6s
[INFO] Monitor loop #200: thread_alive=True, steps=1, last_step=1, stuck_time=9.4s
[INFO] Monitor loop #2400: thread_alive=True, steps=1, last_step=1, stuck_time=130.3s
[INFO] Monitor loop #300: thread_alive=True, steps=1, last_step=1, stuck_time=14.1s
[INFO] Monitor loop #2500: thread_alive=True, steps=1, last_step=1, stuck_time=135.1s
[INFO] Monitor loop #400: thread_alive=True, steps=1, last_step=1, stuck_time=18.9s
[INFO] Monitor loop #2600: thread_alive=True, steps=1, last_step=1, stuck_time=139.9s
[INFO] Monitor loop #500: thread_alive=True, steps=1, last_step=1, stuck_time=23.6s
[INFO] Monitor loop #2700: thread_alive=True, steps=1, last_step=1, stuck_time=144.6s
[INFO] Monitor loop #600: thread_alive=True, steps=1, last_step=1, stuck_time=28.4s
[INFO] Monitor loop #2800: thread_alive=True, steps=1, last_step=1, stuck_time=149.3s
[WARNING] [Turn 0] Generation appears stuck: no new steps for 10s (steps=1, loop=635)
[INFO] connection closed
[INFO] Send coroutine stopped, total frames sent: 0
[INFO] Text send coroutine stopped, total texts sent: 0
[INFO] Encode thread stopped
[INFO] TTS receiver thread stopped
[INFO] TTS sender thread stopped
[INFO] Monitor loop #700: thread_alive=True, steps=1, last_step=1, stuck_time=34.1s
[INFO] Monitor loop #2900: thread_alive=True, steps=1, last_step=1, stuck_time=155.1s
[INFO] Monitor loop #800: thread_alive=True, steps=1, last_step=1, stuck_time=40.6s
[INFO] Monitor loop #3000: thread_alive=True, steps=1, last_step=1, stuck_time=161.5s
[INFO] Monitor loop #900: thread_alive=True, steps=1, last_step=1, stuck_time=47.0s
[INFO] Monitor loop #3100: thread_alive=True, steps=1, last_step=1, stuck_time=168.0s
[INFO] Monitor loop #1000: thread_alive=True, steps=1, last_step=1, stuck_time=53.4s
[INFO] Monitor loop #3200: thread_alive=True, steps=1, last_step=1, stuck_time=174.4s
[INFO] Monitor loop #1100: thread_alive=True, steps=1, last_step=1, stuck_time=59.8s
[INFO] Monitor loop #3300: thread_alive=True, steps=1, last_step=1, stuck_time=180.8s
[INFO] Monitor loop #1200: thread_alive=True, steps=1, last_step=1, stuck_time=66.2s
[INFO] Monitor loop #3400: thread_alive=True, steps=1, last_step=1, stuck_time=187.2s
[ERROR] [Turn 0] Generation failed after 71.79s: Tensors must have same number of dimensions: got 2 and 1
Traceback (most recent call last):
  File "D:\apps\Fun-Audio-Chat\web_demo\server\server.py", line 447, in run_generation
    self.model_manager.model.generate(**inputs, **gen_kwargs_with_streamer)
  File "D:\apps\Fun-Audio-Chat\venv\Lib\site-packages\torch\utils\_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "D:\apps\Fun-Audio-Chat\venv\Lib\site-packages\transformers\generation\utils.py", line 2597, in generate
    result = self._sample(
             ^^^^^^^^^^^^^
  File "D:\apps\Fun-Audio-Chat\funaudiochat\modeling_funaudiochat.py", line 1329, in _sample
    outputs = self(**model_inputs, return_dict=True)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\apps\Fun-Audio-Chat\venv\Lib\site-packages\torch\nn\modules\module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\apps\Fun-Audio-Chat\venv\Lib\site-packages\torch\nn\modules\module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\apps\Fun-Audio-Chat\funaudiochat\modeling_funaudiochat.py", line 1133, in forward
    speech_output = self.audio_invert_tower(
                    ^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\apps\Fun-Audio-Chat\venv\Lib\site-packages\torch\nn\modules\module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\apps\Fun-Audio-Chat\venv\Lib\site-packages\torch\nn\modules\module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\apps\Fun-Audio-Chat\funaudiochat\modeling_funaudiochat.py", line 663, in crq_generate_forward
    crq_audio_tokens, logits = self.sampling_step(logits)
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\apps\Fun-Audio-Chat\funaudiochat\modeling_funaudiochat.py", line 603, in sampling_step
    next_token_scores = self.crq_logits_processor(torch.cat([self.crq_speech_ids, *self.crq_generate_tokens], dim=-1), next_token_logits)
                                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Tensors must have same number of dimensions: got 2 and 1
[INFO] Monitor loop ended: total_loops=1286, final_steps=1, interrupted=False
[INFO] TTS generation marked as complete
[INFO] All audio encoded and sent after 0.0s
[INFO] Generation completed:
[INFO] Processing completed
[INFO] Waiting for workers to stop...
[INFO] Cleared TTS cache for session 6586ef80-a484-4ae9-ab79-807923f2d875
[INFO] All worker threads stopped
[INFO] done with connection
2025-12-25 11:57:42,482 INFO 192.168.6.3 [25/Dec/2025:11:56:08 +0800] "GET /api/chat?text_temperature=0.7&text_topk=25&audio_temperature=0.8&audio_topk=250&pad_mult=0&text_seed=261668&audio_seed=665770&repetition_penalty_context=64&repetition_penalty=1 HTTP/1.1" 101 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:146.0) Gecko/20100101 Firefox/146.0"
[INFO] [TTS Process] Cleared cache for 6586ef80-a484-4ae9-ab79-807923f2d875
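
The last frame of the traceback fails inside `torch.cat`, which requires every input tensor to have the same number of dimensions. The snippet below is only a minimal sketch of that failure mode, not the project's code: the shapes are assumptions chosen to trigger the same message, and `describe_shapes` and `ensure_batch_dim` are hypothetical helpers that do not exist in Fun-Audio-Chat. It may help to log the shapes of `self.crq_speech_ids` and `self.crq_generate_tokens` in the same way just before the failing call, to see which of them is 1-D.

```python
import torch


def describe_shapes(tensors):
    """Return each tensor's shape as a tuple, e.g. for logging before torch.cat."""
    return [tuple(t.shape) for t in tensors]


def ensure_batch_dim(t):
    """Hypothetical helper: promote a 1-D tensor of shape (n,) to (1, n)."""
    return t.unsqueeze(0) if t.dim() == 1 else t


# Assumed shapes, chosen only to reproduce the message seen in the traceback.
speech_ids = torch.zeros((1, 4), dtype=torch.long)  # 2-D: (batch, seq)
generated = [
    torch.zeros((1, 1), dtype=torch.long),  # 2-D step output
    torch.zeros((1,), dtype=torch.long),    # 1-D step output (the mismatch)
]

# Concatenating a mix of 2-D and 1-D tensors raises the RuntimeError.
try:
    torch.cat([speech_ids, *generated], dim=-1)
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)

print(describe_shapes([speech_ids, *generated]))
print(error_message)

# Once every tensor is 2-D, the same concatenation succeeds.
fixed = torch.cat([ensure_batch_dim(t) for t in [speech_ids, *generated]], dim=-1)
print(tuple(fixed.shape))
```

Whether adding a batch dimension is the right fix here depends on why one of the tensors loses a dimension in the first place, which this log alone does not show.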
