
[ISSUE] flash_attn f16 warning #97

Open
zhzLuke96 opened this issue Jul 10, 2024 · 0 comments
Labels
bug (Something isn't working), help wanted (Extra attention is needed), upstream (Dependency on upstream fixes)

Comments


zhzLuke96 commented Jul 10, 2024

Your issue

After enabling Flash Attention, I get this warning: "Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`". On top of that, enabling --compile at the same time makes the server fail to start.
With --compile enabled on its own, what does "triggering shape warm-up precompilation by itself" mean? Does it just mean the first speech generation is slow? Even once it is running, it does not feel much faster.

When calling the API via curl with streaming enabled, how do I fetch the generated mp3 file as a stream?

Originally posted by @caixianyu in #96 (comment)
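
For the streaming question above, a minimal client-side sketch of consuming a chunked audio response and writing it to disk as it arrives. The endpoint path, payload fields, and response format here are placeholders for illustration, not the project's documented API.

```python
# Minimal sketch: consume a streamed audio response chunk by chunk.
# NOTE: the URL, payload fields, and chunked-mp3 response format are
# assumptions for illustration; check the project's API docs for the
# actual endpoint and parameters.
import requests

payload = {"text": "你好", "stream": True}  # hypothetical request body

with requests.post(
    "http://localhost:8000/v1/tts",  # hypothetical endpoint
    json=payload,
    stream=True,  # let requests yield the body incrementally
) as resp:
    resp.raise_for_status()
    with open("output.mp3", "wb") as f:
        # iter_content yields raw bytes as the server flushes them,
        # so saving/playback can begin before generation finishes.
        for chunk in resp.iter_content(chunk_size=4096):
            if chunk:
                f.write(chunk)
```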

- The `flash_attn` warning is a bit odd; in principle half precision is enabled by default. Upstream only just updated this logic and I ported it over a few days ago, so there may still be problems; I need to look into it.

Originally posted by @zhzLuke96 in #96 (comment)
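
For reference, the fix the warning itself suggests is to load the backbone in half precision, or to wrap inference in autocast. A minimal sketch following the warning's own example; the model id, device handling, and autocast placement are placeholders, not this project's actual loading code.

```python
# Minimal sketch following the warning's suggestion: load the model in
# float16 so FlashAttention 2 has a supported dtype. The model id below
# is the placeholder from the warning text, not this project's model.
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "openai/whisper-tiny",                    # placeholder model id
    attn_implementation="flash_attention_2",
    torch_dtype=torch.float16,                # avoids the float32 warning
).to("cuda")

# Alternatively, keep the weights as loaded and run inference under autocast:
with torch.autocast(device_type="cuda", dtype=torch.float16):
    pass  # run the forward pass / generation here
```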

@zhzLuke96 zhzLuke96 added the bug Something isn't working label Jul 10, 2024
@zhzLuke96 zhzLuke96 changed the title from [ISSUE] to [ISSUE] flash_attn f16 warning Jul 10, 2024
@zhzLuke96 zhzLuke96 added help wanted Extra attention is needed upstream Dependency on upstream fixes labels Jul 12, 2024