
[ISSUE] flash_attn f16 warning #97

Open
zhzLuke96 opened this issue Jul 10, 2024 · 0 comments
Labels
bug (Something isn't working), help wanted (Extra attention is needed), upstream (Dependency on upstream fixes)

Comments


zhzLuke96 commented Jul 10, 2024

Your issue

After enabling Flash Attention, I get this warning: "Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`". On top of that, enabling --compile at the same time makes the server fail to start.
With --compile enabled on its own, what does "triggering shape warm-up precompilation by itself" mean? Does it just mean the first speech generation is slow? Even once it is running, it does not feel much faster.

When calling the API via curl with streaming enabled, how do I fetch the generated mp3 file as a stream?

Originally posted by @caixianyu in #96 (comment)
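
For the streaming question above, a minimal client-side sketch of consuming a chunked audio response and writing it to disk as it arrives. The endpoint path, payload fields, and response format here are placeholders for illustration, not the project's documented API.

```python
# Minimal sketch: consume a streamed audio response chunk by chunk.
# NOTE: the URL, payload fields, and chunked-mp3 response format are
# assumptions for illustration; check the project's API docs for the
# actual endpoint and parameters.
import requests

payload = {"text": "你好", "stream": True}  # hypothetical request body

with requests.post(
    "http://localhost:8000/v1/tts",  # hypothetical endpoint
    json=payload,
    stream=True,  # let requests yield the body incrementally
) as resp:
    resp.raise_for_status()
    with open("output.mp3", "wb") as f:
        # iter_content yields raw bytes as the server flushes them,
        # so saving/playback can begin before generation finishes.
        for chunk in resp.iter_content(chunk_size=4096):
            if chunk:
                f.write(chunk)
```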

- The `flash_attn` warning is a bit odd; in principle half precision is enabled by default. Upstream only just updated this logic and I ported it over a few days ago, so there may still be problems; I need to look into it.

Originally posted by @zhzLuke96 in #96 (comment)
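
For reference, the fix the warning itself suggests is to load the backbone in half precision, or to wrap inference in autocast. A minimal sketch following the warning's own example; the model id, device handling, and autocast placement are placeholders, not this project's actual loading code.

```python
# Minimal sketch following the warning's suggestion: load the model in
# float16 so FlashAttention 2 has a supported dtype. The model id below
# is the placeholder from the warning text, not this project's model.
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "openai/whisper-tiny",                    # placeholder model id
    attn_implementation="flash_attention_2",
    torch_dtype=torch.float16,                # avoids the float32 warning
).to("cuda")

# Alternatively, keep the weights as loaded and run inference under autocast:
with torch.autocast(device_type="cuda", dtype=torch.float16):
    pass  # run the forward pass / generation here
```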

@zhzLuke96 zhzLuke96 added the bug Something isn't working label Jul 10, 2024
@zhzLuke96 zhzLuke96 changed the title from [ISSUE] to [ISSUE] flash_attn f16 warning Jul 10, 2024
@zhzLuke96 zhzLuke96 added help wanted Extra attention is needed upstream Dependency on upstream fixes labels Jul 12, 2024