cuda : implement bf16 cpy ops and enable bf16 cont #14763
Generally speaking I am not a fan of how the float conversions are being done currently. I think the code could be deduplicated significantly by unconditionally casting `half`, `nv_bfloat16`, and `float` to `float` and then simply using that `float` value to set the destination. I would appreciate it if you were to do this in this PR, otherwise I'll keep it as one of the tasks to hand out when people ask me for a good first issue to work on.
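To illustrate the suggestion, here is a minimal host-side C++ sketch of routing every conversion through `float`. It is not the code from this PR: `bf16_t`, `to_float`, `from_float`, and `cpy_1` are hypothetical names, and `bf16_t` is a plain stand-in for `nv_bfloat16` so that the example compiles without the CUDA toolkit. In an actual kernel the two helpers would presumably wrap the CUDA conversion intrinsics (for example `__bfloat162float` and `__float2bfloat16`) and also cover `half`.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for nv_bfloat16: the upper 16 bits of an IEEE-754 float.
struct bf16_t {
    uint16_t bits;
};

// Step 1: widen any supported source type to float.
inline float to_float(float x) { return x; }
inline float to_float(bf16_t x) {
    uint32_t u = static_cast<uint32_t>(x.bits) << 16;
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Step 2: narrow the float to the destination type.
template <typename dst_t> dst_t from_float(float x);
template <> inline float from_float<float>(float x) { return x; }
template <> inline bf16_t from_float<bf16_t>(float x) {
    uint32_t u;
    std::memcpy(&u, &x, sizeof u);
    // Round to nearest even; NaN handling is omitted in this sketch.
    u += 0x7FFFu + ((u >> 16) & 1u);
    return bf16_t{static_cast<uint16_t>(u >> 16)};
}

// A single generic element copy replaces one hand-written function
// per (source type, destination type) pair.
template <typename src_t, typename dst_t>
inline void cpy_1(const src_t * src, dst_t * dst) {
    *dst = from_float<dst_t>(to_float(*src));
}
```

With this shape, supporting a new type means adding one `to_float` overload and one `from_float` specialization, rather than a new copy function for every type pair. When source and destination types match, the round trip through `float` is lossless for these types, so the same template also covers the same-type copies.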
Implemented missing BF16 CPY ops and enabled CONT op for BF16.
Tests before
Tests after
Also fixed a cut'n'paste error for F16->F16 in `ggml_cuda_cpy_fn` and deduplicated all copy functions.