fp8 and bfloat16 support #923
It looks like Cuda provides a few alternate floating point options, including fp8 and bfloat16. This would have to be a Cuda-only feature, as there is no equivalent in OpenCL 2.0. We already have support for Half.
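For reference, here is a minimal CUDA-side sketch (not ILGPU code) of the intrinsic types such a feature would lower to, assuming CUDA 11.8+ with the cuda_bf16.h and cuda_fp8.h headers and the float constructors/conversions those headers expose:

```cuda
#include <cuda_bf16.h>   // __nv_bfloat16
#include <cuda_fp8.h>    // __nv_fp8_e4m3, __nv_fp8_e5m2 (CUDA 11.8+)

// Round-trips each input through a reduced-precision format and back to
// float, so the precision loss of each format can be inspected on the host.
__global__ void roundTrip(const float* in, float* outBf16, float* outE4M3,
                          float* outE5M2, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n)
        return;

    __nv_bfloat16 b = __float2bfloat16(in[i]);   // 8-bit exponent, 7-bit mantissa
    outBf16[i] = __bfloat162float(b);

    __nv_fp8_e4m3 e4m3(in[i]);                   // 4-bit exponent, 3-bit mantissa
    outE4M3[i] = static_cast<float>(e4m3);

    __nv_fp8_e5m2 e5m2(in[i]);                   // 5-bit exponent, 2-bit mantissa
    outE5M2[i] = static_cast<float>(e5m2);
}
```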
@MoFtZ that's why it might make sense to have the generic-sized float<SizeOfMantissa, SizeOfExponent> type.
Based on our last discussions, this is more broadly related to adding support for the Cuda WMMA (Warp Level Matrix Multiply-Accumulate) instructions; adding support for the fp8 and bfloat16 types would be part of that work.
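For context, a bf16 WMMA tile multiply on the CUDA C++ side looks roughly like the sketch below (assuming sm_80+ and the nvcuda::wmma API; this is only a reference for what the instructions do, not a proposed ILGPU surface):

```cuda
#include <mma.h>
#include <cuda_bf16.h>

using namespace nvcuda;

// One warp cooperatively computes a 16x16 tile: C = A * B + C,
// with bfloat16 inputs and a float accumulator.
__global__ void bf16TileMma(const __nv_bfloat16* a, const __nv_bfloat16* b, float* c)
{
    wmma::fragment<wmma::matrix_a, 16, 16, 16, __nv_bfloat16, wmma::row_major> fragA;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, __nv_bfloat16, wmma::col_major> fragB;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> fragC;

    wmma::fill_fragment(fragC, 0.0f);
    wmma::load_matrix_sync(fragA, a, 16);   // leading dimension = 16
    wmma::load_matrix_sync(fragB, b, 16);
    wmma::mma_sync(fragC, fragA, fragB, fragC);
    wmma::store_matrix_sync(c, fragC, 16, wmma::mem_row_major);
}
```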
NVidia actually has two variants of fp8 with different sizes of mantissa/exponent. bfloat16 is also unique. There's also TensorFloat32, which is really more like bfloat19. Perhaps it would make sense to have a float<SizeOfMantissa, SizeOfExponent> generic type (hackery).