I've started to see some hints in cuTile Python:
- `src/cuda/tile/_bytecode/float.py` has some float-type-related code.
- A comment in `src/cuda/tile/_bytecode/type.py` suggests that F8E8M0 (the scale type in MXFP8) is planned for 13.2, and F4E2M1 (the value type in MXFP4 and NVFP4) is planned for 13.3.

See https://github.com/MurrellGroup/Microfloats.jl (needs some work/rewrite imo), which could possibly house E8M0 and sub-byte types if an extension approach like #36 can be taken, though I expect some bitpacking will be needed.
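For reference, here is a minimal pure-Python sketch of what these two encodings look like at the bit level, assuming the OCP Microscaling (MX) v1.0 definitions: E8M0 is an unsigned, exponent-only byte with bias 127 and 0xFF as NaN; E2M1 is a 4-bit float with 1 sign, 2 exponent (bias 1) and 1 mantissa bit and no inf/NaN. All function names are hypothetical (not from cuTile or Microfloats.jl), and packing two FP4 codes per byte with the first element in the low nibble is an assumed layout, just to show where the bitpacking comes in.

```python
import math


def decode_e8m0(bits: int) -> float:
    """Decode an E8M0 scale byte: 8 exponent bits, no sign, no mantissa, bias 127."""
    if not 0 <= bits <= 0xFF:
        raise ValueError("E8M0 is an 8-bit encoding")
    if bits == 0xFF:
        return math.nan  # the only special value; there is no zero or inf
    return math.ldexp(1.0, bits - 127)  # 2**(bits - 127)


def decode_e2m1(code: int) -> float:
    """Decode an FP4 E2M1 code: 1 sign bit, 2 exponent bits (bias 1), 1 mantissa bit."""
    if not 0 <= code <= 0xF:
        raise ValueError("E2M1 is a 4-bit encoding")
    sign = -1.0 if code & 0x8 else 1.0
    exp = (code >> 1) & 0x3
    man = code & 0x1
    if exp == 0:
        mag = man * 0.5  # subnormal: 0.m * 2**(1 - bias)
    else:
        mag = (1.0 + man * 0.5) * 2.0 ** (exp - 1)  # normal: 1.m * 2**(exp - bias)
    return sign * mag


def pack_fp4(codes: list[int]) -> bytes:
    """Pack 4-bit codes two per byte, first code in the low nibble (assumed layout)."""
    if any(not 0 <= c <= 0xF for c in codes):
        raise ValueError("codes must fit in 4 bits")
    padded = list(codes) + [0] * (len(codes) % 2)  # pad odd lengths with a zero code
    return bytes(lo | (hi << 4) for lo, hi in zip(padded[0::2], padded[1::2]))


def unpack_fp4(data: bytes, count: int) -> list[int]:
    """Inverse of pack_fp4: split each byte into (low, high) nibbles, drop padding."""
    out = []
    for b in data:
        out.append(b & 0xF)
        out.append(b >> 4)
    return out[:count]


def decode_mx_block(scale_bits: int, packed: bytes, count: int) -> list[float]:
    """MXFP4-style block: one shared E8M0 scale times each E2M1 element."""
    scale = decode_e8m0(scale_bits)
    return [scale * decode_e2m1(c) for c in unpack_fp4(packed, count)]
```

For example, `decode_mx_block(127, pack_fp4([3, 5]), 2)` gives `[1.5, 3.0]`, since 127 encodes a scale of 1.0. A native Julia type would need the same split: E8M0 fits a `UInt8`-backed primitive, while E2M1 only becomes addressable once two codes are packed into a byte.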