What's Changed
- [misc] Add latest cutlass 3.7.0 submodule by @DefTruth in #62
- [Bugfix] fix macro typo by @DefTruth in #63
- [Misc] Update launch templates configs for small d by @DefTruth in #64
- [misc] remove some wrong comments by @DefTruth in #65
- [test] refactor ffpa-l1 multi-stages tests by @DefTruth in #66
- Revert "[test] refactor ffpa-l1 multi-stages tests" by @DefTruth in #67
- [test] refactor ffpa-l1 multi-stages tests by @DefTruth in #68
- [test] Add official flash-attn -> test cases by @DefTruth in #69
- [feat] support ffpa-l1 registers double buffers by @DefTruth in #70
- [README] Update README.md by @DefTruth in #71
- [tests] rename test.py -> test_ffpa_attn.py by @DefTruth in #72
- [misc] fix macro typo by @DefTruth in #75
- [docs] Add FFPA(Split-D) tech blog link by @DefTruth in #77
- [README] Add FFPA Split-D Algo chart by @DefTruth in #79
- update docs by @DefTruth in #84
- remove unneeded code by @DefTruth in #86
- update ffpa algo chart by @DefTruth in #87
- Update README.md by @DefTruth in #88

Full Changelog: v0.0.2...v0.0.2.1