Conversation
Pull Request Overview
This PR adds a new pocket reference documenting CUDA kernels for a triangular matrix multiplication forward pass (Trimat) used in causal self-attention.
- Introduces an overview and motivation for computing only the lower triangle of the attention matrix.
- Details four CUDA kernel implementations (naive, register tiling, vectorized loads, shared memory tiling) with diagrams.
- Provides input/output shapes, computation goal, configuration, and references.
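For readers skimming the review, the computation those four kernels optimize can be sketched on the CPU. This is a hedged reference, not the PR's actual code: the function name, row-major layout, and shapes (`T x D` inputs, `T x T` output, `out = A * B^T` with only `col <= row` entries produced, as in causal attention) are assumptions.

```c
#include <stddef.h>

/* CPU reference sketch of a Trimat forward pass: out = A * B^T,
 * computing only the lower triangle (col <= row) and zeroing the rest.
 * Layout and names are illustrative assumptions, not the PR's API. */
void trimat_forward_ref(const float *a, const float *b, float *out,
                        size_t T, size_t D) {
    for (size_t row = 0; row < T; row++) {
        for (size_t col = 0; col < T; col++) {
            if (col > row) {                 /* upper triangle: skip work */
                out[row * T + col] = 0.0f;
                continue;
            }
            float acc = 0.0f;                /* dot product of row of A */
            for (size_t d = 0; d < D; d++)   /* with row of B (B^T col) */
                acc += a[row * D + d] * b[col * D + d];
            out[row * T + col] = acc;
        }
    }
}
```

Skipping the `col > row` entries is what halves the arithmetic relative to a full matmul; the four CUDA variants in the PR differ only in how they map this loop nest onto threads and memory.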
Comments suppressed due to low confidence (2)
books/compute/src/cuda/kernels/trimat_forward.md:6
- Missing reading time preprocessor tag below the title. Please add
  `{{#reading_time}}` immediately after the header so that the estimated
  reading time is rendered correctly.
# Kernels for Triangular Matrix Multiplication (Trimat) Forward Pass
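A sketch of the suggested fix, assuming the mdbook reading-time preprocessor expects the tag on its own line directly below the title:

```markdown
# Kernels for Triangular Matrix Multiplication (Trimat) Forward Pass

{{#reading_time}}
```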
books/compute/src/cuda/kernels/trimat_forward.md:92
- Using raw HTML `<center>` tags disables markdownlint rules and may reduce
  consistency. Consider using `<div align="center">` or an mdbook-compatible
  markdown approach for centering images, to avoid disabling MD033.
<center>
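A sketch of the suggested replacement (the image path and alt text are placeholders, not the PR's actual filenames):

```markdown
<div align="center">
  <img src="images/trimat_kernel.svg" alt="Trimat kernel diagram" />
</div>
```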
nerdai
left a comment
Thanks @kohankhaki for the contribution! I think there's a lot to like here. In this first pass, I've noticed there's some styling—in particular, math styling—that we need to clean up.
Also, I know it's a bit annoying, but the line-length rule is set to 79 and for now shouldn't be ignored. This makes .md files easier to read when viewing them as raw files. Happy to show you how I format this to a 79-character line length...
emersodb
left a comment
Overall, I think the write-up is concise and clear, especially considering the complexity of the topic. Most of my suggestions are fairly cosmetic rather than corrections of actual issues.
As an aside: @kohankhaki, did you make the svg diagrams? I think they are really nice.
emersodb
left a comment
Changes look good to me. There is a single comment that is unresolved from my previous review, just asking about the indexing of the blocks in the diagrams. It's probably a naive question. If so, just ignore it and resolve 🙂. I was just curious.
@kohankhaki Excalidraw for the win!
nerdai
left a comment
@kohankhaki: Thanks for making the changes. I think the Pocket Ref reads great!
NOTE: I've swapped the local image versions for hosted ones, which is one of the final steps for preparing the Pocket Ref to be released.
Thank you David, just replied. :)
[Compute] Trimat CUDA Kernel
Type of Change
Fixes #
Book
Description
Adds the Trimat CUDA kernel pocket reference.
Checklist
- Author tag added ({{#author}} or {{#authors}})
- Previewed with mdbook watch books/<book-name> --open