
CUDA header declarations for Layer Normalization (LayerNorm) forward and backward passes #66

Merged: Eamon2009 merged 6 commits into codeaddict-master from master on Jun 1, 2026

@Eamon2009 (Owner)

Summary

Introduces the CUDA header declarations for the Layer Normalization (LayerNorm) forward and backward passes within the quadtrix::cuda namespace. These declarations define the interfaces for normalizing features in the forward pass and for computing the corresponding gradients during backpropagation.

Key Additions

  • layernorm_forward: Normalizes the input tensor over its feature dimension to zero mean and unit variance, then applies the learnable gamma (scale) and beta (shift) parameters. It writes the result to output and caches the intermediate mean and rstd (reciprocal standard deviation) values so the backward pass can compute exact gradients without recomputing these statistics.

  • layernorm_backward: Given the incoming grad_output, computes the gradient with respect to the input (grad_input) and with respect to the learnable parameters (grad_gamma and grad_beta).
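To make the math behind these two entry points concrete, here is a minimal host-side C++ reference sketch. It is not the quadtrix::cuda implementation: the namespace `layernorm_ref`, the function names, the argument order, the row-major [rows, C] layout, and the default eps of 1e-5 are all hypothetical. It illustrates why the cached mean and rstd are useful: the backward pass rebuilds the normalized values from them instead of recomputing the row statistics.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for LayerNorm; names, argument order and the
// row-major [rows, C] layout are illustrative assumptions, not the real API.
namespace layernorm_ref {

// Normalizes each row of `input` over its C features, applies gamma/beta,
// and caches the per-row mean and rstd for the backward pass.
void forward(const std::vector<float>& input, const std::vector<float>& gamma,
             const std::vector<float>& beta, std::vector<float>& output,
             std::vector<float>& mean, std::vector<float>& rstd,
             std::size_t rows, std::size_t C, float eps = 1e-5f) {
    output.assign(rows * C, 0.0f);
    mean.assign(rows, 0.0f);
    rstd.assign(rows, 0.0f);
    for (std::size_t r = 0; r < rows; ++r) {
        const float* x = &input[r * C];
        float m = 0.0f;
        for (std::size_t i = 0; i < C; ++i) m += x[i];
        m /= static_cast<float>(C);
        float v = 0.0f;
        for (std::size_t i = 0; i < C; ++i) v += (x[i] - m) * (x[i] - m);
        v /= static_cast<float>(C);  // biased (population) variance
        const float s = 1.0f / std::sqrt(v + eps);
        for (std::size_t i = 0; i < C; ++i) {
            const float xhat = (x[i] - m) * s;
            output[r * C + i] = xhat * gamma[i] + beta[i];
        }
        mean[r] = m;
        rstd[r] = s;
    }
}

// Zeroes and then sums grad_gamma/grad_beta over all rows, and writes
// grad_input per row, reusing the cached mean/rstd.
void backward(const std::vector<float>& grad_output, const std::vector<float>& input,
              const std::vector<float>& gamma, const std::vector<float>& mean,
              const std::vector<float>& rstd, std::vector<float>& grad_input,
              std::vector<float>& grad_gamma, std::vector<float>& grad_beta,
              std::size_t rows, std::size_t C) {
    grad_input.assign(rows * C, 0.0f);
    grad_gamma.assign(C, 0.0f);
    grad_beta.assign(C, 0.0f);
    for (std::size_t r = 0; r < rows; ++r) {
        const float* x = &input[r * C];
        const float* dy = &grad_output[r * C];
        // Row means of dxhat and dxhat * xhat, where dxhat = dy * gamma.
        float mean_dxhat = 0.0f, mean_dxhat_xhat = 0.0f;
        for (std::size_t i = 0; i < C; ++i) {
            const float xhat = (x[i] - mean[r]) * rstd[r];
            const float dxhat = dy[i] * gamma[i];
            mean_dxhat += dxhat;
            mean_dxhat_xhat += dxhat * xhat;
        }
        mean_dxhat /= static_cast<float>(C);
        mean_dxhat_xhat /= static_cast<float>(C);
        for (std::size_t i = 0; i < C; ++i) {
            const float xhat = (x[i] - mean[r]) * rstd[r];
            const float dxhat = dy[i] * gamma[i];
            grad_beta[i] += dy[i];
            grad_gamma[i] += dy[i] * xhat;
            grad_input[r * C + i] = rstd[r] * (dxhat - mean_dxhat - xhat * mean_dxhat_xhat);
        }
    }
}

}  // namespace layernorm_ref
```

Note that grad_gamma and grad_beta are reductions over every row, while grad_input only depends on the statistics of its own row.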

Eamon2009 and others added 6 commits June 1, 2026 01:00
* docs: report [run_20260530_165216] (~791 tok/s)

 Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms.
Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900

* docs: report [run_20260530_165216] (~791 tok/s) (#61)

Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms.
Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900

Co-authored-by: Max <eamon5174@gmail.com>

* feat(cuda): add attention forward and backward kernel declarations

Introduces the header declarations for `attention_forward` and
`attention_backward` operations inside the `quadtrix::cuda` namespace.
Configured with support for custom CUDA streams and head partitioning.
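As a rough illustration of what head partitioning means for attention, here is a hypothetical single-sequence C++ reference for multi-head scaled dot-product attention. The function name `attention_forward_ref`, the [T, C] layouts for q, k, v and the output, the contiguous channel slice per head, and the `causal` flag are assumptions; the commit does not show the actual `attention_forward` signature, and stream handling is omitted because this runs on the host.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical CPU reference. Assumptions: q, k, v and out are row-major
// [T, C] for one sequence, and head h owns channels [h * hs, (h + 1) * hs)
// with hs = C / n_head.
std::vector<float> attention_forward_ref(const std::vector<float>& q,
                                         const std::vector<float>& k,
                                         const std::vector<float>& v,
                                         std::size_t T, std::size_t C,
                                         std::size_t n_head, bool causal) {
    const std::size_t hs = C / n_head;  // assumes C % n_head == 0
    const float scale = 1.0f / std::sqrt(static_cast<float>(hs));
    std::vector<float> out(T * C, 0.0f);
    std::vector<float> scores(T);
    for (std::size_t h = 0; h < n_head; ++h) {
        const std::size_t off = h * hs;  // start of this head's channel slice
        for (std::size_t t = 0; t < T; ++t) {
            const std::size_t last = causal ? t + 1 : T;  // keys visible to query t
            float max_s = -std::numeric_limits<float>::infinity();
            for (std::size_t j = 0; j < last; ++j) {
                float dot = 0.0f;
                for (std::size_t d = 0; d < hs; ++d)
                    dot += q[t * C + off + d] * k[j * C + off + d];
                scores[j] = dot * scale;
                max_s = std::max(max_s, scores[j]);
            }
            // Numerically stable softmax: subtract the row max before exp.
            float denom = 0.0f;
            for (std::size_t j = 0; j < last; ++j) {
                scores[j] = std::exp(scores[j] - max_s);
                denom += scores[j];
            }
            // Weighted sum of this head's value vectors.
            for (std::size_t j = 0; j < last; ++j) {
                const float w = scores[j] / denom;
                for (std::size_t d = 0; d < hs; ++d)
                    out[t * C + off + d] += w * v[j * C + off + d];
            }
        }
    }
    return out;
}
```

Because each head only reads and writes its own channel slice, heads are independent, which is what makes partitioning them across CUDA blocks or streams straightforward.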

---------

Co-authored-by: Max <eamon5174@gmail.com>
- Defines `DType` and `DeviceKind` enums supporting standard types (F32, F16, BF16, I32, U8).
- Implements `dtype_name` and `dtype_size` metadata helper functions.
- Adds an explicit `Status` struct for non-throwing error propagation alongside `checked_mul` for safe allocation size computation.
- Introduces the `check_cuda` and `abort_on_cuda` error-handling mechanisms, exposed via the `QUADTRIX_CUDA_CHECK` macro.
- Introduces the `GeluMode` enum to toggle between `Exact` and `Approximate` mathematical variants.
- Declares the `gelu_forward` and `gelu_backward` kernel entrypoints.
- Configures both signatures with optional stream execution and a default mode of `GeluMode::Approximate`.
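A minimal sketch of how `dtype_size` and `checked_mul` could combine into an overflow-safe allocation-size computation that reports failure through a `Status` instead of throwing. Everything beyond the names listed above is hypothetical: the lowercase name strings, the `Status` layout (an empty message meaning success), the out-parameter form of `checked_mul`, and the `tensor_bytes` helper are illustrative assumptions, not the definitions in this PR.

```cpp
#include <cstddef>
#include <limits>
#include <string>
#include <utility>

// Hypothetical sketch; the real quadtrix definitions are not shown here.
enum class DType { F32, F16, BF16, I32, U8 };

constexpr const char* dtype_name(DType t) {
    switch (t) {
        case DType::F32:  return "f32";
        case DType::F16:  return "f16";
        case DType::BF16: return "bf16";
        case DType::I32:  return "i32";
        case DType::U8:   return "u8";
    }
    return "unknown";
}

// Element size in bytes for each supported dtype.
constexpr std::size_t dtype_size(DType t) {
    switch (t) {
        case DType::F32:  return 4;
        case DType::F16:  return 2;
        case DType::BF16: return 2;
        case DType::I32:  return 4;
        case DType::U8:   return 1;
    }
    return 0;
}

// Non-throwing status (assumed layout): an empty message means success.
struct Status {
    std::string message;
    bool ok() const { return message.empty(); }
    static Status Ok() { return {}; }
    static Status Error(std::string msg) { return {std::move(msg)}; }
};

// Writes a * b to *out and returns true, or returns false if size_t would overflow.
inline bool checked_mul(std::size_t a, std::size_t b, std::size_t* out) {
    if (a != 0 && b > std::numeric_limits<std::size_t>::max() / a) return false;
    *out = a * b;
    return true;
}

// Hypothetical helper: byte size of a tensor, failing cleanly on overflow.
inline Status tensor_bytes(std::size_t numel, DType t, std::size_t* bytes) {
    if (!checked_mul(numel, dtype_size(t), bytes))
        return Status::Error("allocation size overflows size_t");
    return Status::Ok();
}
```

Checking `b > max / a` before multiplying avoids relying on unsigned wraparound to detect the overflow after the fact.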
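The two `GeluMode` variants correspond to the exact GeLU, x * Phi(x) with Phi the standard Gaussian CDF, and the common tanh approximation 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3))). A hypothetical scalar C++ reference for both, including the derivatives a backward pass would multiply by the incoming gradient, could look like this; `gelu_ref` and `gelu_grad_ref` are illustrative names, not the declared `gelu_forward` and `gelu_backward` entrypoints.

```cpp
#include <cmath>

// Mirrors the enum named in the commit; the functions below are hypothetical.
enum class GeluMode { Exact, Approximate };

constexpr float kSqrt2OverPi = 0.7978845608f;  // sqrt(2 / pi)
constexpr float kInvSqrt2 = 0.7071067812f;     // 1 / sqrt(2)
constexpr float kInvSqrt2Pi = 0.3989422804f;   // 1 / sqrt(2 * pi)

// GeLU forward; defaults to the tanh approximation, like the declared kernels.
float gelu_ref(float x, GeluMode mode = GeluMode::Approximate) {
    if (mode == GeluMode::Exact)
        return 0.5f * x * (1.0f + std::erf(x * kInvSqrt2));  // x * Phi(x)
    const float u = kSqrt2OverPi * (x + 0.044715f * x * x * x);
    return 0.5f * x * (1.0f + std::tanh(u));
}

// d(gelu)/dx for the selected variant.
float gelu_grad_ref(float x, GeluMode mode = GeluMode::Approximate) {
    if (mode == GeluMode::Exact) {
        const float cdf = 0.5f * (1.0f + std::erf(x * kInvSqrt2));  // Phi(x)
        const float pdf = kInvSqrt2Pi * std::exp(-0.5f * x * x);    // phi(x)
        return cdf + x * pdf;
    }
    const float u = kSqrt2OverPi * (x + 0.044715f * x * x * x);
    const float t = std::tanh(u);
    const float du = kSqrt2OverPi * (1.0f + 3.0f * 0.044715f * x * x);
    return 0.5f * (1.0f + t) + 0.5f * x * (1.0f - t * t) * du;
}
```

The approximation avoids `erf` and stays close to the exact curve over typical activation ranges, which is a common reason to make it the default.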
@Eamon2009 (Owner, Author)

/run-checks

@github-actions (Bot) commented Jun 1, 2026

✅ All checks passed!

@codeenthusiasm23

/run-checks

@github-actions (Bot) commented Jun 1, 2026

@codeenthusiasm23 Only maintainers can trigger checks.

@github-actions (Bot) commented Jun 1, 2026

❌ Some checks failed — see Actions for details.

Eamon2009 merged commit aef3e1e into codeaddict-master on Jun 1, 2026
26 of 27 checks passed
