Conversation

@AntonOresten
Contributor

This is mostly a proof-of-concept for rethinking batching: contexts are stacked into a single sequence that forms a block-diagonal structure in attention space, with interactions between contexts from different "documents" masked out and fully masked tiles skipped. For training runs with high variability in sequence length, this avoids a huge amount of needless computation on padding tokens. Dense layers in a model stack also benefit from it.
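As a minimal sketch of the idea (the `lengths` and `document_id` names are illustrative, not from this PR): each token in the packed sequence is labelled with the id of the document it came from, and attention is only allowed between tokens sharing the same id, which yields the block-diagonal mask.

```python
import torch

# Hypothetical example: three documents of different lengths packed into one sequence.
lengths = torch.tensor([5, 3, 4])
document_id = torch.repeat_interleave(torch.arange(len(lengths)), lengths)
# document_id == tensor([0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2])

# Block-diagonal mask: a query token may only attend to key tokens
# from the same document, so cross-document interactions are masked out.
mask = document_id[:, None] == document_id[None, :]   # (12, 12) bool
```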

Ideally this would be generalized to fully-fledged flex attention, but even then, document_ids and lengths might need to remain a special case in order to construct the block mask efficiently.

See also Flex Attention
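For comparison, a rough sketch of how the same document masking looks with the FlexAttention API in recent PyTorch (2.5+); this is not part of this PR, and the lengths here are arbitrary, chosen to sum to a multiple of the tile size:

```python
import torch
from torch.nn.attention.flex_attention import create_block_mask, flex_attention

# Hypothetical packed sequence: document lengths chosen to sum to 1024.
lengths = torch.tensor([300, 212, 512], device="cuda")
document_id = torch.repeat_interleave(
    torch.arange(len(lengths), device="cuda"), lengths
)
seq_len = document_id.numel()

def document_mask(b, h, q_idx, kv_idx):
    # Only allow attention within the same document.
    return document_id[q_idx] == document_id[kv_idx]

# create_block_mask precomputes, per tile, whether it is fully masked,
# so the fused kernel can skip those tiles entirely.
block_mask = create_block_mask(document_mask, B=None, H=None,
                               Q_LEN=seq_len, KV_LEN=seq_len)

q = k = v = torch.randn(1, 8, seq_len, 64, device="cuda", dtype=torch.float16)
out = flex_attention(q, k, v, block_mask=block_mask)
```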

@AntonOresten
Contributor Author

Added grouped-query attention from #19, and also #18 to avoid NaNs, assuming it is correct. This should make this branch quite generally useful for reducing memory usage.

@AntonOresten
Contributor Author

I think a flexible way of constructing the block mask is the trickiest part. The actual attention computation within each block should mostly remain the same.
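One rough shape that could take (the helper and names below are made up for illustration, not part of this branch): materialize whatever predicate defines the mask once at tile granularity, and record which tiles are fully masked (skippable), fully visible (no masking needed), or partial.

```python
import torch

def classify_tiles(mask: torch.Tensor, tile: int = 128):
    """Reduce a dense (Q, K) boolean mask to tile granularity.

    Assumes Q and K are multiples of `tile` (pad otherwise). Returns two
    (Q//tile, K//tile) boolean tensors: tiles with at least one allowed
    interaction, and tiles where every interaction is allowed.
    """
    q, k = mask.shape
    tiles = (mask.view(q // tile, tile, k // tile, tile)
                 .permute(0, 2, 1, 3)
                 .reshape(q // tile, k // tile, tile * tile))
    any_visible = tiles.any(dim=-1)
    all_visible = tiles.all(dim=-1)
    return any_visible, all_visible

# Example usage:
# any_visible, all_visible = classify_tiles(mask)
# skip = ~any_visible                     # fully masked tiles: never computed
# partial = any_visible & ~all_visible    # boundary tiles: apply the mask element-wise
```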
