pytorch-labs / attention-gym Public

Issues: pytorch-labs/attention-gym

41 Open 34 Closed

Issues list

Doc mask returns negative sparsity

#93 opened Dec 21, 2024 by staghado

question about masking

#92 opened Dec 18, 2024 by esason

Short vs long sequences performance question

Further information is requested

#89 opened Dec 12, 2024 by francoishernandez

[Inquiry] Document Masking and Assigning Different Weights

#88 opened Dec 12, 2024 by yeahjack

flexattn with qwen2

#81 opened Nov 18, 2024 by NonvolatileMemory

Flex attention with dropout

#77 opened Nov 13, 2024 by zbh2047

Flex attention - gaps in profiler

#76 opened Nov 11, 2024 by tugot17

Rope2d

#75 opened Nov 11, 2024 by bhack

How to implement Bidirectional Alibi with padding using flex attention?

#74 opened Nov 7, 2024 by sphmel

Is there any chance to call backward function directly instead of using pytorch autograd mechanism?

#73 opened Nov 7, 2024 by MayDomine

Block Size when Q_LEN and KV_LEN are different

#71 opened Nov 4, 2024 by johng149

NotImplementedError: There was no rule registered for HOP flex_attention and mode

#70 opened Nov 2, 2024 by LeoXinhaoLee

How to manually check if one position or row has correct masking?

#66 opened Oct 28, 2024 by Leo-T-Zang

Selection of BLOCK_SIZE in create_block_mask

#65 opened Oct 23, 2024 by tsrikris

How to reason about efficiency of different score/mask mod functions

#63 opened Oct 22, 2024 by alex-hh

FlexAttention Output Differs from SDPA

#62 opened Oct 22, 2024 by chayut-t

How to do KV Cache with FlexAttention and BlockMask by slicing?

#60 opened Oct 21, 2024 by Leo-T-Zang

A simple adaption to Jax

#59 opened Oct 21, 2024 by zinccat

What is the best practice to save and load a BlockMask object?

#58 opened Oct 20, 2024 by complexfilter

Optimal ordering with block mask

#56 opened Oct 19, 2024 by francois-rozet

What is the expected gpu memory performance drop wrt flash attention with block masks?

#54 opened Oct 19, 2024 by arilato

FlexAttention results do not match FlashAttention results

#50 opened Oct 7, 2024 by tilmto

Two errors: (1) NameError: ModularIndexing is not defined & (2) LoweringException: AttributeError: 'View' object has no attribute 'get_stride'

#45 opened Sep 23, 2024 by tobiasvanderwerff

Distributed Attention Methods question

Further information is requested

#44 opened Sep 20, 2024 by tsrikris

CUDA OOM Issue When Using Approx Tanh with softcapping score mod

#43 opened Sep 18, 2024 by kebijuelun
