
Make key optional for rotary embedding #17566


Merged: 4 commits merged into vllm-project:main on May 7, 2025

Conversation

sarckk
Collaborator

@sarckk sarckk commented May 1, 2025

Make key an optional argument for rotary embedding. This flexibility may be needed in cross-layer KV sharing, e.g. Layer-Condensed KV Cache and Cross-Layer Attention, where there is no K to apply rotary embedding on.

Unit tested with:

pytest tests/kernels/core/test_rotary_embedding.py
pytest tests/kernels/core/test_pos_encoding.py

E2E tested with offline inference example both with eager and non-eager.

Note: rotary emb kernel in intel-extension-for-pytorch currently does not support key=None, so falling back to native impl for now, followed up in intel/intel-extension-for-pytorch#821
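A minimal sketch of the optional-key behavior this PR describes, using NumPy as a stand-in for torch; the function name, shapes, and the half-split (GPT-NeoX-style) rotation are illustrative assumptions, not vLLM's actual kernel:

```python
import numpy as np

def apply_rotary(positions, query, key=None, base=10000.0):
    """Toy rotary-embedding sketch (NOT vLLM's implementation).

    Rotates pairs of feature dims by position-dependent angles.
    When key is None (e.g. cross-layer KV sharing, where there is
    no K to rotate), only the query is rotated and None is returned
    in the key slot.
    """
    half = query.shape[-1] // 2
    inv_freq = 1.0 / (base ** (np.arange(half) / half))
    angles = np.outer(positions, inv_freq)        # [seq_len, half]
    cos, sin = np.cos(angles), np.sin(angles)

    def rotate(x):
        x1, x2 = x[..., :half], x[..., half:]
        # standard 2D rotation applied pairwise (half-split layout)
        return np.concatenate([x1 * cos - x2 * sin,
                               x1 * sin + x2 * cos], axis=-1)

    return rotate(query), (None if key is None else rotate(key))

# Query-only call, as needed when there is no key to rotate:
q = np.ones((4, 8))
q_rot, k_rot = apply_rotary(np.arange(4), q)      # k_rot is None
```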


github-actions bot commented May 1, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs will not trigger a full CI run by default. Instead, only fastcheck CI runs, covering a small and essential subset of tests to quickly catch errors. You can run the other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

        offsets)
    if key is None:
        # XPU kernel doesn't support key=None so fall back to native impl
        # TODO: ipex.llm.functional.rotary_embedding_batched support key=None
Collaborator

add the GitHub id here? like TODO(sarckk): xxx
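The fallback in the snippet above can be sketched as a simple dispatch (class, method, and kernel names here are placeholders, not the actual vLLM/IPEX code):

```python
class RotaryEmbeddingSketch:
    """Hypothetical sketch of the XPU path's key=None fallback."""

    def forward_native(self, positions, query, key=None, offsets=None):
        # Placeholder for the native (pure-PyTorch) implementation;
        # identity here, just for illustration.
        return query, key

    def _ipex_kernel(self, positions, query, key, offsets):
        # Stand-in for ipex.llm.functional.rotary_embedding_batched,
        # which currently requires a real key tensor.
        return query, key

    def forward_xpu(self, positions, query, key=None, offsets=None):
        if key is None:
            # XPU kernel doesn't support key=None: fall back to native impl
            return self.forward_native(positions, query, key, offsets)
        return self._ipex_kernel(positions, query, key, offsets)

rope = RotaryEmbeddingSketch()
q_out, k_out = rope.forward_xpu(positions=None, query=[1.0])  # k_out is None
```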

Collaborator

@houseroad houseroad left a comment

Looks good to me.

@houseroad houseroad added the ready ONLY add when PR is ready to merge/full CI is needed label May 3, 2025
@houseroad
Collaborator

@sarckk , could you check the failed tests are related to the changes or not?

@sarckk
Collaborator Author

sarckk commented May 5, 2025

@sarckk , could you check the failed tests are related to the changes or not?

these don't seem to be related to my changes; they are failing on another merged PR as well: https://buildkite.com/vllm/ci/builds/19273#01969974-4d6d-46ca-ac55-e52bea52d5b8

@houseroad houseroad enabled auto-merge (squash) May 5, 2025 19:33
Collaborator

@WoosukKwon WoosukKwon left a comment

LGTM! A small nit on the return type.

        return query, key

    def forward_hpu(
        self,
        positions: torch.Tensor,
        query: torch.Tensor,
-       key: torch.Tensor,
+       key: Optional[torch.Tensor] = None,
        offsets: Optional[torch.Tensor] = None,
    ) -> Tuple[torch.Tensor, torch.Tensor]:
Collaborator

nit: Shouldn't we change the return type?

Suggested change:

-    ) -> Tuple[torch.Tensor, torch.Tensor]:
+    ) -> Tuple[torch.Tensor, Optional[torch.Tensor]]:

Collaborator Author

@sarckk sarckk May 6, 2025

thanks for the catch, updated! it would affect the typing of downstream call sites though -- is that ok?
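The downstream-typing concern mentioned above can be illustrated with a simplified, hypothetical signature mirroring the suggested change:

```python
from typing import List, Optional, Tuple

def forward(query: List[float],
            key: Optional[List[float]] = None
            ) -> Tuple[List[float], Optional[List[float]]]:
    # Simplified stand-in for RotaryEmbedding.forward after the change:
    # the key slot of the return value is now Optional.
    return query, key

# Call sites that always pass a key must now narrow the Optional result
# (e.g. with an assert) to satisfy static type checkers:
q, k = forward([1.0], [2.0])
assert k is not None  # narrows Optional[List[float]] to List[float]
```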

auto-merge was automatically disabled May 6, 2025 00:27

Head branch was pushed to by a user without write access

@sarckk sarckk force-pushed the rotary-emb-key-optional branch from 917e289 to 78486cb Compare May 6, 2025 00:27
@WoosukKwon
Collaborator

@sarckk Can you please merge from main and restart the CI? I'm not sure whether the CI failures are related to the PR. Maybe we should retry.

sarckk added 3 commits May 6, 2025 09:52
@sarckk sarckk force-pushed the rotary-emb-key-optional branch from 0b2e344 to d12c2a4 Compare May 6, 2025 16:53
@sarckk
Collaborator Author

sarckk commented May 6, 2025

@sarckk Can you please merge from main and restart the CI? I'm not sure whether the CI failures are related to the PR. Maybe we should retry.

I'm pretty sure failures are not related (e.g. I can reproduce the spec decode test failures locally on trunk) but I've rebased on main and kicked off a new run.

@WoosukKwon
Collaborator

WoosukKwon commented May 6, 2025

@sarckk Hmm... The tests still failed. Could you please take another look?

@houseroad
Collaborator

maybe just run it without the PR locally?

@sarckk
Collaborator Author

sarckk commented May 7, 2025

none of the highlighted test failures are due to the PR

all 3 spec decoding tests fail locally without PR, on commit 6115b115826040ad1f49b69a8b4fdd59f0df5113. #17754 fixes one of them

the examples-test and intel hpu/xpu failures are due to #17426

amd test failures seem to be present before (https://buildkite.com/vllm/ci/builds/19352#0196a33d-129f-4a93-af93-1fb0d1e8c82f)

@houseroad houseroad enabled auto-merge (squash) May 7, 2025 00:38
Signed-off-by: Yong Hoon Shin <[email protected]>
auto-merge was automatically disabled May 7, 2025 00:47

Head branch was pushed to by a user without write access

@sarckk
Collaborator Author

sarckk commented May 7, 2025

correction: the neuron test failure is actually due to my PR, but it's an issue with the test setup. pushed a fix

the newly failing distributed-tests-4-gpus test seems like transient infra error, I cannot reproduce it

@DarkLight1337
Member

The remaining tests are failing on main, so this should be good to go

@vllm-bot vllm-bot merged commit 98c89e1 into vllm-project:main May 7, 2025
77 of 80 checks passed
@sarckk sarckk deleted the rotary-emb-key-optional branch May 9, 2025 19:36
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
mawong-amd pushed a commit to ROCm/vllm that referenced this pull request May 14, 2025