Make key optional for rotary embedding #17566

sarckk · 2025-05-01T23:57:33Z

Make key an optional argument for rotary embedding. This flexibility may be needed in cross-layer KV sharing, e.g. Layer-Condensed KV Cache and Cross-Layer Attention, where there is no K to apply rotary embedding on.
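For readers unfamiliar with the idea, here is a minimal, framework-free sketch of what an optional key means for a rotary embedding call. It is not the vLLM implementation: `rotate_pairs` and `apply_rope` are hypothetical names, plain Python lists stand in for tensors, and the neox-style half split and base of 10000 are assumptions made for illustration.

```python
import math
from typing import List, Optional, Tuple

Vector = List[float]


def rotate_pairs(vec: Vector, position: int, base: float = 10000.0) -> Vector:
    """Rotate one even-length vector by position-dependent angles (half-split pairing)."""
    half = len(vec) // 2
    out = list(vec)
    for i in range(half):
        # Angle for the i-th pair: position * base^(-2i / d).
        theta = position * base ** (-2.0 * i / len(vec))
        cos, sin = math.cos(theta), math.sin(theta)
        x1, x2 = vec[i], vec[i + half]
        out[i] = x1 * cos - x2 * sin
        out[i + half] = x2 * cos + x1 * sin
    return out


def apply_rope(
    positions: List[int],
    query: List[Vector],
    key: Optional[List[Vector]] = None,
) -> Tuple[List[Vector], Optional[List[Vector]]]:
    """Rotate query, and key only when one is supplied."""
    rotated_q = [rotate_pairs(q, p) for q, p in zip(query, positions)]
    if key is None:
        # A layer that reuses another layer's KV has no K to rotate here.
        return rotated_q, None
    rotated_k = [rotate_pairs(k, p) for k, p in zip(key, positions)]
    return rotated_q, rotated_k
```

A caller that shares KV across layers would use `apply_rope(positions, query)` and ignore the second element of the result.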

Unit tested with:

pytest tests/kernels/core/test_rotary_embedding.py
pytest tests/kernels/core/test_pos_encoding.py

E2E tested with the offline inference example in both eager and non-eager modes.

Note: the rotary embedding kernel in intel-extension-for-pytorch currently does not support key=None, so that path falls back to the native implementation for now. The follow-up is tracked in intel/intel-extension-for-pytorch#821.
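The fallback described in the note can be sketched as a simple dispatch. Everything here is a stand-in: `fused_kernel` is a hypothetical function that mimics a kernel requiring a key, `forward_native` is a placeholder reference path, lists stand in for tensors, and the `calls` list exists only to show which path ran.

```python
from typing import List, Optional, Tuple

Tensor = List[float]  # stand-in for torch.Tensor in this sketch
calls: List[str] = []  # records which path ran, for illustration only


def forward_native(
    query: Tensor, key: Optional[Tensor] = None
) -> Tuple[Tensor, Optional[Tensor]]:
    """Placeholder reference path that accepts key=None."""
    calls.append("native")
    return list(query), (None if key is None else list(key))


def fused_kernel(query: Tensor, key: Tensor) -> Tuple[Tensor, Tensor]:
    """Hypothetical fused kernel that cannot handle a missing key."""
    if key is None:
        raise TypeError("fused kernel requires a key tensor")
    calls.append("fused")
    return list(query), list(key)


def forward_accelerated(
    query: Tensor, key: Optional[Tensor] = None
) -> Tuple[Tensor, Optional[Tensor]]:
    """Use the fused kernel when possible, otherwise fall back."""
    if key is None:
        # The fused kernel does not support key=None, so use the native path.
        return forward_native(query, None)
    return fused_kernel(query, key)
```

Once the upstream kernel accepts a missing key, the `if key is None` branch could be dropped without changing callers.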

github-actions · 2025-05-01T23:57:43Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

houseroad · 2025-05-02T06:17:02Z

vllm/model_executor/layers/rotary_embedding.py

-                                         offsets)
+        if key is None:
+            # XPU kernel doesn't support key=None so fall back to native impl
+            # TODO ipex.llm.functional.rotary_embedding_batched support key=None


add the github id here? like TODO(sarckk): xxx

houseroad

Looks good to me.

houseroad · 2025-05-04T07:48:59Z

@sarckk, could you check whether the failed tests are related to the changes or not?

sarckk · 2025-05-05T17:29:49Z

@sarckk, could you check whether the failed tests are related to the changes or not?

these don't seem to be related to my changes, they are failing on another merged PR as well: https://buildkite.com/vllm/ci/builds/19273#01969974-4d6d-46ca-ac55-e52bea52d5b8

WoosukKwon

LGTM! A small nit on the return type.

WoosukKwon · 2025-05-06T00:24:23Z

vllm/model_executor/layers/rotary_embedding.py

        return query, key

    def forward_hpu(
        self,
        positions: torch.Tensor,
        query: torch.Tensor,
-        key: torch.Tensor,
+        key: Optional[torch.Tensor] = None,
        offsets: Optional[torch.Tensor] = None,
    ) -> Tuple[torch.Tensor, torch.Tensor]:


nit: Shouldn't we change the return type?

Suggested change

-    ) -> Tuple[torch.Tensor, torch.Tensor]:
+    ) -> Tuple[torch.Tensor, Optional[torch.Tensor]]:

thanks for the catch, updated! it would affect the typing of downstream call sites though -- is that ok?
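To illustrate the typing question raised here, this is a small sketch of how a downstream call site might handle the `Optional` second element. The names `rotary_emb`, `attention_inputs`, and `shared_kv_query` are hypothetical, and lists stand in for tensors; it only shows the narrowing pattern, not how vLLM call sites were actually updated.

```python
from typing import List, Optional, Tuple

Tensor = List[float]  # stand-in for torch.Tensor in this sketch


def rotary_emb(
    query: Tensor, key: Optional[Tensor] = None
) -> Tuple[Tensor, Optional[Tensor]]:
    """Placeholder with the widened return type; no real rotation is done."""
    return list(query), (None if key is None else list(key))


def attention_inputs(query: Tensor, key: Tensor) -> Tuple[Tensor, Tensor]:
    """Call site that always passes a key and needs a non-optional result."""
    q, k = rotary_emb(query, key)
    if k is None:
        # Narrow Optional[Tensor] to Tensor for the type checker.
        raise ValueError("expected a rotated key when a key was passed")
    return q, k


def shared_kv_query(query: Tensor) -> Tensor:
    """Call site for a layer that reuses another layer's KV and has no key."""
    q, _ = rotary_emb(query)
    return q
```

Call sites that always pass a key need only one explicit check (or an `assert k is not None`) to satisfy a type checker; call sites that omit the key simply discard the second element.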

WoosukKwon · 2025-05-06T16:21:30Z

@sarckk Can you please merge from main and restart the CI? I'm not sure whether the CI failures are related to the PR. Maybe we should retry.

sarckk · 2025-05-06T16:54:39Z

@sarckk Can you please merge from main and restart the CI? I'm not sure whether the CI failures are related to the PR. Maybe we should retry.

I'm pretty sure failures are not related (e.g. I can reproduce the spec decode test failures locally on trunk) but I've rebased on main and kicked off a new run.

WoosukKwon · 2025-05-06T22:22:42Z

@sarckk Hmm... The tests still failed. Could you please take another look?

houseroad · 2025-05-06T22:27:34Z

maybe just run the tests locally without the PR?

sarckk · 2025-05-07T00:15:37Z

none of the highlighted test failures are due to the PR

all 3 spec decoding tests fail locally without PR, on commit 6115b115826040ad1f49b69a8b4fdd59f0df5113. #17754 fixes one of them

the examples-test and intel hpu/xpu failures are due to #17426

amd test failures seem to be present before (https://buildkite.com/vllm/ci/builds/19352#0196a33d-129f-4a93-af93-1fb0d1e8c82f)

sarckk · 2025-05-07T00:48:47Z

correction: the neuron test failure is actually due to my PR, but it's an issue with the test setup. pushed a fix

the newly failing distributed-tests-4-gpus test seems like a transient infra error, I cannot reproduce it

DarkLight1337 · 2025-05-07T07:11:41Z

The remaining tests are failing on main, so this should be good to go

sarckk requested review from tlrmchlsmth and WoosukKwon as code owners May 1, 2025 23:57

sarckk force-pushed the rotary-emb-key-optional branch from b99fd50 to b681cca Compare May 1, 2025 23:59

sarckk mentioned this pull request May 2, 2025

Make key optional in ipex.llm.functional.rotary_embedding intel/intel-extension-for-pytorch#821

Open

houseroad reviewed May 2, 2025

houseroad approved these changes May 3, 2025

houseroad added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) May 3, 2025

houseroad enabled auto-merge (squash) May 5, 2025 19:33

WoosukKwon approved these changes May 6, 2025

auto-merge was automatically disabled May 6, 2025 00:27
Head branch was pushed to by a user without write access

sarckk force-pushed the rotary-emb-key-optional branch from 917e289 to 78486cb Compare May 6, 2025 00:27

tlrmchlsmth approved these changes May 6, 2025

sarckk added 3 commits May 6, 2025 09:52

Make key optional for rotary embedding

d33f8f2

Signed-off-by: Yong Hoon Shin <[email protected]>

Add github user id to TODO

c245d16

Signed-off-by: Yong Hoon Shin <[email protected]>

Fix return type for rot embedding

d12c2a4

Signed-off-by: Yong Hoon Shin <[email protected]>

sarckk force-pushed the rotary-emb-key-optional branch from 0b2e344 to d12c2a4 Compare May 6, 2025 16:53

houseroad enabled auto-merge (squash) May 7, 2025 00:38

Fix neuron rotary emb test

8426efb

Signed-off-by: Yong Hoon Shin <[email protected]>

auto-merge was automatically disabled May 7, 2025 00:47
Head branch was pushed to by a user without write access

vllm-bot merged commit 98c89e1 into vllm-project:main May 7, 2025
77 of 80 checks passed

sarckk deleted the rotary-emb-key-optional branch May 9, 2025 19:36

RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025

Make key optional for rotary embedding (vllm-project#17566)

c96265a

Signed-off-by: Yong Hoon Shin <[email protected]> Signed-off-by: Mu Huai <[email protected]>

mawong-amd pushed a commit to ROCm/vllm that referenced this pull request May 14, 2025

Make key optional for rotary embedding (vllm-project#17566)

df4f43c

Signed-off-by: Yong Hoon Shin <[email protected]>

zzzyq pushed a commit to zzzyq/vllm that referenced this pull request May 24, 2025

Make key optional for rotary embedding (vllm-project#17566)

f41ee17

Signed-off-by: Yong Hoon Shin <[email protected]> Signed-off-by: Yuqi Zhang <[email protected]>

minpeter pushed a commit to minpeter/vllm that referenced this pull request Jun 24, 2025

Make key optional for rotary embedding (vllm-project#17566)

0c8af64

Signed-off-by: Yong Hoon Shin <[email protected]> Signed-off-by: minpeter <[email protected]>

tanujtiwari1998 mentioned this pull request Jul 8, 2025

cached tokens completions character-tech/vllm#22

Merged

4 tasks
