Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bugfix] Fix value unpack error of simple connector for KVCache transfer. #11058

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

ShangmingCai
Copy link
Contributor

@ShangmingCai ShangmingCai commented Dec 10, 2024

Currently the send_kv_caches_and_hidden_states get the value of num_heads and head_size from kv_cache[0].shape, which will raise ValueError when the KVCache is compacted. Since we have model_executable as the function input, we can directly get these values from the model config to avoid this error. Also, we do not need to calculate this for every single layer, so we calculate this only once for all layers.

For example,
when kv_cache[0].shape == torch.Size([2162, 16, 40, 128]), this line works normally,
when kv_cache[0].shape == torch.Size([2162, 81920]), it will raise ValueError.

ERROR 12-03 14:31:48 engine.py:135] ValueError('Error in model execution (input dumped to /tmp/err_execute_model_input_20241203-143148.pkl): not enough values to unpack (expected 4, got 2)')
ERROR 12-03 14:31:48 engine.py:135] Traceback (most recent call last):
ERROR 12-03 14:31:48 engine.py:135] File "/mnt/data_disk101/data_disk/lwq/LLM_INFER/split_platform/opensource/vllm-kuntai-disagg-refactor_1202/vllm/worker/model_runner_base.py", line 116, in _wrapper
ERROR 12-03 14:31:48 engine.py:135] return func(*args, **kwargs)
ERROR 12-03 14:31:48 engine.py:135] File "/mnt/data_disk101/data_disk/lwq/LLM_INFER/split_platform/opensource/vllm-kuntai-disagg-refactor_1202/vllm/worker/model_runner.py", line 1718, in execute_model
ERROR 12-03 14:31:48 engine.py:135] get_kv_transfer_group().send_kv_caches_and_hidden_states(
ERROR 12-03 14:31:48 engine.py:135] File "/mnt/data_disk101/data_disk/lwq/LLM_INFER/split_platform/opensource/vllm-kuntai-disagg-refactor_1202/vllm/distributed/kv_transfer/kv_transfer_agent.py", line 60, in send_kv_caches_and_hidden_states
ERROR 12-03 14:31:48 engine.py:135] self.connector.send_kv_caches_and_hidden_states(
ERROR 12-03 14:31:48 engine.py:135] File "/mnt/data_disk101/data_disk/lwq/LLM_INFER/split_platform/opensource/vllm-kuntai-disagg-refactor_1202/vllm/distributed/kv_transfer/kv_connector/simple_connector.py", line 134, in send_kv_caches_and_hidden_states
ERROR 12-03 14:31:48 engine.py:135] _, _, num_heads, head_size = kv_cache[0].shape
ERROR 12-03 14:31:48 engine.py:135] ValueError: not enough values to unpack (expected 4, got 2)

CC list: @KuntaiDu @youkaichao

Copy link

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

Signed-off-by: ShangmingCai <[email protected]>
@youkaichao youkaichao requested a review from KuntaiDu December 10, 2024 19:30
Signed-off-by: ShangmingCai <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant