[GPU] Fix kernels synchronization in PagedAttention operation (#28645)
### Details:
- Fix synchronization in the PagedAttention operation when KV-cache rotation
is enabled but skipped for the current iteration.
Previously, `dep_events` was always replaced with `res_events` when
`has_rotated_blocks=true`. If the rotation stage was skipped, `res_events`
stayed empty, so an empty events vector was passed to the subsequent kernels.
With an out-of-order queue, the missing synchronization caused accuracy
deviations.
sshlyapn authored Jan 23, 2025
1 parent ecdecf3 commit 485833c
Showing 1 changed file with 1 addition and 1 deletion.
@@ -346,7 +346,7 @@ struct paged_attention_impl : multi_stage_primitive<paged_attention> {

 std::vector<event::ptr> res_events;
 std::vector<event::ptr> dep_events = events;
-if (has_rotated_blocks) {
+if (has_rotated_blocks && !_kernels_data[Stage::KV_CACHE_ROTATE].kernels[0].skip_execution) {
     execute_stage(dep_events, instance, res_events, Stage::KV_CACHE_ROTATE, is_mixed_mode);
     dep_events = res_events;
 }
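
The effect of the changed condition can be reproduced in isolation. The following is a minimal sketch, not the plugin's code: `Event`, `EventPtr`, the simplified `execute_stage`, and `deps_after_rotation` with its `apply_fix` switch are hypothetical stand-ins, and it assumes that a skipped stage returns without appending any event to its output vector, which is what the commit message implies.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-ins for the plugin's event types (not the real OpenVINO API).
struct Event {};
using EventPtr = std::shared_ptr<Event>;

// Simplified stage execution. Assumption: a skipped stage enqueues nothing
// and therefore appends no event to `out`.
void execute_stage(const std::vector<EventPtr>& deps,
                   std::vector<EventPtr>& out,
                   bool skip_execution) {
    (void)deps;  // a real stage would wait on these before running
    if (skip_execution) {
        return;
    }
    out.push_back(std::make_shared<Event>());
}

// Returns the events that the stages following KV-cache rotation would wait on.
// apply_fix = false models the old condition, apply_fix = true the new one.
std::vector<EventPtr> deps_after_rotation(const std::vector<EventPtr>& events,
                                          bool has_rotated_blocks,
                                          bool skip_execution,
                                          bool apply_fix) {
    std::vector<EventPtr> res_events;
    std::vector<EventPtr> dep_events = events;
    const bool run_rotation = apply_fix ? (has_rotated_blocks && !skip_execution)
                                        : has_rotated_blocks;
    if (run_rotation) {
        execute_stage(dep_events, res_events, skip_execution);
        dep_events = res_events;  // empty if the stage was skipped
    }
    return dep_events;
}
```

With the old condition and a skipped rotation stage, the returned vector is empty, so later kernels on an out-of-order queue would have nothing to wait on. With the new condition, the incoming `events` are passed through unchanged.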