Commit
[PyTorch] Adjusted the logic of MHA and DPA to enable speculative decoding (#668)

* Modified MHA and DPA logic to use causal softmax and FA for inference
  Signed-off-by: Oleg Goncharov
* Adjusted unfused attention and softmax logic for inference
  Signed-off-by: Oleg Goncharov
* Cleaned up the code per pylint
  Signed-off-by: Oleg Goncharov
* Added test cases to evaluate numerics of incremental decoding
  Signed-off-by: Oleg Goncharov
* Apply suggestions from code review
  Co-authored-by: Tim Moon
  Signed-off-by: Oleg Goncharov
* Apply suggestions from code review [sequence start-end]
  Co-authored-by: Tim Moon
  Signed-off-by: Oleg Goncharov
* Apply suggestions from code review [inference_params offset update]
  Co-authored-by: Tim Moon
  Signed-off-by: Oleg Goncharov
* Fixed bug in KV-cache indices and updated test suite
  Signed-off-by: Oleg Goncharov
* Added inference_params description and applied suggestions from the code review
  Signed-off-by: Oleg Goncharov
* Adjusted absolute tolerances in numerics tests
  Signed-off-by: Oleg Goncharov
* Cleaned up the files per pylint
  Signed-off-by: Oleg Goncharov

---------

Signed-off-by: Oleg Goncharov
Co-authored-by: Przemyslaw Tredak
Co-authored-by: Tim Moon
Co-authored-by: Kirthi Shankar Sivamani
1 parent 728e335, commit b459ccc
Showing 3 changed files with 243 additions and 85 deletions.