
Clarification on the OP_CPY operation src0->src1 #1314

Answered by ggerganov
josemonsalve2 asked this question in Q&A
The code that you are looking at is for the backward pass. It's not used during inference.

The update of the KV cache in all inference graphs is indeed a bit tricky. We basically have a large KV buffer and for each batch we update a small section of it - this is the cpy into a view (i.e. the left red arrow in your picture).

Later in the same graph, we need to use a larger portion of the KV buffer. So we make another view and use that - this is the right arrow in the picture.
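To make the two arrows concrete, here is a minimal plain-C sketch of the idea, not ggml code: one shared buffer, a small view that receives the copy, and a larger view that is read afterwards. The names `KV_CAPACITY`, `struct view`, `make_view`, `kv_update`, `n_past` and `n_batch` are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define KV_CAPACITY 8 /* hypothetical cache size, in elements */

/* A view owns no memory: it is just a pointer and a length into a shared buffer. */
struct view {
    float *data;
    size_t len;
};

static struct view make_view(float *buf, size_t offset, size_t len) {
    assert(offset + len <= KV_CAPACITY); /* stay inside the shared buffer */
    struct view v = { buf + offset, len };
    return v;
}

/* Copy one batch into the slots starting at n_past, then return a wider
 * view covering everything cached so far, including the new batch. */
static struct view kv_update(float *kv, const float *batch,
                             size_t n_past, size_t n_batch) {
    struct view dst = make_view(kv, n_past, n_batch);  /* small view: write target */
    memcpy(dst.data, batch, n_batch * sizeof(float));  /* the "cpy into a view"    */
    return make_view(kv, 0, n_past + n_batch);         /* larger view: read source */
}
```

Both views alias the same memory, so the larger view only sees the new batch if the copy has already run. In this sketch that ordering comes for free from sequential C; in a compute graph it has to be arranged explicitly.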

Indeed, there is no explicit dependency stated here. We solve this by applying a ggml_build_forward_expand() after the update of the KV cache to guarantee that all operations up to that point would be performed before th…
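The ordering problem described above can be reproduced with a toy graph builder. This is a sketch under assumptions, not ggml's implementation: it assumes a builder that appends a tensor's sources in depth-first post-order and skips nodes it has already visited. All names (`struct node`, `struct graph`, `expand`, `position`, `cpy_scheduled_before_read`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 16

struct node {
    const char  *name;
    struct node *src0;
    struct node *src1;
    int          visited;
};

struct graph {
    struct node *order[MAX_NODES]; /* execution order */
    int          n;
};

/* Post-order DFS: append everything `t` depends on, then `t` itself. */
static void expand(struct graph *g, struct node *t) {
    if (t == NULL || t->visited) return;
    t->visited = 1;
    expand(g, t->src0);
    expand(g, t->src1);
    assert(g->n < MAX_NODES);
    g->order[g->n++] = t;
}

/* Index of `t` in the execution order, or -1 if it was never scheduled. */
static int position(const struct graph *g, const struct node *t) {
    for (int i = 0; i < g->n; i++) {
        if (g->order[i] == t) return i;
    }
    return -1;
}

/* Returns 1 if the copy is scheduled, and scheduled before the large view is read. */
static int cpy_scheduled_before_read(int expand_cpy_first) {
    struct node kv         = { "kv",         NULL,        NULL,        0 };
    struct node batch      = { "batch",      NULL,        NULL,        0 };
    struct node view_small = { "view_small", &kv,         NULL,        0 };
    struct node cpy        = { "cpy",        &batch,      &view_small, 0 };
    struct node view_large = { "view_large", &kv,         NULL,        0 };
    struct node out        = { "out",        &view_large, NULL,        0 };
    struct graph g = { {0}, 0 };

    if (expand_cpy_first) expand(&g, &cpy); /* force the cache update into the graph */
    expand(&g, &out);                       /* then everything the output depends on */

    int p_cpy  = position(&g, &cpy);
    int p_read = position(&g, &view_large);
    return p_cpy >= 0 && p_cpy < p_read;
}
```

In this model `out` reaches the buffer only through `view_large`, so nothing links it to `cpy`. Calling `cpy_scheduled_before_read(0)` returns 0 because the copy never enters the graph, while `cpy_scheduled_before_read(1)` returns 1 because expanding the copy first places it ahead of the later read.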

Answer selected by josemonsalve2