Replies: 1 comment
-
Hey Alex, buffer management should be clearer in recent versions. However, I would recommend looking at the PyTorch backend, which has been the default since v1.0.
Correct.
Correct. These have been refactored into
The idea here is that inputs and outputs are only valid for a specific iteration. There can be multiple batches, each with its own inputs and outputs.
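To make the idea concrete, here is a minimal sketch of per-iteration buffers. All names (`IterationBuffers`, `run_iteration`, the `tokens`/`logits` keys) are invented for illustration; they are not the actual TRT-LLM API.

```python
# Hypothetical sketch: one set of input/output buffers per batch, valid for
# a single iteration only. Reusing a buffer from a previous iteration would
# read stale data, which is why fresh buffers are created each time.

class IterationBuffers:
    def __init__(self, batch_id):
        self.batch_id = batch_id
        self.inputs = {}   # filled by the runtime before the engine runs
        self.outputs = {}  # written by the engine during this iteration


def run_iteration(batch_ids):
    """Create fresh buffers for every batch in this iteration."""
    buffers = []
    for batch_id in batch_ids:
        buf = IterationBuffers(batch_id)
        buf.inputs["tokens"] = [1, 2, 3]    # placeholder input data
        buf.outputs["logits"] = [0.1, 0.2]  # placeholder engine output
        buffers.append(buf)
    return buffers


bufs = run_iteration(["batch0", "batch1"])  # one buffer set per batch
```

The point is only the lifetime: a buffer set belongs to exactly one (batch, iteration) pair and is not carried over.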
Not sure. I think they are needed to store additional information that is not present in decoder-only models.
A req slot / batch slot is an identifier that maps a request to a specific resource slot. The slot persists for the entire execution of the request.
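A toy slot manager can illustrate that lifetime rule. Again, this is a hypothetical sketch with invented names (`SlotManager`, `add_request`, `finish_request`), not TRT-LLM code.

```python
# Hypothetical sketch of the req-slot idea: a request is assigned a slot
# when it is scheduled and keeps that slot until it fully completes, so
# per-request state can be indexed by slot across many iterations.

class SlotManager:
    def __init__(self, max_slots):
        self.free_slots = list(range(max_slots))
        self.request_to_slot = {}

    def add_request(self, request_id):
        if not self.free_slots:
            raise RuntimeError("no free slots available")
        slot = self.free_slots.pop(0)
        self.request_to_slot[request_id] = slot
        return slot

    def finish_request(self, request_id):
        # The slot is only released once the request has completed,
        # making it safe to reuse for a new request afterwards.
        slot = self.request_to_slot.pop(request_id)
        self.free_slots.append(slot)


mgr = SlotManager(max_slots=4)
slot_a = mgr.add_request("req-A")
slot_b = mgr.add_request("req-B")  # each active request gets its own slot
mgr.finish_request("req-A")       # slot_a may now be reused
```

The design choice this mirrors: persistent slots let the runtime keep per-request buffers in fixed positions while batch composition changes between iterations.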
Multiple batches (each with their own buffers) are used in
-
I was trying to modify the TRT-LLM code to implement something specific for my model that is not supported yet, and I was quite confused by the number of buffers involved. Can you please explain how they are structured?