https://www.hopsworks.ai/dictionary/vllm
The attention mechanism allows LLMs to focus on relevant parts of the input sequence while generating a response. Inside the attention mechanism, attention scores must be computed for all input tokens, which requires keeping their key-value (KV) pairs available in a KV cache. Existing systems store these KV pairs in contiguous memory, which limits memory sharing and leads to inefficient memory management.
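As a minimal illustration (a NumPy sketch, not vLLM code), the following shows scaled dot-product attention over a growing KV cache; every newly generated token must attend to the keys and values of all previous tokens, which is why the cache grows with sequence length:

```python
import numpy as np

d = 64                                  # head dimension (illustrative)
kv_cache_k, kv_cache_v = [], []         # cached keys and values, one entry per token

def attend(q, new_k, new_v):
    """Append the new token's K/V to the cache and attend over the full history."""
    kv_cache_k.append(new_k)
    kv_cache_v.append(new_v)
    K = np.stack(kv_cache_k)            # (num_tokens, d)
    V = np.stack(kv_cache_v)            # (num_tokens, d)
    scores = K @ q / np.sqrt(d)         # attention scores for all cached tokens
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V                  # weighted sum of cached values

# Each decoding step adds one K/V pair, so the cache grows with sequence length.
out = attend(np.random.randn(d), np.random.randn(d), np.random.randn(d))
```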
PagedAttention is an attention algorithm inspired by the concept of paging in operating systems. It stores a sequence's logically contiguous KV pairs in non-contiguous physical memory by partitioning the KV cache of each sequence into fixed-size KV blocks and tracking them with a per-sequence block table that maps logical blocks to physical blocks.
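The sketch below (simplified, with assumed names; not the vLLM implementation) illustrates this block-table idea: a shared pool of physical KV blocks is handed out on demand, and each sequence's block table maps its logical block indices to wherever its physical blocks happen to live:

```python
import numpy as np

BLOCK_SIZE = 16                         # tokens per KV block (illustrative)
NUM_PHYSICAL_BLOCKS = 1024
d = 64

# One shared pool of physical KV blocks; blocks are allocated on demand.
physical_blocks = np.zeros((NUM_PHYSICAL_BLOCKS, BLOCK_SIZE, 2, d))  # [block, slot, K/V, dim]
free_blocks = list(range(NUM_PHYSICAL_BLOCKS))

block_tables = {}                       # sequence id -> list of physical block ids

def append_kv(seq_id, token_pos, k, v):
    """Write one token's K/V pair, allocating a new physical block when needed."""
    table = block_tables.setdefault(seq_id, [])
    logical_block, slot = divmod(token_pos, BLOCK_SIZE)
    if logical_block == len(table):     # sequence crossed into a new logical block
        table.append(free_blocks.pop())
    phys = table[logical_block]
    physical_blocks[phys, slot, 0] = k
    physical_blocks[phys, slot, 1] = v

# Two sequences interleave allocations, so their physical blocks end up
# non-contiguous, yet each block table still presents the KV cache in logical order.
for pos in range(20):
    append_kv("seq_a", pos, np.random.randn(d), np.random.randn(d))
    append_kv("seq_b", pos, np.random.randn(d), np.random.randn(d))
print(block_tables)
```

Because blocks no longer need to sit next to each other in memory, they can be allocated lazily as sequences grow and shared across sequences, which is what reduces the fragmentation and waste of contiguous KV cache layouts.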