Update user guide according to review comments

lvhan028 · Nov 19, 2023 · cbe7108
1 parent 2c1a466

Showing 2 changed files with 13 additions and 1 deletion.
diff --git a/docs/en/turbomind_config.md b/docs/en/turbomind_config.md
@@ -88,7 +88,13 @@ For the llama2-7b model, when storing k/v as the `half` type, the memory of a k/
 The meaning of `cache_max_entry_count` varies depending on its value:
 
 - When it's a decimal in the range (0, 1), `cache_max_entry_count` represents the percentage of memory used by k/v blocks. For example, if turbomind launches on an A100-80G GPU with `cache_max_entry_count` set to `0.5`, the total memory used by the k/v blocks is `80 * 0.5 = 40G`.
-- When it's an integer no less than 1, it represents the number of k/v blocks
+- When it's an integer > 0, it represents the total number of k/v blocks
+
+`cache_chunk_size` indicates how many k/v cache blocks are allocated each time new k/v cache blocks are needed. Its meaning depends on its value:
+
+- When it is an integer > 0, `cache_chunk_size` k/v cache blocks are allocated.
+- When it is -1, `cache_max_entry_count` k/v cache blocks are allocated.
+- When it is 0, `sqrt(cache_max_entry_count)` k/v cache blocks are allocated.
 
 ### kv int8 switch
 

diff --git a/docs/zh_cn/turbomind_config.md b/docs/zh_cn/turbomind_config.md
@@ -92,6 +92,12 @@ cache_block_seq_len * num_layer * kv_head_num * size_per_head * 2 * sizeof(kv_da
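 - When the value is a decimal in the range (0, 1), `cache_max_entry_count` represents the percentage of memory used by k/v blocks. For example, an A100-80G GPU has 80G of memory; when `cache_max_entry_count` is 0.5, the total memory used by k/v blocks is 80 * 0.5 = 40G
 - When the value is an integer > 1, it represents the number of k/v blocks
 
+`cache_chunk_size` indicates the size of the k/v cache chunk allocated each time new k/v cache blocks are needed. Different values have different meanings:
+
+- When it is an integer > 0, `cache_chunk_size` k/v cache blocks are allocated
+- When the value is -1, `cache_max_entry_count` k/v cache blocks are allocated
+- When the value is 0, `sqrt(cache_max_entry_count)` k/v cache blocks are allocated
+
 ### kv int8 switch
 
 `quant_policy` is the switch for KV-int8 inference. For usage details, see the [kv int8](./kv_int8.md) deployment document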
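
The rules this commit documents for `cache_max_entry_count` and `cache_chunk_size` can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not turbomind's actual code: the function names `kv_cache_budget_gb` and `blocks_per_allocation` are hypothetical, the second function assumes `cache_max_entry_count` is given as an integer block count, and rounding `sqrt(cache_max_entry_count)` down to an integer is an assumption, since the docs do not say how the square root is rounded.

```python
import math


def kv_cache_budget_gb(cache_max_entry_count: float, gpu_mem_gb: float) -> float:
    """Memory (in G) used by k/v blocks when the value is a decimal in (0, 1)."""
    if not 0 < cache_max_entry_count < 1:
        raise ValueError("expected a decimal in the open interval (0, 1)")
    return gpu_mem_gb * cache_max_entry_count


def blocks_per_allocation(cache_chunk_size: int, cache_max_entry_count: int) -> int:
    """Number of k/v cache blocks allocated each time new blocks are needed.

    Hypothetical helper: assumes cache_max_entry_count is an integer block count.
    """
    if cache_chunk_size > 0:
        # Positive integer: allocate exactly that many blocks.
        return cache_chunk_size
    if cache_chunk_size == -1:
        # -1: allocate cache_max_entry_count blocks.
        return cache_max_entry_count
    if cache_chunk_size == 0:
        # 0: allocate sqrt(cache_max_entry_count) blocks.
        # Flooring with isqrt is an assumption, not documented behavior.
        return math.isqrt(cache_max_entry_count)
    raise ValueError("cache_chunk_size must be -1, 0, or a positive integer")


if __name__ == "__main__":
    print(kv_cache_budget_gb(0.5, 80))      # the A100-80G example from the docs
    print(blocks_per_allocation(0, 100))
```

Under these assumptions, a chunk size of 0 trades a few more allocation calls for a smaller up-front reservation than -1, which reserves every block at once.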