Description
I run `python3 -m llama_cpp.server` in order to call the API from my scripts.
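For context, my scripts hit the server's OpenAI-compatible API roughly like this (assuming the default host and port, `localhost:8000`):

```python
# Minimal sketch of how my scripts call the server
# (default host/port assumed: http://localhost:8000).
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={"prompt": "Hello,", "max_tokens": 32},
)
print(resp.json()["choices"][0]["text"])
```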
I'd like to implement prompt caching (like I can in llama.cpp), but the command-line options that work for llama.cpp's server don't work for this project.
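For reference, this is the kind of thing I mean. With llama.cpp's `main` example I can persist the prompt's KV state with something like the following (the flags here are llama.cpp's, not this project's, and the file name is just an example):

```bash
# llama.cpp invocation, shown only for comparison.
./main -m ./models/my-model.gguf \
    --prompt-cache prompt-cache.bin \
    -p "A long system prompt I want to reuse across runs..."
```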
I searched the docs and couldn't find any documentation on the command-line options that would work.
After an error while trying random command-line options, I did get this output on the command line:
```
/home/arthur/.local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:127: UserWarning: Field "model_alias" has conflict with protected namespace "model_".
You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ('settings_',)`.
  warnings.warn(
usage: __main__.py [-h] [--model MODEL] [--model_alias MODEL_ALIAS] [--n_ctx N_CTX] [--n_gpu_layers N_GPU_LAYERS] [--tensor_split TENSOR_SPLIT]
                   [--rope_freq_base ROPE_FREQ_BASE] [--rope_freq_scale ROPE_FREQ_SCALE] [--seed SEED] [--n_batch N_BATCH] [--n_threads N_THREADS]
                   [--f16_kv F16_KV] [--use_mlock USE_MLOCK] [--use_mmap USE_MMAP] [--embedding EMBEDDING] [--low_vram LOW_VRAM]
                   [--last_n_tokens_size LAST_N_TOKENS_SIZE] [--logits_all LOGITS_ALL] [--cache CACHE] [--cache_type CACHE_TYPE] [--cache_size CACHE_SIZE]
                   [--vocab_only VOCAB_ONLY] [--verbose VERBOSE] [--host HOST] [--port PORT] [--interrupt_requests INTERRUPT_REQUESTS] [--n_gqa N_GQA]
                   [--rms_norm_eps RMS_NORM_EPS] [--mul_mat_q MUL_MAT_Q]
```
From that output, these look like what I'm looking for: `--cache CACHE`, `--cache_type CACHE_TYPE`, and `--cache_size CACHE_SIZE`.
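For example, I guessed an invocation along these lines, but the values are pure guesses on my part and I have no idea whether they are valid:

```bash
# Pure guess: I don't know the expected values for these flags.
python3 -m llama_cpp.server --model ./models/my-model.gguf \
    --cache true --cache_type ram --cache_size 2147483648
```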
However:
- I have no idea what the format for CACHE, CACHE_TYPE, and CACHE_SIZE is, or the precise meaning/effect of each option.
- I would also be very interested in knowing what the other options mean.
Is there any documentation anywhere of what these mean/how to use them?
(By the way, following the exact same option names/formats as llama.cpp wherever possible might be a good idea: it would have let me get this working without bothering you, since using the llama.cpp formats/options was the first thing I tried.)
Thanks a lot for any possible help.
Best regards.