Documentation of server command line parameters. #635

Open
@arthurwolf

Description

I run python3 -m llama_cpp.server in order to call the API from my scripts.
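
For reference, my invocation looks roughly like this (the model path is just an example from my setup; --host and --port are the usual server options):

    # example invocation; the model path is specific to my machine
    python3 -m llama_cpp.server --model ./models/llama-2-7b-chat.Q4_K_M.gguf --host 0.0.0.0 --port 8000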

I'd like to enable prompt caching (like I can do in llama.cpp), but the command line options that work for the llama.cpp server don't work for this project.
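
For comparison, this is the kind of prompt caching I mean in llama.cpp (flag names from its main example, as I remember them, so they may not be exact or apply to its server):

    # llama.cpp's main example can save the evaluated prompt to a file and reuse it
    ./main -m ./models/llama-2-7b-chat.Q4_K_M.gguf -f prompt.txt --prompt-cache prompt-cache.bin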

I searched the docs and couldn't find any documentation of the command line options that would work.

After an error from trying random command line options, I did get this usage output on the command line:

/home/arthur/.local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:127: UserWarning: Field "model_alias" has conflict with protected namespace "model_".

You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ('settings_',)`.
  warnings.warn(
usage: __main__.py [-h] [--model MODEL] [--model_alias MODEL_ALIAS] [--n_ctx N_CTX] [--n_gpu_layers N_GPU_LAYERS] [--tensor_split TENSOR_SPLIT]
                   [--rope_freq_base ROPE_FREQ_BASE] [--rope_freq_scale ROPE_FREQ_SCALE] [--seed SEED] [--n_batch N_BATCH] [--n_threads N_THREADS]
                   [--f16_kv F16_KV] [--use_mlock USE_MLOCK] [--use_mmap USE_MMAP] [--embedding EMBEDDING] [--low_vram LOW_VRAM]
                   [--last_n_tokens_size LAST_N_TOKENS_SIZE] [--logits_all LOGITS_ALL] [--cache CACHE] [--cache_type CACHE_TYPE] [--cache_size CACHE_SIZE]
                   [--vocab_only VOCAB_ONLY] [--verbose VERBOSE] [--host HOST] [--port PORT] [--interrupt_requests INTERRUPT_REQUESTS] [--n_gqa N_GQA]
                   [--rms_norm_eps RMS_NORM_EPS] [--mul_mat_q MUL_MAT_Q]

From that output I can see these look like what I'm looking for:

[--cache CACHE] [--cache_type CACHE_TYPE] [--cache_size CACHE_SIZE]

However:

  1. I have no idea what the format for CACHE, CACHE_TYPE and CACHE_SIZE is, or the precise meaning/effect of each option (my best guess is sketched below, but it is unverified).
  2. I would also be very interested in knowing what the other options mean.
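
My best guess, purely from the option names (so the values below are assumptions, not something I found in any docs), would be an invocation like:

    # guessed values: cache as a boolean, cache_type naming a backend, cache_size in bytes
    python3 -m llama_cpp.server --model ./models/llama-2-7b-chat.Q4_K_M.gguf \
        --cache true --cache_type ram --cache_size 2147483648

but I can't tell whether any of that is right, which is why documentation would help a lot.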

Is there any documentation anywhere of what these mean/how to use them?

(By the way, following the exact same format/names as llama.cpp wherever possible might be a good idea; it would have enabled me to get this working without bothering you, since using the llama.cpp formats/options was the first thing I tried.)

Thanks a lot for any possible help.

Best regards.

Labels: documentation (Improvements or additions to documentation)
