
Locallm


An API to query local language models using different backends. Supported backends:

  • Local: in-process inference (llama.cpp Python bindings)
  • Koboldcpp: the Koboldcpp API server
  • Ollama: the Ollama API server

Quickstart

pip install locallm

Local

from locallm import LocalLm, InferenceParams, LmParams

lm = LocalLm(
    LmParams(
        models_dir="/home/me/my/models/dir"
    )
)
lm.load_model("mistral-7b-instruct-v0.1.Q4_K_M.gguf", 8192)
template = "<s>[INST] {prompt} [/INST]"
lm.infer(
    "list the planets in the solar system",
    InferenceParams(
        template=template,
        temperature=0.2,
        stream=True,
        max_tokens=512,
    ),
)
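
With stream=True, tokens are printed as they arrive by the default on_token callback (see LmParams below). The call also returns an InferenceResult with the generated text and stats; a minimal sketch of reading it, where the "text" key is an assumption about the result layout:

result = lm.infer(
    "list the planets in the solar system",
    InferenceParams(template=template, temperature=0.2, max_tokens=512),
)
# "text" is assumed to be the key holding the generated text in InferenceResult
print(result["text"])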

Koboldcpp

from locallm import KoboldcppLm, LmParams, InferenceParams

lm = KoboldcppLm(
    LmParams(is_verbose=True)
)
lm.load_model("", 8192) # sets the context window size to 8196 tokens
template = "<s>[INST] {prompt} [/INST]"
lm.infer(
    "list the planets in the solar system",
    InferenceParams(
        template=template,
        stream=True,
        max_tokens=512,
    ),
)

Ollama

from locallm import OllamaLm, LmParams, InferenceParams

lm = OllamaLm(
    LmParams(is_verbose=True)
)
lm.load_model("mistral-7b-instruct-v0.1.Q4_K_M.gguf", 8192)
template = "<s>[INST] {prompt} [/INST]"
lm.infer(
    "list the planets in the solar system",
    InferenceParams(
        stream=True,
        template=template,
        temperature=0.5,
    ),
)

Examples

Providers: examples for each supported backend (Local, Koboldcpp, Ollama)

Other:

  • Cli: a Python terminal client
  • Autodoc: generate docstrings from code

Api

LmProvider

An abstract base class describing a language model provider. All providers implement this API.

Attributes

  • llm Optional[Llama]: the language model.
  • models_dir str: the directory where the models are stored.
  • api_key str: the API key for the language model.
  • server_url str: the URL of the language model server.
  • is_verbose bool: whether to print more information.
  • threads Optional[int]: the number of threads to use.
  • gpu_layers Optional[int]: the number of layers to offload to the GPU.
  • embedding Optional[bool]: whether to use embeddings.
  • on_token OnTokenType: the function to be called when a token is generated. Default: outputs the token to the terminal.
  • on_start_emit OnStartEmitType: the function to be called when the model starts emitting tokens.

Example

lm = OllamaLm(LmParams(is_verbose=True))
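
Because all backends implement this interface, they can be swapped behind a single entry point; a minimal sketch (the backend selection logic is illustrative):

from locallm import KoboldcppLm, OllamaLm, LmParams

def make_provider(backend: str):
    # both classes expose the same LmProvider api, so callers can
    # load models and run inference without knowing the backend
    if backend == "koboldcpp":
        return KoboldcppLm(LmParams(is_verbose=True))
    return OllamaLm(LmParams(is_verbose=True))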

Methods:

__init__

Constructs all the necessary attributes for the LmProvider object.

Parameters

  • params LmParams: the parameters for the language model.

Example

lm = KoboldcppLm(LmParams())

load_model

Loads a language model.

Parameters

  • model_name str: The name of the model to load.
  • ctx int: The context window size for the model.
  • gpu_layers Optional[int]: The number of layers to offload to the GPU for the model.

Example

lm.load_model("my_model.gguf", 2048, 32)

infer

Run an inference query.

Parameters

  • prompt str: the prompt to generate text from.
  • params InferenceParams: the parameters for the inference query.

Returns

  • result InferenceResult: the generated text and stats.

Example

>>> lm.infer("<s>[INST] List the planets in the solar system [/INST]")
The planets in the solar system are: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune.

Types

InferenceParams

Parameters for inference.

Args

  • stream bool, Optional: Whether to stream the output.
  • template str, Optional: The template to use for the inference.
  • threads int, Optional: The number of threads to use for the inference.
  • max_tokens int, Optional: The maximum number of tokens to generate.
  • temperature float, Optional: The temperature for the model.
  • top_p float, Optional: The cumulative probability threshold for nucleus (top p) sampling.
  • top_k int, Optional: The number of highest-probability tokens to sample from.
  • min_p float, Optional: The minimum probability for a token to be considered.
  • stop List[str], Optional: A list of stop sequences that end the generation.
  • frequency_penalty float, Optional: The frequency penalty for the model.
  • presence_penalty float, Optional: The presence penalty for the model.
  • repeat_penalty float, Optional: The repeat penalty for the model.
  • tfs float, Optional: The tail free sampling parameter.
  • grammar str, Optional: A GBNF grammar to constrain the model's output.

Example

InferenceParams(stream=True, template="<s>[INST] {prompt} [/INST]")
{
    "stream": True,
    "template": "<s>[INST] {prompt} [/INST]"
}
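
The sampling parameters can be combined freely; for example, a deterministic, length-capped query with a stop sequence (values illustrative):

InferenceParams(
    temperature=0,   # greedy decoding: always pick the most likely token
    max_tokens=256,  # cap the length of the response
    stop=["</s>"],   # end the generation when this sequence appears
)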

LmParams

Parameters for a language model.

Args

  • models_dir str, Optional: The directory containing the language model.
  • api_key str, Optional: The API key for the language model.
  • server_url str, Optional: The server URL for the language model.
  • is_verbose bool, Optional: Whether to enable verbose output.
  • on_token Callable[[str], None], Optional: A callback function called on each generated token. If not provided, the default callback prints tokens to the command line as they arrive.
  • on_start_emit Callable[[Optional[Any]], None], Optional: A callback function to be called on the start of the emission.

Example

LmParams(
    models_dir="/home/me/models",
    api_key="abc123",
)
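
A custom on_token callback replaces the default terminal output; a minimal sketch that collects the streamed tokens in a list instead of printing them:

from locallm import OllamaLm, LmParams

tokens = []

def collect_token(token: str) -> None:
    # called for each generated token instead of the default print
    tokens.append(token)

lm = OllamaLm(LmParams(on_token=collect_token))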

Tests

To configure the tests, create a tests/localconf.py file containing the local configuration needed to run them:

# absolute path to your models dir
MODELS_DIR = "/home/me/my/models/dir"
# the model to use in the tests
MODEL = "q5_1-gguf-mamba-gpt-3B_v4.gguf"
# the context window size for the tests
CTX = 2048

Be sure to have the corresponding backend up before running a test.
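
The tests can then be run with the project's test runner, e.g. with pytest (assuming the test layout is pytest-compatible):

pytest tests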
