An API to query local language models using different backends. Supported backends:
- Llama.cpp Python: the local Python bindings for Llama.cpp
- Kobold.cpp: the Koboldcpp API server
- Ollama: the Ollama API server
pip install locallm
from locallm import LocalLm, InferenceParams, LmParams
lm = LocalLm(
    LmParams(
        models_dir="/home/me/my/models/dir"
    )
)
lm.load_model("mistral-7b-instruct-v0.1.Q4_K_M.gguf", 8192)
template = "<s>[INST] {prompt} [/INST]"
lm.infer(
    "list the planets in the solar system",
    InferenceParams(
        template=template,
        temperature=0.2,
        stream=True,
        max_tokens=512,
    ),
)
from locallm import KoboldcppLm, LmParams, InferenceParams
lm = KoboldcppLm(
    LmParams(is_verbose=True)
)
lm.load_model("", 8192) # sets the context window size to 8196 tokens
template = "<s>[INST] {prompt} [/INST]"
lm.infer(
    "list the planets in the solar system",
    InferenceParams(
        template=template,
        stream=True,
        max_tokens=512,
    ),
)
from locallm import OllamaLm, LmParams, InferenceParams
lm = OllamaLm(
    LmParams(is_verbose=True)
)
lm.load_model("mistral-7b-instruct-v0.1.Q4_K_M.gguf", 8192)
template = "<s>[INST] {prompt} [/INST]"
lm.infer(
    "list the planets in the solar system",
    InferenceParams(
        stream=True,
        template=template,
        temperature=0.5,
    ),
)
Providers:
- Llama.cpp Python provider
- Kobold.cpp provider
- Ollama provider
Other:
An abstract base class to describe a language model provider. All the providers implement this API.
- llm (Optional[Llama]): the language model.
- models_dir (str): the directory where the models are stored.
- api_key (str): the API key for the language model.
- server_url (str): the URL of the language model server.
- is_verbose (bool): whether to print more information.
- threads (Optional[int]): the number of threads to use.
- gpu_layers (Optional[int]): the number of layers to offload to the GPU.
- embedding (Optional[bool]): use embeddings or not.
- on_token (OnTokenType): the function to be called when a token is generated. Default: outputs the token to the terminal.
- on_start_emit (OnStartEmitType): the function to be called when the model starts emitting tokens.
lm = OllamaLm(LmParams(is_verbose=True))
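Since every provider exposes this same interface, switching backends is mostly a matter of swapping the provider class. A minimal sketch, assuming the Kobold.cpp server already has a model loaded as in the quickstart above:

from locallm import KoboldcppLm, OllamaLm, LmParams, InferenceParams

# any provider class can be used here: they all implement the same API
lm = KoboldcppLm(LmParams(is_verbose=True))
# lm = OllamaLm(LmParams(is_verbose=True))  # switch backend by swapping the class

lm.load_model("", 8192)  # empty name as in the Kobold.cpp quickstart; only the context size is set
lm.infer(
    "list the planets in the solar system",
    InferenceParams(
        template="<s>[INST] {prompt} [/INST]",
        max_tokens=512,
    ),
)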
Methods:
Constructs all the necessary attributes for the LmProvider object.
- params (LmParams): the parameters for the language model.
lm = KoboldcppLm(LmParams())
Loads a language model.
- model_name (str): The name of the model to load.
- ctx (int): The context window size for the model.
- gpu_layers (Optional[int]): The number of layers to offload to the GPU for the model.
lm.load_model("my_model.gguf", 2048, 32)
Run an inference query.
- prompt (str): the prompt to generate text from.
- params (InferenceParams): the parameters for the inference query.

Returns:
- result (InferenceResult): the generated text and stats.
>>> lm.infer("<s>[INST] List the planets in the solar system [/INST>")
The planets in the solar system are: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune.
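A hedged sketch of a non-streaming call, assuming that with stream=False the returned InferenceResult holds the full generated text and stats (the result is simply printed here rather than assuming its exact field names):

result = lm.infer(
    "<s>[INST] Give a one sentence definition of a planet [/INST]",
    InferenceParams(stream=False, temperature=0.2, max_tokens=128),
)
print(result)  # InferenceResult: the generated text and stats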
Parameters for inference.
- stream (bool, Optional): Whether to stream the output.
- template (str, Optional): The template to use for the inference.
- threads (int, Optional): The number of threads to use for the inference.
- max_tokens (int, Optional): The maximum number of tokens to generate.
- temperature (float, Optional): The temperature for the model.
- top_p (float, Optional): The cumulative probability cutoff for nucleus (top p) sampling.
- top_k (int, Optional): The top k tokens to sample from.
- min_p (float, Optional): The minimum probability for a token to be considered.
- stop (List[str], Optional): A list of words to stop the model from generating.
- frequency_penalty (float, Optional): The frequency penalty for the model.
- presence_penalty (float, Optional): The presence penalty for the model.
- repeat_penalty (float, Optional): The repeat penalty for the model.
- tfs (float, Optional): The tail free sampling parameter for the model.
- grammar (str, Optional): A GBNF grammar to constrain the model's output.
InferenceParams(stream=True, template="<s>[INST] {prompt} [/INST]")
{
    "stream": True,
    "template": "<s>[INST] {prompt} [/INST]"
}
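The stop and grammar options can be combined with the sampling settings above; a small sketch using a minimal GBNF grammar (whether grammars are honored depends on the backend in use):

yes_no_grammar = 'root ::= "yes" | "no"'  # minimal GBNF grammar: the output must be "yes" or "no"

params = InferenceParams(
    template="<s>[INST] {prompt} [/INST]",
    temperature=0,
    stop=["</s>"],           # stop generating if this sequence is produced
    grammar=yes_no_grammar,  # constrain the model's output with the grammar
)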
Parameters for a language model.
- models_dir (str, Optional): The directory containing the language models.
- api_key (str, Optional): The API key for the language model.
- server_url (str, Optional): The server URL for the language model.
- is_verbose (bool, Optional): Whether to enable verbose output.
- on_token (Callable[[str], None], Optional): A callback function to be called on each token generated. If not provided, the default outputs the tokens to the command line as they arrive.
- on_start_emit (Callable[[Optional[Any]], None], Optional): A callback function to be called at the start of the emission.
LmParams(
    models_dir="/home/me/models",
    api_key="abc123",
)
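A custom on_token callback can replace the default terminal output, for example to collect the streamed tokens in memory; a small sketch:

tokens = []

def collect_token(token: str) -> None:
    # called for each generated token instead of printing it
    tokens.append(token)

LmParams(
    models_dir="/home/me/models",
    on_token=collect_token,
)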
To configure the tests, create a tests/localconf.py file containing some local config info to run the tests:
# absolute path to your models dir
MODELS_DIR = "/home/me/my/models/dir"
# the model to use in the tests
MODEL = "q5_1-gguf-mamba-gpt-3B_v4.gguf"
# the context window size for the tests
CTX = 2048
Be sure to have the corresponding backend up before running a test.