From e3e7024f0df633746b94f19d80302ad19bdac885 Mon Sep 17 00:00:00 2001
From: Daniel J Walsh
Date: Fri, 20 Sep 2024 14:26:50 -0400
Subject: [PATCH] Add more information to man pages and readme

Signed-off-by: Daniel J Walsh
---
 README.md          | 128 ++++++++++++++++++++++++++++++++++-----------
 docs/ramalama.1.md |  62 +++++++++++++++++++-------
 2 files changed, 149 insertions(+), 41 deletions(-)

diff --git a/README.md b/README.md
index 8f8fa27e..bd2c22a1 100644
--- a/README.md
+++ b/README.md
@@ -4,12 +4,64 @@
 The Ramalama project's goal is to make working with AI boring through the use
 of OCI containers.
 
 On first run Ramalama inspects your system for GPU support, falling back to CPU
-support if no GPUs are present. It then uses container engines like Podman to
-pull the appropriate OCI image with all of the software necessary to run an
-AI Model for your systems setup. This eliminates the need for the user to
-configure the system for AI themselves. After the initialization, Ramalama
+support if no GPUs are present. It then uses container engines like Podman or
+Docker to pull the appropriate OCI image with all of the software necessary to
+run an AI Model for your system's setup. This eliminates the need for the user
+to configure the system for AI themselves. After the initialization, Ramalama
 will run the AI Models within a container based on the OCI image.
 
+Ramalama supports multiple AI model registry types, called transports.
+Supported transports:
+
+## TRANSPORTS
+
+| Transports               | Web Site                                            |
+| ------------------------ | --------------------------------------------------- |
+| HuggingFace              | [`huggingface.co`](https://www.huggingface.co)      |
+| Ollama                   | [`ollama.com`](https://www.ollama.com)              |
+| OCI Container Registries | [`opencontainers.org`](https://opencontainers.org)  |
+|                          | Examples: [`quay.io`](https://quay.io), [`Docker Hub`](https://docker.io), and [`Artifactory`](https://artifactory.com) |
+
+Ramalama uses the Ollama registry transport by default. Use the RAMALAMA_TRANSPORT environment variable to modify the default. `export RAMALAMA_TRANSPORT=huggingface` changes RamaLama to use the huggingface transport.
+
+Individual model transports can be modified when specifying a model via the `huggingface://`, `oci://`, or `ollama://` prefix.
+
+ramalama pull `huggingface://`afrideva/Tiny-Vicuna-1B-GGUF/tiny-vicuna-1b.q2_k.gguf
+
+To make it easier for users, ramalama uses shortname files, which contain
+alias names for fully specified AI Models, allowing users to specify shorter
+names when referring to models. ramalama reads shortnames.conf files if they
+exist. These files contain a list of name/value pairs that specify the model.
+The following table specifies the order in which Ramalama reads the files.
+Any duplicate names that exist override previously defined shortnames.
+
+| Shortnames type | Path                                    |
+| --------------- | --------------------------------------- |
+| Distribution    | /usr/share/ramalama/shortnames.conf     |
+| Administrators  | /etc/ramalama/shortnames.conf           |
+| Users           | $HOME/.config/ramalama/shortnames.conf  |
+
+```code
+$ cat /usr/share/ramalama/shortnames.conf
+[shortnames]
+  "tiny" = "ollama://tinyllama"
+  "granite" = "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf"
+  "granite:7b" = "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf"
+  "ibm/granite" = "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf"
+  "merlinite" = "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf"
+  "merlinite:7b" = "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf"
+...
+```
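+
+As an illustration, assuming the shortnames above are installed, a model can
+then be pulled and run by its short alias:
+
+```
+$ ramalama pull tiny
+$ ramalama run granite
+```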
 
 ## Install
 
 Install Ramalama by running this one-liner (on macOS run without sudo):
@@ -40,6 +92,7 @@ curl -fsSL https://raw.githubusercontent.com/containers/ramalama/s/install.py |
 
 | Command                                                 | Description                                                 |
 | ------------------------------------------------------- | ----------------------------------------------------------- |
+| [ramalama(1)](docs/ramalama.1.md)                       | Primary ramalama man page.                                  |
 | [ramalama-containers(1)](docs/ramalama-containers.1.md) | List all ramalama containers.                               |
 | [ramalama-list(1)](docs/ramalama-list.1.md)             | List all AI models in local storage.                        |
 | [ramalama-login(1)](docs/ramalama-login.1.md)           | Login to remote model registry.                             |
@@ -111,10 +164,29 @@
 $ ramalama pull granite-code
 ```
 
 ### Serving Models
 
-You can `serve` a chatbot on a model using the `serve` command. By default, it pulls from the ollama registry.
+You can `serve` multiple models using the `serve` command. By default, it pulls from the Ollama registry.
+
+```
+$ ramalama serve --name mylama llama3
+```
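+
+The server can then be exercised from another terminal. The following is a
+rough sketch, assuming the default llama.cpp server port of 8080 and its
+OpenAI-compatible chat endpoint:
+
+```
+# assumes `ramalama serve` is listening on the default port 8080
+$ curl http://127.0.0.1:8080/v1/chat/completions \
+    -H "Content-Type: application/json" \
+    -d '{"messages": [{"role": "user", "content": "Hello"}]}'
+```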
+
+### Stopping servers
+
+You can stop a running model if it is running in a container.
 
 ```
-$ ramalama serve llama3
+$ ramalama stop mylama
 ```
@@ -125,28 +197,28 @@
 
 ## Diagram
 
 ```
 +---------------------------+
 | ramalama run granite-code |
 |                           |
 +-------+-------------------+
-        |
-        |
-        |           +------------------+
-        |           | Pull model layer |
-        +----------------------------------------->| granite-code |
-                    +------------------+
-                    | Repo options:    |
-                    +-+-------+------+-+
-                      |       |      |
-                      v       v      v
-                 +---------+ +------+ +----------+
-                 | Hugging | | quay | | Ollama   |
-                 | Face    | |      | | Registry |
-                 +-------+-+ +---+--+ +-+--------+
-                         |       |      |
-                         v       v      v
-                       +------------------+
-                       | Start with       |
-                       | llama.cpp and    |
-                       | granite-code     |
-                       | model            |
-                       +------------------+
+        |
+        |
+        |           +------------------+
+        |           | Pull model layer |
+        +---------->| granite-code     |
+                    +------------------+
+                    | Repo options:    |
+                    +-+-------+------+-+
+                      |       |      |
+                      v       v      v
+                 +---------+ +------+ +----------+
+                 | Hugging | | quay | | Ollama   |
+                 | Face    | |      | | Registry |
+                 +-------+-+ +---+--+ +-+--------+
+                         |       |      |
+                         v       v      v
+                       +------------------+
+                       | Start with       |
+                       | llama.cpp and    |
+                       | granite-code     |
+                       | model            |
+                       +------------------+
 ```
 
 ## In development
diff --git a/docs/ramalama.1.md b/docs/ramalama.1.md
index a220620c..ba717857 100644
--- a/docs/ramalama.1.md
+++ b/docs/ramalama.1.md
@@ -7,31 +7,67 @@ ramalama - Simple management tool for working with AI Models
 
 **ramalama** [*options*] *command*
 
 ## DESCRIPTION
-Ramalama : The goal of ramalama is to make AI boring. Ramalama can pull an AI
-Model from model registires and start a chatbot or serve as a rest API from a
-simple single command. It treats Models similar to the way that Podman or
-Docker treat container images.
+The goal of Ramalama is to make AI boring.
 
-Ramalama runs models with a specially designed container image containing all
-of the tooling required to run the Model. Users d ont need to pre-configure
-the host system.
+On first run Ramalama inspects your system for GPU support, falling back to CPU
+support if no GPUs are present. It then uses container engines like Podman or
+Docker to pull the appropriate OCI image with all of the software necessary to run an
+AI Model for your system's setup. This eliminates the need for the user to
+configure the system for AI themselves. After the initialization, Ramalama
+will run the AI Models within a container based on the OCI image.
 
-Ramalama supports multiple model registries types called transports.
+Ramalama first pulls AI Models from model registries. It then starts a chatbot
+or serves a REST API from a single simple command. Models are treated similarly
+to the way that Podman or Docker treat container images.
+
+Ramalama supports multiple AI model registry types, called transports.
 Supported transports:
 
-* HuggingFace : [`huggingface.co`](https://www.huggingface.co)
-* Ollama : [`ollama.com`](https://www.ollama.com)
+## TRANSPORTS
 
-* OCI : [`opencontainers.org`](https://opencontainers.org)
-(quay.io, docker.io, Artifactory)
+| Transports               | Web Site                                            |
+| ------------------------ | --------------------------------------------------- |
+| HuggingFace              | [`huggingface.co`](https://www.huggingface.co)      |
+| Ollama                   | [`ollama.com`](https://www.ollama.com)              |
+| OCI Container Registries | [`opencontainers.org`](https://opencontainers.org)  |
+|                          | Examples: [`quay.io`](https://quay.io), [`Docker Hub`](https://docker.io), and [`Artifactory`](https://artifactory.com) |
 
-RamaLama uses the OCI registry transport by default. Use the RAMALAMA_TRANSPORTS environment variable to modify the default. `export RAMALAMA_TRANSPORT=ollama` Changes RamaLama to use ollama transport.
+Ramalama uses the Ollama registry transport by default. Use the RAMALAMA_TRANSPORT environment variable to modify the default. `export RAMALAMA_TRANSPORT=huggingface` changes RamaLama to use the huggingface transport.
 
 Individual model transports can be modifies when specifying a model via the `huggingface://`, `oci://`, or `ollama://` prefix.
 
 ramalama pull `huggingface://`afrideva/Tiny-Vicuna-1B-GGUF/tiny-vicuna-1b.q2_k.gguf
 
+To make it easier for users, ramalama uses shortname files, which contain
+alias names for fully specified AI Models, allowing users to specify shorter
+names when referring to models. ramalama reads shortnames.conf files if they
+exist. These files contain a list of name/value pairs that specify the model.
+The following table specifies the order in which Ramalama reads the files.
+Any duplicate names that exist override previously defined shortnames.
+
+| Shortnames type | Path                                    |
+| --------------- | --------------------------------------- |
+| Distribution    | /usr/share/ramalama/shortnames.conf     |
+| Administrators  | /etc/ramalama/shortnames.conf           |
+| Users           | $HOME/.config/ramalama/shortnames.conf  |
+
+```code
+$ cat /usr/share/ramalama/shortnames.conf
+[shortnames]
+  "tiny" = "ollama://tinyllama"
+  "granite" = "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf"
+  "granite:7b" = "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf"
+  "ibm/granite" = "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf"
+  "merlinite" = "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf"
+  "merlinite:7b" = "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf"
+...
+```
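+
+As an illustration, assuming the shortnames above are installed, an alias can
+be used anywhere a model name is expected:
+
+ramalama run merlinite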
 
 **ramalama [GLOBAL OPTIONS]**
 
 ## GLOBAL OPTIONS