
Commit

Merge pull request #163 from rhatdan/docs
Add more information to man pages and readme
ericcurtin authored Sep 20, 2024
2 parents b3a2b24 + e3e7024 commit 682c52f
Showing 2 changed files with 125 additions and 41 deletions.
109 changes: 81 additions & 28 deletions README.md
The Ramalama project's goal is to make working with AI boring
through the use of OCI containers.

On first run, Ramalama inspects your system for GPU support, falling back to CPU
support if no GPUs are present. It then uses container engines like Podman or
Docker to pull the appropriate OCI image with all of the software necessary to
run an AI Model for your system's setup. This eliminates the need for the user
to configure the system for AI themselves. After the initialization, Ramalama
will run the AI Models within a container based on the OCI image.

Ramalama supports multiple AI model registry types, called transports.


## TRANSPORTS

| Transports               | Web Site                                            |
| ------------------------ | --------------------------------------------------- |
| HuggingFace              | [`huggingface.co`](https://www.huggingface.co)      |
| Ollama                   | [`ollama.com`](https://www.ollama.com)              |
| OCI Container Registries | [`opencontainers.org`](https://opencontainers.org)  |

Examples of OCI container registries include [`quay.io`](https://quay.io), [`Docker Hub`](https://docker.io), and [`Artifactory`](https://artifactory.com).

Ramalama uses the Ollama registry transport by default. Set the `RAMALAMA_TRANSPORT` environment variable to modify the default; for example, `export RAMALAMA_TRANSPORT=huggingface` changes RamaLama to use the Hugging Face transport.
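
As a sketch, a session that switches the default transport and then pulls a model (reusing the model reference from the prefix example below) might look like this:

```
$ export RAMALAMA_TRANSPORT=huggingface
$ ramalama pull afrideva/Tiny-Vicuna-1B-GGUF/tiny-vicuna-1b.q2_k.gguf
```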

Individual model transports can be modified when specifying a model via the `huggingface://`, `oci://`, or `ollama://` prefix:

```
$ ramalama pull huggingface://afrideva/Tiny-Vicuna-1B-GGUF/tiny-vicuna-1b.q2_k.gguf
```

To make it easier for users, ramalama uses shortname files, which contain
alias names for fully specified AI Models, allowing users to specify the shorter
names when referring to models. ramalama reads shortnames.conf files if they
exist. These files contain a list of name/value pairs for specification of
the model. The following table shows the order in which Ramalama reads the
files. Any duplicate names that exist override previously defined shortnames.

| Shortnames type | Path |
| --------------- | ---------------------------------------- |
| Distribution | /usr/share/ramalama/shortnames.conf |
| Administrators  | /etc/ramalama/shortnames.conf             |
| Users | $HOME/.config/ramalama/shortnames.conf |

```code
$ cat /usr/share/ramalama/shortnames.conf
[shortnames]
"tiny" = "ollama://tinyllama"
"granite" = "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf"
"granite:7b" = "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf"
"ibm/granite" = "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf"
"merlinite" = "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf"
"merlinite:7b" = "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf"
...
```
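
With these aliases in place, a shortname can be used anywhere a full model reference is accepted; for example, a sketch assuming the `tiny` entry shown above:

```
$ ramalama pull tiny    # resolves to ollama://tinyllama
$ ramalama run tiny
```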

## Install

Install Ramalama by running this one-liner (on macOS run without sudo):
```
curl -fsSL https://raw.githubusercontent.com/containers/ramalama/s/install.py |
```

| Command | Description |
| ------------------------------------------------------ | ---------------------------------------------------------- |
| [ramalama(1)](docs/ramalama.1.md) | Primary ramalama man page. |
| [ramalama-containers(1)](docs/ramalama-containers.1.md)| List all ramalama containers. |
| [ramalama-list(1)](docs/ramalama-list.1.md) | List all AI models in local storage. |
| [ramalama-login(1)](docs/ramalama-login.1.md) | Login to remote model registry. |

### Serving Models

You can `serve` multiple models using the `serve` command. By default, it pulls from the Ollama registry.

```
$ ramalama serve --name mylama llama3
```
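
Each server runs in its own container, so several models can be served at once; a hypothetical second instance, using the `tiny` shortname defined earlier:

```
$ ramalama serve --name mytiny tiny
```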

### Stopping servers

You can stop a running model if it is running in a container.

```
$ ramalama stop mylama
```
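
To check which model servers are currently running in containers, the `ramalama containers` command from the table above can be used (shown here as a usage sketch):

```
$ ramalama containers
```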

## Diagram
```
+---------------------------+
|                           |
| ramalama run granite-code |
|                           |
+-------+-------------------+
        |
        |
        |              +------------------+
        |              | Pull model layer |
        +------------->| granite-code     |
                       +------------------+
                       |  Repo options:   |
                       +-+-------+------+-+
                         |       |      |
                         v       v      v
                 +---------+  +------+ +----------+
                 | Hugging |  | quay | | Ollama   |
                 | Face    |  |      | | Registry |
                 +-------+-+  +---+--+ +-+--------+
                         |        |      |
                         v        v      v
                        +------------------+
                        | Start with       |
                        | llama.cpp and    |
                        | granite-code     |
                        | model            |
                        +------------------+
```

## In development
57 changes: 44 additions & 13 deletions docs/ramalama.1.md
ramalama - Simple management tool for working with AI Models
**ramalama** [*options*] *command*

## DESCRIPTION
Ramalama: The goal of ramalama is to make AI boring.

On first run, Ramalama inspects your system for GPU support, falling back to CPU
support if no GPUs are present. It then uses container engines like Podman or
Docker to pull the appropriate OCI image with all of the software necessary to
run an AI Model for your system's setup. This eliminates the need for the user to
configure the system for AI themselves. After the initialization, Ramalama
will run the AI Models within a container based on the OCI image.

Ramalama first pulls AI Models from model registries. It then starts a chatbot
or a service as a REST API from a simple single command. Models are treated similarly
to the way that Podman or Docker treat container images.
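
For instance, a sketch reusing commands that appear elsewhere in this project's documentation:

```
$ ramalama run granite-code            # chatbot on a pulled model
$ ramalama serve --name mylama llama3  # REST API service
```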

Ramalama supports multiple AI model registry types, called transports.

## TRANSPORTS

| Transports               | Web Site                                            |
| ------------------------ | --------------------------------------------------- |
| HuggingFace              | [`huggingface.co`](https://www.huggingface.co)      |
| Ollama                   | [`ollama.com`](https://www.ollama.com)              |
| OCI Container Registries | [`opencontainers.org`](https://opencontainers.org)  |

Examples of OCI container registries include [`quay.io`](https://quay.io), [`Docker Hub`](https://docker.io), and [`Artifactory`](https://artifactory.com).

Ramalama uses the Ollama registry transport by default. Set the `RAMALAMA_TRANSPORT` environment variable to modify the default; for example, `export RAMALAMA_TRANSPORT=huggingface` changes RamaLama to use the Hugging Face transport.

Individual model transports can be modified when specifying a model via the `huggingface://`, `oci://`, or `ollama://` prefix:

```
$ ramalama pull huggingface://afrideva/Tiny-Vicuna-1B-GGUF/tiny-vicuna-1b.q2_k.gguf
```

To make it easier for users, ramalama uses shortname files, which contain
alias names for fully specified AI Models, allowing users to specify the shorter
names when referring to models. ramalama reads shortnames.conf files if they
exist. These files contain a list of name/value pairs for specification of
the model. The following table shows the order in which Ramalama reads the
files. Any duplicate names that exist override previously defined shortnames.

| Shortnames type | Path |
| --------------- | ---------------------------------------- |
| Distribution | /usr/share/ramalama/shortnames.conf |
| Administrators  | /etc/ramalama/shortnames.conf             |
| Users | $HOME/.config/ramalama/shortnames.conf |

```code
$ cat /usr/share/ramalama/shortnames.conf
[shortnames]
"tiny" = "ollama://tinyllama"
"granite" = "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf"
"granite:7b" = "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf"
"ibm/granite" = "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf"
"merlinite" = "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf"
"merlinite:7b" = "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf"
...
```
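
As a sketch, assuming the shortnames file above is installed, pulling by alias resolves to the full model reference:

```
$ ramalama pull granite   # resolves to the granite-7b-lab GGUF model listed above
```
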
**ramalama [GLOBAL OPTIONS]**

## GLOBAL OPTIONS
