Hello! I am an absolute LLM noob so I apologize if these are rather basic questions. I am loving LocalAI so far and it's been incredibly easy to get running with models from the gallery.
I wanted to try a model whose definition does not contain a URL, like Vicuna or Koala. The instructions indicate that a POST request should be sent, using the `koala.yaml` configuration file from this repository, and that URI(s) pointing to the actual model files, probably from HuggingFace, should be supplied:
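As I read them, the docs describe a request of roughly this shape (this is my paraphrase with placeholder values, not a verbatim copy of the instructions):

```sh
# Apply the koala.yaml gallery config and point it at concrete model files.
# The "files" entries tell LocalAI which weights to download and verify;
# the "github:" URL format is how I understood the gallery README.
curl http://localhost:8080/models/apply \
  -H "Content-Type: application/json" \
  -d '{
    "url": "github:go-skynet/model-gallery/koala.yaml",
    "files": [
      {
        "uri": "<URI of the model file, e.g. on HuggingFace>",
        "sha256": "<checksum of that file>",
        "filename": "koala.bin"
      }
    ]
  }'
```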
So I went to HuggingFace, searched `koala`, and reviewed one of the top results. It appears to have the model split into multiple files:
```
pytorch_model-00001-of-00002.bin
pytorch_model-00002-of-00002.bin
```
Presumably both of these files are needed. I couldn't find examples of how to handle model `bin` files that are split across multiple files. Additionally, some light research indicates I can't just `cat` the model files together (the shards appear to be separate PyTorch checkpoint pieces tied together by an index file, not chunks of a single byte stream).
I found this repository that seems to host a single `koala` model file. So I tried that. (I downloaded the file first and calculated the SHA256, then ran the command sketched below, and LocalAI also downloaded the model. Is that right?)
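Roughly this (the URI and checksum here are placeholders, not the exact values I used):

```sh
# Checksum of the locally downloaded copy of the model file.
sha256sum koala.bin

# Apply the gallery config, pointing it at the hosted single-file model.
curl http://localhost:8080/models/apply \
  -H "Content-Type: application/json" \
  -d '{
    "url": "github:go-skynet/model-gallery/koala.yaml",
    "files": [
      {
        "uri": "https://huggingface.co/<user>/<repo>/resolve/main/<model>.bin",
        "sha256": "<output of sha256sum above>",
        "filename": "koala.bin"
      }
    ]
  }'
```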
After the job finished processing, I was able to see the new model defined:
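By that I mean something like the following (assuming I have the endpoints right; the uuid is a placeholder for the one returned by the apply call):

```sh
# Poll the apply job using the uuid returned by /models/apply.
curl http://localhost:8080/models/jobs/<uuid>

# Once the job reports it is done, the model appears in the
# OpenAI-compatible model list.
curl http://localhost:8080/v1/models
```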
I proceeded to place `prompt-templates/koala.tmpl` into the `models/` directory. I then tried to call the model and got a 500 error.
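The request was more or less the following (prompt shortened; I'm assuming the model name matches the applied config):

```sh
# OpenAI-compatible chat completion against the newly defined model;
# this is the call that came back with HTTP 500.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "koala",
    "messages": [{"role": "user", "content": "How are you?"}],
    "temperature": 0.7
  }'
```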
I am sure I took a wrong turn at some point. Any advice? Thanks!