
404 when using separate LLM server on network #420

Open
AshkanArabim opened this issue Dec 11, 2024 · 0 comments

Comments

@AshkanArabim

Describe the bug
I have an Ubuntu server with Ollama installed. I've already downloaded the codellama models recommended in the quick start guide, exposed the Ollama server to the network, and adjusted the "providers" configuration to point to the server.
However, when I try using the extension to chat with the model, the server logs 404 errors:

[GIN] 2024/12/11 - 13:22:36 | 404 |     916.848µs |  192.168.58.108 | POST     "/v1/chat/completions"
[GIN] 2024/12/11 - 13:22:36 | 404 |    1.185828ms |  192.168.58.108 | POST     "/v1/chat/completions"
[GIN] 2024/12/11 - 13:22:36 | 404 |     165.749µs |  192.168.58.108 | POST     "/v1/chat/completions"

In the VSCode extension, I get this message:

==## ERROR ##== : Server responded with status code: 404

To Reproduce

  • Get yourself an Ubuntu server.
  • Install ollama using curl -fsSL https://ollama.com/install.sh | sh, as described here.
  • Download these models, as seen in the quick start guide:
    • ollama pull codellama:7b-instruct
    • ollama pull codellama:7b-code
  • Start the Ollama server on 0.0.0.0 to expose it to the network: OLLAMA_HOST=0.0.0.0:11433 ollama serve
    • Note that I'm using 11433 instead of the default 11434 because 11434 is already taken by another Ollama container.
  • Adjust your providers like so (where the IP matches the server IP):
    [screenshot of the Twinny provider settings]
  • Try sending a message in the Twinny chat (a direct curl request to the same endpoint is sketched below).
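
For reference, roughly the same request the extension sends can be reproduced with curl against Ollama's OpenAI-compatible chat endpoint. This is a minimal sketch: <server-ip> stands in for the Ubuntu server's actual address, and the model name assumes the codellama:7b-instruct pulled above.

$ curl http://<server-ip>:11433/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "codellama:7b-instruct", "messages": [{"role": "user", "content": "hello"}]}'

If this also returns a 404, the problem is on the Ollama side rather than in the extension.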

Expected behavior
For the Ollama server to respond and the extension to give an output.

Screenshots
[two screenshots attached]

Logging
Enable logging in the extension settings if not already enabled (you may need to restart VSCode if you don't see logs). Provide the log with the report.

API Provider
Ollama

Chat or Auto Complete?
Chat

Model Name
Using codellama:7b-code and codellama:7b-instruct

Desktop (please complete the following information):

  • OS: Arch Linux (user), Ubuntu Server (LLM server)
  • Extension version: v3.19.23 (from vscode extension store)

Additional context
Full log of ollama server:

$ OLLAMA_HOST=0.0.0.0:11433 ollama serve
2024/12/11 13:22:10 routes.go:1195: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11433 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/ashkan/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2024-12-11T13:22:10.953-07:00 level=INFO source=images.go:753 msg="total blobs: 0"
time=2024-12-11T13:22:10.953-07:00 level=INFO source=images.go:760 msg="total unused blobs removed: 0"
time=2024-12-11T13:22:10.954-07:00 level=INFO source=routes.go:1246 msg="Listening on [::]:11433 (version 0.5.1)"
time=2024-12-11T13:22:10.954-07:00 level=INFO source=common.go:135 msg="extracting embedded files" dir=/tmp/ollama1020983046/runners
time=2024-12-11T13:22:11.070-07:00 level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners="[cpu cpu_avx cpu_avx2 cuda_v11 cuda_v12 rocm]"
time=2024-12-11T13:22:11.070-07:00 level=INFO source=gpu.go:221 msg="looking for compatible GPUs"
time=2024-12-11T13:22:11.083-07:00 level=INFO source=gpu.go:620 msg="Unable to load cudart library /usr/lib/x86_64-linux-gnu/libcuda.so.565.57.01: cuda driver library init failure: 804"
time=2024-12-11T13:22:11.105-07:00 level=INFO source=gpu.go:386 msg="no compatible GPUs were discovered"
time=2024-12-11T13:22:11.105-07:00 level=INFO source=types.go:123 msg="inference compute" id=0 library=cpu variant=avx2 compute="" driver=0.0 name="" total="62.7 GiB" available="44.9 GiB"
[GIN] 2024/12/11 - 13:22:36 | 404 |     916.848µs |  192.168.58.108 | POST     "/v1/chat/completions"
[GIN] 2024/12/11 - 13:22:36 | 404 |    1.185828ms |  192.168.58.108 | POST     "/v1/chat/completions"
[GIN] 2024/12/11 - 13:22:36 | 404 |     165.749µs |  192.168.58.108 | POST     "/v1/chat/completions"
[GIN] 2024/12/11 - 13:22:55 | 404 |     350.273µs |  192.168.58.108 | POST     "/v1/chat/completions"
[GIN] 2024/12/11 - 13:31:35 | 404 |     371.202µs |  192.168.58.108 | POST     "/v1/chat/completions"
[GIN] 2024/12/11 - 13:31:35 | 404 |     371.262µs |  192.168.58.108 | POST     "/v1/chat/completions"
[GIN] 2024/12/11 - 13:31:35 | 404 |      252.22µs |  192.168.58.108 | POST     "/v1/chat/completions"
[GIN] 2024/12/11 - 13:38:03 | 404 |     339.042µs |  192.168.58.108 | POST     "/v1/chat/completions"
[GIN] 2024/12/11 - 13:38:03 | 404 |     412.339µs |  192.168.58.108 | POST     "/v1/chat/completions"
[GIN] 2024/12/11 - 13:38:03 | 404 |     211.604µs |  192.168.58.108 | POST     "/v1/chat/completions"
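
Possibly relevant: the "total blobs: 0" line above suggests this particular Ollama instance (the one on port 11433) may not have any models in its store. As a sketch (run on the server itself), the models this instance actually serves can be listed with either the CLI or the tags endpoint:

$ OLLAMA_HOST=127.0.0.1:11433 ollama list
$ curl http://127.0.0.1:11433/api/tags

If codellama:7b-instruct and codellama:7b-code don't show up here, they were likely pulled into the other Ollama instance on 11434, which could explain the 404 responses on /v1/chat/completions.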