
404 when using separate LLM server on network #420

Open
AshkanArabim opened this issue Dec 11, 2024 · 0 comments

Comments

@AshkanArabim

Describe the bug
I have an Ubuntu server with Ollama installed. I've already downloaded the codellama models recommended in the quick start guide, exposed the Ollama server to the network, and adjusted the "providers" configuration to point to the server.
However, when I try using the extension to chat with the model, the server logs 404 errors:

[GIN] 2024/12/11 - 13:22:36 | 404 |     916.848µs |  192.168.58.108 | POST     "/v1/chat/completions"
[GIN] 2024/12/11 - 13:22:36 | 404 |    1.185828ms |  192.168.58.108 | POST     "/v1/chat/completions"
[GIN] 2024/12/11 - 13:22:36 | 404 |     165.749µs |  192.168.58.108 | POST     "/v1/chat/completions"

In the VSCode extension, I get this message:

==## ERROR ##== : Server responded with status code: 404

To Reproduce

  • Get yourself an Ubuntu server.
  • Install ollama using curl -fsSL https://ollama.com/install.sh | sh, as described here.
  • Download these models, as seen in the quick start guide:
    • ollama pull codellama:7b-instruct
    • ollama pull codellama:7b-code
  • Start the Ollama server on 0.0.0.0 to expose it to the network: OLLAMA_HOST=0.0.0.0:11433 ollama serve
    • Note that I'm using 11433 instead of the default 11434 because 11434 is already taken by another Ollama container.
  • Adjust your providers like so (where the IP matches the server IP):
    [screenshot of the Twinny provider settings]
  • Try sending a message in the Twinny chat (a direct curl request to the same endpoint is sketched below).
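
For reference, roughly the same request the extension sends can be reproduced with curl against Ollama's OpenAI-compatible chat endpoint. This is a minimal sketch: <server-ip> stands in for the Ubuntu server's actual address, and the model name assumes the codellama:7b-instruct pulled above.

$ curl http://<server-ip>:11433/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "codellama:7b-instruct", "messages": [{"role": "user", "content": "hello"}]}'

If this also returns a 404, the problem is on the Ollama side rather than in the extension.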

Expected behavior
For the Ollama server to respond and the extension to give an output.

Screenshots
[two screenshots attached]

Logging
Enable logging in the extension settings if not already enabled (you may need to restart VSCode if you don't see logs). Provide the log with the report.

API Provider
Ollama

Chat or Auto Complete?
Chat

Model Name
Using codellama:7b-code and codellama:7b-instruct

Desktop (please complete the following information):

  • OS: Arch Linux (user), Ubuntu Server (LLM server)
  • Extension version: v3.19.23 (from vscode extension store)

Additional context
Full log of ollama server:

$ OLLAMA_HOST=0.0.0.0:11433 ollama serve
2024/12/11 13:22:10 routes.go:1195: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11433 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/ashkan/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2024-12-11T13:22:10.953-07:00 level=INFO source=images.go:753 msg="total blobs: 0"
time=2024-12-11T13:22:10.953-07:00 level=INFO source=images.go:760 msg="total unused blobs removed: 0"
time=2024-12-11T13:22:10.954-07:00 level=INFO source=routes.go:1246 msg="Listening on [::]:11433 (version 0.5.1)"
time=2024-12-11T13:22:10.954-07:00 level=INFO source=common.go:135 msg="extracting embedded files" dir=/tmp/ollama1020983046/runners
time=2024-12-11T13:22:11.070-07:00 level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners="[cpu cpu_avx cpu_avx2 cuda_v11 cuda_v12 rocm]"
time=2024-12-11T13:22:11.070-07:00 level=INFO source=gpu.go:221 msg="looking for compatible GPUs"
time=2024-12-11T13:22:11.083-07:00 level=INFO source=gpu.go:620 msg="Unable to load cudart library /usr/lib/x86_64-linux-gnu/libcuda.so.565.57.01: cuda driver library init failure: 804"
time=2024-12-11T13:22:11.105-07:00 level=INFO source=gpu.go:386 msg="no compatible GPUs were discovered"
time=2024-12-11T13:22:11.105-07:00 level=INFO source=types.go:123 msg="inference compute" id=0 library=cpu variant=avx2 compute="" driver=0.0 name="" total="62.7 GiB" available="44.9 GiB"
[GIN] 2024/12/11 - 13:22:36 | 404 |     916.848µs |  192.168.58.108 | POST     "/v1/chat/completions"
[GIN] 2024/12/11 - 13:22:36 | 404 |    1.185828ms |  192.168.58.108 | POST     "/v1/chat/completions"
[GIN] 2024/12/11 - 13:22:36 | 404 |     165.749µs |  192.168.58.108 | POST     "/v1/chat/completions"
[GIN] 2024/12/11 - 13:22:55 | 404 |     350.273µs |  192.168.58.108 | POST     "/v1/chat/completions"
[GIN] 2024/12/11 - 13:31:35 | 404 |     371.202µs |  192.168.58.108 | POST     "/v1/chat/completions"
[GIN] 2024/12/11 - 13:31:35 | 404 |     371.262µs |  192.168.58.108 | POST     "/v1/chat/completions"
[GIN] 2024/12/11 - 13:31:35 | 404 |      252.22µs |  192.168.58.108 | POST     "/v1/chat/completions"
[GIN] 2024/12/11 - 13:38:03 | 404 |     339.042µs |  192.168.58.108 | POST     "/v1/chat/completions"
[GIN] 2024/12/11 - 13:38:03 | 404 |     412.339µs |  192.168.58.108 | POST     "/v1/chat/completions"
[GIN] 2024/12/11 - 13:38:03 | 404 |     211.604µs |  192.168.58.108 | POST     "/v1/chat/completions"
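
Possibly relevant: the "total blobs: 0" line above suggests this particular Ollama instance (the one on port 11433) may not have any models in its store. As a sketch (run on the server itself), the models this instance actually serves can be listed with either the CLI or the tags endpoint:

$ OLLAMA_HOST=127.0.0.1:11433 ollama list
$ curl http://127.0.0.1:11433/api/tags

If codellama:7b-instruct and codellama:7b-code don't show up here, they were likely pulled into the other Ollama instance on 11434, which could explain the 404 responses on /v1/chat/completions.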