LlamaStackDirectClient with ollama failed to run #562

Open · wukaixingxp opened this issue Dec 3, 2024 · 7 comments
wukaixingxp (Contributor) commented Dec 3, 2024

System Info

python -m torch.utils.collect_env
/opt/homebrew/anaconda3/envs/llamastack-ollama/lib/python3.10/runpy.py:126: RuntimeWarning: 'torch.utils.collect_env' found in sys.modules after import of package 'torch.utils', but prior to execution of 'torch.utils.collect_env'; this may result in unpredictable behaviour
warn(RuntimeWarning(msg))
Collecting environment information...
PyTorch version: 2.5.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 15.1.1 (arm64)
GCC version: Could not collect
Clang version: 16.0.0 (clang-1600.0.26.4)
CMake version: Could not collect
Libc version: N/A

Python version: 3.10.15 (main, Oct 3 2024, 02:24:49) [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-15.1.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M1 Pro

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] torch==2.5.1
[conda] numpy 1.26.4 pypi_0 pypi
[conda] torch 2.5.1 pypi_0 pypi

Information

  • The official example scripts
  • My own modified scripts

🐛 Describe the bug

The response `r` is not a dict but `<class 'ollama._types.GenerateResponse'>`; maybe we should remove this line.
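
For context, here is a minimal sketch (not from the issue) of why that assertion trips. It assumes a recent `ollama` Python client, a local Ollama server on the default port, and the `llama3.2:1b-instruct-fp16` model already pulled; `generate()` returns a typed `GenerateResponse` rather than a plain dict:

```python
# Minimal sketch: in recent versions of the ollama Python client,
# AsyncClient.generate() returns a typed GenerateResponse object instead of
# a plain dict, so `assert isinstance(r, dict)` raises AssertionError.
# Assumes a local Ollama server with llama3.2:1b-instruct-fp16 pulled.
import asyncio
from ollama import AsyncClient


async def demo():
    client = AsyncClient(host="http://localhost:11434")
    r = await client.generate(model="llama3.2:1b-instruct-fp16", prompt="hello")
    print(type(r))               # <class 'ollama._types.GenerateResponse'>
    print(isinstance(r, dict))   # False -> the assertion in ollama.py trips
    print(r.response)            # the generated text is exposed as an attribute


asyncio.run(demo())
```
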
My script:

```python
import asyncio
import os

#pip install aiosqlite ollama faiss
from llama_stack_client.lib.direct.direct import LlamaStackDirectClient
from llama_stack_client.types import SystemMessage, UserMessage


async def main():
    os.environ["INFERENCE_MODEL"] = "meta-llama/Llama-3.2-1B-Instruct"
    client = await LlamaStackDirectClient.from_template("ollama")
    await client.initialize()
    response = await client.models.list()
    print(response)
    model_name = response[0].identifier
    response = await client.inference.chat_completion(
        messages=[
            SystemMessage(content="You are a friendly assistant.", role="system"),
            UserMessage(
                content="hello world, write me a 2 sentence poem about the moon",
                role="user",
            ),
        ],
        model_id=model_name,
        stream=False,
    )
    print("\nChat completion response:")
    print(response, type(response))


asyncio.run(main())
```

Error logs

```
❯ python test.py
Using template ollama with config:
apis:
- agents
- inference
- memory
- safety
- telemetry
conda_env: ollama
datasets: []
docker_image: null
eval_tasks: []
image_name: ollama
memory_banks: []
metadata_store:
  db_path: /Users/kaiwu/.llama/distributions/ollama/registry.db
  namespace: null
  type: sqlite
models:
- metadata: {}
  model_id: meta-llama/Llama-3.2-1B-Instruct
  provider_id: ollama
  provider_model_id: null
providers:
  agents:
  - config:
      persistence_store:
        db_path: /Users/kaiwu/.llama/distributions/ollama/agents_store.db
        namespace: null
        type: sqlite
    provider_id: meta-reference
    provider_type: inline::meta-reference
  inference:
  - config:
      url: http://localhost:11434
    provider_id: ollama
    provider_type: remote::ollama
  memory:
  - config:
      kvstore:
        db_path: /Users/kaiwu/.llama/distributions/ollama/faiss_store.db
        namespace: null
        type: sqlite
    provider_id: faiss
    provider_type: inline::faiss
  safety:
  - config: {}
    provider_id: llama-guard
    provider_type: inline::llama-guard
  telemetry:
  - config: {}
    provider_id: meta-reference
    provider_type: inline::meta-reference
scoring_fns: []
shields: []
version: '2'

[Model(identifier='meta-llama/Llama-3.2-1B-Instruct', provider_resource_id='llama3.2:1b-instruct-fp16', provider_id='ollama', type='model', metadata={})]
Traceback (most recent call last):
  File "/Users/kaiwu/work/llama-stack-apps/examples/DocQA/test.py", line 32, in <module>
    asyncio.run(main())
  File "/opt/homebrew/anaconda3/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/Users/kaiwu/work/llama-stack-apps/examples/DocQA/test.py", line 17, in main
    response = await client.inference.chat_completion(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/llama_stack_client/lib/direct/direct.py", line 147, in post
    async for response in self._call_endpoint(path, "POST", body):
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/llama_stack_client/lib/direct/direct.py", line 118, in _call_endpoint
    yield await func(**(body or {}))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/llama_stack/distribution/routers/routers.py", line 123, in chat_completion
    return await provider.chat_completion(**params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/llama_stack/providers/remote/inference/ollama/ollama.py", line 221, in chat_completion
    return await self._nonstream_chat_completion(request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/llama_stack/providers/remote/inference/ollama/ollama.py", line 273, in _nonstream_chat_completion
    assert isinstance(r, dict)
AssertionError
```

Debug print:

```
...
version: '2'

[Model(identifier='meta-llama/Llama-3.2-1B-Instruct', provider_resource_id='llama3.2:1b-instruct-fp16', provider_id='ollama', type='model', metadata={})]
r model='llama3.2:1b-instruct-fp16' created_at='2024-12-03T18:00:33.186056Z' done=True done_reason='stop' total_duration=1213954708 load_duration=29042500 prompt_eval_count=35 prompt_eval_duration=698000000 eval_count=30 eval_duration=486000000 response='Here is a short poem about the moon:\n\nThe moon glows softly in the midnight sky,\nA silver crescent shining, catching the eye.' context=None 
type <class 'ollama._types.GenerateResponse'>

Chat completion response:
completion_message=CompletionMessage(role='assistant', content='Here is a short poem about the moon:\n\nThe moon glows softly in the midnight sky,\nA silver crescent shining, catching the eye.', stop_reason=<StopReason.end_of_turn: 'end_of_turn'>, tool_calls=[]) logprobs=None
```

Expected behavior

The importing_as_library flow should be able to run with Ollama.

ashwinb (Contributor) commented Dec 4, 2024

@wukaixingxp does removing that line help?

wukaixingxp (Contributor, Author) commented:

Yes, but I was wondering why that line works with the normal LlamaStackClient but not with LlamaStackDirectClient.

wukaixingxp (Contributor, Author) commented:

Also, in this line `model=` should be `model_id=`.
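
In other words, the documented call needs the `model_id` keyword; a minimal before/after sketch, assuming the `client` and `model_name` from the script above:

```python
# Assumes `client` (LlamaStackDirectClient) and `model_name` from the script
# above, inside its async main().

# As documented -- fails, because the keyword is wrong:
# response = await client.inference.chat_completion(model=model_name, messages=[...])

# Corrected call -- the parameter is `model_id=`:
response = await client.inference.chat_completion(
    messages=[UserMessage(content="hello", role="user")],
    model_id=model_name,
    stream=False,
)
```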

wukaixingxp (Contributor, Author) commented:

PR sent to fix this issue. @ashwinb, please take a look!

ashwinb pushed a commit that referenced this issue Dec 4, 2024
# What does this PR do?
1. removed [incorrect assertion](https://github.com/meta-llama/llama-stack/blob/435f34b05e84f1747b28570234f25878cf0b31c4/llama_stack/providers/remote/inference/ollama/ollama.py#L183) in ollama.py (see the sketch below)
2. fixed a typo in [this line](https://github.com/meta-llama/llama-stack/blob/435f34b05e84f1747b28570234f25878cf0b31c4/docs/source/distributions/importing_as_library.md?plain=1#L24), as `model=` should be `model_id=`.

- [x] Addresses issue ([#issue562](#562))
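
For context only: the PR simply drops the `assert isinstance(r, dict)`. A hypothetical, more defensive alternative (not the PR's change) would be to read the field in a way that tolerates both response shapes:

```python
# Hypothetical illustration, NOT the actual diff: tolerate both the older
# dict-shaped Ollama response and the newer typed GenerateResponse object.
def response_text(r) -> str:
    if isinstance(r, dict):
        return r["response"]   # older ollama clients returned plain dicts
    return r.response          # newer clients return a pydantic model
```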


## Test Plan

tested with code:

```python
import asyncio
import os

# pip install aiosqlite ollama faiss
from llama_stack_client.lib.direct.direct import LlamaStackDirectClient
from llama_stack_client.types import SystemMessage, UserMessage


async def main():
    os.environ["INFERENCE_MODEL"] = "meta-llama/Llama-3.2-1B-Instruct"
    client = await LlamaStackDirectClient.from_template("ollama")
    await client.initialize()
    response = await client.models.list()
    print(response)
    model_name = response[0].identifier
    response = await client.inference.chat_completion(
        messages=[
            SystemMessage(content="You are a friendly assistant.", role="system"),
            UserMessage(
                content="hello world, write me a 2 sentence poem about the moon",
                role="user",
            ),
        ],
        model_id=model_name,
        stream=False,
    )
    print("\nChat completion response:")
    print(response, type(response))


asyncio.run(main())

```
OUTPUT:
```
python test.py
Using template ollama with config:
apis:
- agents
- inference
- memory
- safety
- telemetry
conda_env: ollama
datasets: []
docker_image: null
eval_tasks: []
image_name: ollama
memory_banks: []
metadata_store:
  db_path: /Users/kaiwu/.llama/distributions/ollama/registry.db
  namespace: null
  type: sqlite
models:
- metadata: {}
  model_id: meta-llama/Llama-3.2-1B-Instruct
  provider_id: ollama
  provider_model_id: null
providers:
  agents:
  - config:
      persistence_store:
        db_path: /Users/kaiwu/.llama/distributions/ollama/agents_store.db
        namespace: null
        type: sqlite
    provider_id: meta-reference
    provider_type: inline::meta-reference
  inference:
  - config:
      url: http://localhost:11434
    provider_id: ollama
    provider_type: remote::ollama
  memory:
  - config:
      kvstore:
        db_path: /Users/kaiwu/.llama/distributions/ollama/faiss_store.db
        namespace: null
        type: sqlite
    provider_id: faiss
    provider_type: inline::faiss
  safety:
  - config: {}
    provider_id: llama-guard
    provider_type: inline::llama-guard
  telemetry:
  - config: {}
    provider_id: meta-reference
    provider_type: inline::meta-reference
scoring_fns: []
shields: []
version: '2'

[Model(identifier='meta-llama/Llama-3.2-1B-Instruct', provider_resource_id='llama3.2:1b-instruct-fp16', provider_id='ollama', type='model', metadata={})]

Chat completion response:
completion_message=CompletionMessage(role='assistant', content='Here is a short poem about the moon:\n\nThe moon glows bright in the midnight sky,\nA silver crescent shining, catching the eye.', stop_reason=<StopReason.end_of_turn: 'end_of_turn'>, tool_calls=[]) logprobs=None <class 'llama_stack.apis.inference.inference.ChatCompletionResponse'>
```

## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.

miguelg719 commented Dec 6, 2024

Will this fix go into the Python package anytime soon? I'm working on a llama-recipe using llama-stack, and it would be easier to follow by just using pip install. I experience the issue with both the Direct client and the regular client.

By the way, the issue also breaks the quickstart example:
https://llama-stack.readthedocs.io/en/latest/getting_started/index.html

I'm guessing a new Docker image must be created and pushed?

@ashwinb @wukaixingxp are there any plans in the near future to support ':latest' tags from Ollama? LangChain does.

yanxi0830 (Contributor) commented:

Does this PR close the issue? #563

miguelg719 commented:

The new Python package has this integrated, thanks!
