LlamaStackDirectClient with ollama failed to run #562

Open · wukaixingxp opened this issue Dec 3, 2024 · 7 comments
wukaixingxp (Contributor) commented Dec 3, 2024

System Info

python -m torch.utils.collect_env
/opt/homebrew/anaconda3/envs/llamastack-ollama/lib/python3.10/runpy.py:126: RuntimeWarning: 'torch.utils.collect_env' found in sys.modules after import of package 'torch.utils', but prior to execution of 'torch.utils.collect_env'; this may result in unpredictable behaviour
warn(RuntimeWarning(msg))
Collecting environment information...
PyTorch version: 2.5.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 15.1.1 (arm64)
GCC version: Could not collect
Clang version: 16.0.0 (clang-1600.0.26.4)
CMake version: Could not collect
Libc version: N/A

Python version: 3.10.15 (main, Oct 3 2024, 02:24:49) [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-15.1.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M1 Pro

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] torch==2.5.1
[conda] numpy 1.26.4 pypi_0 pypi
[conda] torch 2.5.1 pypi_0 pypi

Information

  • The official example scripts
  • My own modified scripts

🐛 Describe the bug

The response `r` is not a dict but `<class 'ollama._types.GenerateResponse'>`; maybe we should remove this line.
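
For context, here is a minimal sketch (not from the issue) of why that assertion trips. It assumes a recent `ollama` Python client, a local Ollama server on the default port, and the `llama3.2:1b-instruct-fp16` model already pulled; `generate()` returns a typed `GenerateResponse` rather than a plain dict:

```python
# Minimal sketch: in recent versions of the ollama Python client,
# AsyncClient.generate() returns a typed GenerateResponse object instead of
# a plain dict, so `assert isinstance(r, dict)` raises AssertionError.
# Assumes a local Ollama server with llama3.2:1b-instruct-fp16 pulled.
import asyncio
from ollama import AsyncClient


async def demo():
    client = AsyncClient(host="http://localhost:11434")
    r = await client.generate(model="llama3.2:1b-instruct-fp16", prompt="hello")
    print(type(r))               # <class 'ollama._types.GenerateResponse'>
    print(isinstance(r, dict))   # False -> the assertion in ollama.py trips
    print(r.response)            # the generated text is exposed as an attribute


asyncio.run(demo())
```
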
My script:

```python
import asyncio
import os

#pip install aiosqlite ollama faiss
from llama_stack_client.lib.direct.direct import LlamaStackDirectClient
from llama_stack_client.types import SystemMessage, UserMessage


async def main():
    os.environ["INFERENCE_MODEL"] = "meta-llama/Llama-3.2-1B-Instruct"
    client = await LlamaStackDirectClient.from_template("ollama")
    await client.initialize()
    response = await client.models.list()
    print(response)
    model_name = response[0].identifier
    response = await client.inference.chat_completion(
        messages=[
            SystemMessage(content="You are a friendly assistant.", role="system"),
            UserMessage(
                content="hello world, write me a 2 sentence poem about the moon",
                role="user",
            ),
        ],
        model_id=model_name,
        stream=False,
    )
    print("\nChat completion response:")
    print(response, type(response))


asyncio.run(main())
```

Error logs

```
❯ python test.py
Using template ollama with config:
apis:
- agents
- inference
- memory
- safety
- telemetry
conda_env: ollama
datasets: []
docker_image: null
eval_tasks: []
image_name: ollama
memory_banks: []
metadata_store:
  db_path: /Users/kaiwu/.llama/distributions/ollama/registry.db
  namespace: null
  type: sqlite
models:
- metadata: {}
  model_id: meta-llama/Llama-3.2-1B-Instruct
  provider_id: ollama
  provider_model_id: null
providers:
  agents:
  - config:
      persistence_store:
        db_path: /Users/kaiwu/.llama/distributions/ollama/agents_store.db
        namespace: null
        type: sqlite
    provider_id: meta-reference
    provider_type: inline::meta-reference
  inference:
  - config:
      url: http://localhost:11434
    provider_id: ollama
    provider_type: remote::ollama
  memory:
  - config:
      kvstore:
        db_path: /Users/kaiwu/.llama/distributions/ollama/faiss_store.db
        namespace: null
        type: sqlite
    provider_id: faiss
    provider_type: inline::faiss
  safety:
  - config: {}
    provider_id: llama-guard
    provider_type: inline::llama-guard
  telemetry:
  - config: {}
    provider_id: meta-reference
    provider_type: inline::meta-reference
scoring_fns: []
shields: []
version: '2'

[Model(identifier='meta-llama/Llama-3.2-1B-Instruct', provider_resource_id='llama3.2:1b-instruct-fp16', provider_id='ollama', type='model', metadata={})]
Traceback (most recent call last):
  File "/Users/kaiwu/work/llama-stack-apps/examples/DocQA/test.py", line 32, in <module>
    asyncio.run(main())
  File "/opt/homebrew/anaconda3/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/Users/kaiwu/work/llama-stack-apps/examples/DocQA/test.py", line 17, in main
    response = await client.inference.chat_completion(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/llama_stack_client/lib/direct/direct.py", line 147, in post
    async for response in self._call_endpoint(path, "POST", body):
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/llama_stack_client/lib/direct/direct.py", line 118, in _call_endpoint
    yield await func(**(body or {}))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/llama_stack/distribution/routers/routers.py", line 123, in chat_completion
    return await provider.chat_completion(**params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/llama_stack/providers/remote/inference/ollama/ollama.py", line 221, in chat_completion
    return await self._nonstream_chat_completion(request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/llama_stack/providers/remote/inference/ollama/ollama.py", line 273, in _nonstream_chat_completion
    assert isinstance(r, dict)
AssertionError
```

Debug print:

```
...
version: '2'

[Model(identifier='meta-llama/Llama-3.2-1B-Instruct', provider_resource_id='llama3.2:1b-instruct-fp16', provider_id='ollama', type='model', metadata={})]
r model='llama3.2:1b-instruct-fp16' created_at='2024-12-03T18:00:33.186056Z' done=True done_reason='stop' total_duration=1213954708 load_duration=29042500 prompt_eval_count=35 prompt_eval_duration=698000000 eval_count=30 eval_duration=486000000 response='Here is a short poem about the moon:\n\nThe moon glows softly in the midnight sky,\nA silver crescent shining, catching the eye.' context=None 
type <class 'ollama._types.GenerateResponse'>

Chat completion response:
completion_message=CompletionMessage(role='assistant', content='Here is a short poem about the moon:\n\nThe moon glows softly in the midnight sky,\nA silver crescent shining, catching the eye.', stop_reason=<StopReason.end_of_turn: 'end_of_turn'>, tool_calls=[]) logprobs=None
```

Expected behavior

The importing_as_library flow should be able to run with Ollama.

ashwinb (Contributor) commented Dec 4, 2024

@wukaixingxp does removing that line help?

wukaixingxp (Contributor, Author) commented:

Yes, but I was wondering why that line works with the normal LlamaStackClient but not with LlamaStackDirectClient.

wukaixingxp (Contributor, Author) commented:

Also, in this line `model=` should be `model_id=`.
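
In other words, the documented call needs the `model_id` keyword; a minimal before/after sketch, assuming the `client` and `model_name` from the script above:

```python
# Assumes `client` (LlamaStackDirectClient) and `model_name` from the script
# above, inside its async main().

# As documented -- fails, because the keyword is wrong:
# response = await client.inference.chat_completion(model=model_name, messages=[...])

# Corrected call -- the parameter is `model_id=`:
response = await client.inference.chat_completion(
    messages=[UserMessage(content="hello", role="user")],
    model_id=model_name,
    stream=False,
)
```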

wukaixingxp (Contributor, Author) commented:

PR sent to fix this issue. @ashwinb, please take a look!

ashwinb pushed a commit that referenced this issue Dec 4, 2024
# What does this PR do?
1. removed [incorrect assertion](https://github.com/meta-llama/llama-stack/blob/435f34b05e84f1747b28570234f25878cf0b31c4/llama_stack/providers/remote/inference/ollama/ollama.py#L183) in ollama.py (see the sketch below)
2. fixed a typo in [this line](https://github.com/meta-llama/llama-stack/blob/435f34b05e84f1747b28570234f25878cf0b31c4/docs/source/distributions/importing_as_library.md?plain=1#L24), as `model=` should be `model_id=`.

- [x] Addresses issue ([#issue562](#562))
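
For context only: the PR simply drops the `assert isinstance(r, dict)`. A hypothetical, more defensive alternative (not the PR's change) would be to read the field in a way that tolerates both response shapes:

```python
# Hypothetical illustration, NOT the actual diff: tolerate both the older
# dict-shaped Ollama response and the newer typed GenerateResponse object.
def response_text(r) -> str:
    if isinstance(r, dict):
        return r["response"]   # older ollama clients returned plain dicts
    return r.response          # newer clients return a pydantic model
```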


## Test Plan

tested with code:

```python
import asyncio
import os

# pip install aiosqlite ollama faiss
from llama_stack_client.lib.direct.direct import LlamaStackDirectClient
from llama_stack_client.types import SystemMessage, UserMessage


async def main():
    os.environ["INFERENCE_MODEL"] = "meta-llama/Llama-3.2-1B-Instruct"
    client = await LlamaStackDirectClient.from_template("ollama")
    await client.initialize()
    response = await client.models.list()
    print(response)
    model_name = response[0].identifier
    response = await client.inference.chat_completion(
        messages=[
            SystemMessage(content="You are a friendly assistant.", role="system"),
            UserMessage(
                content="hello world, write me a 2 sentence poem about the moon",
                role="user",
            ),
        ],
        model_id=model_name,
        stream=False,
    )
    print("\nChat completion response:")
    print(response, type(response))


asyncio.run(main())

```
OUTPUT:
```
python test.py
Using template ollama with config:
apis:
- agents
- inference
- memory
- safety
- telemetry
conda_env: ollama
datasets: []
docker_image: null
eval_tasks: []
image_name: ollama
memory_banks: []
metadata_store:
  db_path: /Users/kaiwu/.llama/distributions/ollama/registry.db
  namespace: null
  type: sqlite
models:
- metadata: {}
  model_id: meta-llama/Llama-3.2-1B-Instruct
  provider_id: ollama
  provider_model_id: null
providers:
  agents:
  - config:
      persistence_store:
        db_path: /Users/kaiwu/.llama/distributions/ollama/agents_store.db
        namespace: null
        type: sqlite
    provider_id: meta-reference
    provider_type: inline::meta-reference
  inference:
  - config:
      url: http://localhost:11434
    provider_id: ollama
    provider_type: remote::ollama
  memory:
  - config:
      kvstore:
        db_path: /Users/kaiwu/.llama/distributions/ollama/faiss_store.db
        namespace: null
        type: sqlite
    provider_id: faiss
    provider_type: inline::faiss
  safety:
  - config: {}
    provider_id: llama-guard
    provider_type: inline::llama-guard
  telemetry:
  - config: {}
    provider_id: meta-reference
    provider_type: inline::meta-reference
scoring_fns: []
shields: []
version: '2'

[Model(identifier='meta-llama/Llama-3.2-1B-Instruct', provider_resource_id='llama3.2:1b-instruct-fp16', provider_id='ollama', type='model', metadata={})]

Chat completion response:
completion_message=CompletionMessage(role='assistant', content='Here is a short poem about the moon:\n\nThe moon glows bright in the midnight sky,\nA silver crescent shining, catching the eye.', stop_reason=<StopReason.end_of_turn: 'end_of_turn'>, tool_calls=[]) logprobs=None <class 'llama_stack.apis.inference.inference.ChatCompletionResponse'>
```

## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.

miguelg719 commented Dec 6, 2024

Will this fix go into the Python package anytime soon? I'm working on a llama-recipe using llama-stack, and it would be easier to follow by just using pip install. I experience the issue with both the Direct client and the regular client.

By the way, the issue also breaks the quickstart example:
https://llama-stack.readthedocs.io/en/latest/getting_started/index.html

I'm guessing a new Docker image must be created and pushed?

@ashwinb @wukaixingxp are there any plans in the near future to support ':latest' tags from Ollama? LangChain does.

yanxi0830 (Contributor) commented:

Does this PR close the issue? #563

miguelg719 commented:

The new Python package has this integrated, thanks!
