
Ollama llama3.x models do not work with LangChain chat/tool integration #12780

Open
tkarna opened this issue Feb 6, 2025 · 3 comments

tkarna commented Feb 6, 2025

Llama3.x models run through ipex-llm's ollama do not work with LangChain chat/tool integration.

Here's a minimal example of a chat model with tools:

# pip install langchain langchain-ollama
# ollama pull llama3.2:3b-instruct-q4_K_M
from langchain_core.tools import tool
from langchain_ollama.chat_models import ChatOllama


@tool
def get_weather(location: str):
    """Call to get the current weather."""
    if location.lower() in ["sf", "san francisco"]:
        return "It's 60 degrees and foggy."
    else:
        return "It's 90 degrees and sunny."


model = ChatOllama(
    model= "llama3.2:3b-instruct-q4_K_M",
    num_predict=50,  # limit number of tokens to stop hallucination
)

tools = [get_weather]
model_with_tools = model.bind_tools(tools)

res = model_with_tools.invoke("what's the weather in sf?")
res.pretty_print()
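
Beyond pretty-printing, the result can be checked programmatically; a minimal sketch, assuming the standard tool_calls field on the AIMessage returned by langchain-core:

# A successful tool-calling response carries the call in res.tool_calls
# (a list of dicts with "name", "args" and "id"); an empty list together with
# free-form text in res.content means the model fell back to plain generation.
if res.tool_calls:
    call = res.tool_calls[0]
    print(call["name"], call["args"])  # expected: get_weather {'location': 'sf'}
else:
    print("no tool call; raw content:", res.content)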

This example runs correctly with standard ollama. Expected response with tool arguments:

================================== Ai Message ==================================
Tool Calls:
  get_weather (328879e5-247c-48d1-9013-39f0e1b65539)
 Call ID: 328879e5-247c-48d1-9013-39f0e1b65539
  Args:
    location: sf

Actual output with ipex-llm's ollama shows that the model just hallucinates:

================================== Ai Message ==================================

I hope that a new
$ has several times this would be =_._ _-level was an item 8/<< is not only to the best
To view= (or is a significant but also knows are still allow [or)

Tested with:

Ubuntu 22.04.5 LTS
oneapi/2025.0
python 3.10.0
ipex-llm 2.2.0b20250105, 2.2.0b20250123
langchain 0.3.17
langchain-ollama 0.2.3
GPU: Intel(R) Data Center GPU Max 1100
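
For completeness, the client above uses the default local endpoint; if the ipex-llm ollama server listens on a different host or port, the same test can be pointed at it explicitly (a sketch, assuming ChatOllama's base_url parameter; the address shown is just the default and may need adjusting):

# Hypothetical explicit endpoint; replace with wherever the ipex-llm ollama server listens.
model = ChatOllama(
    model="llama3.2:3b-instruct-q4_K_M",
    base_url="http://localhost:11434",  # default ollama address, spelled out here
    num_predict=50,
)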


tkarna commented Feb 6, 2025

The ollama server is, however, able to generate structured JSON output. This command

curl -X POST http://localhost:11434/api/chat -H "Content-Type: application/json" -d '{
  "model": "llama3.2:1b-instruct-q4_K_M",
  "messages": [{"role": "user", "content": "Tell me about Canada."}],
  "stream": false,
  "format": {
    "type": "object",
    "properties": {
      "name": {
        "type": "string"
      },
      "capital": {
        "type": "string"
      },
      "languages": {
        "type": "array",
        "items": {
          "type": "string"
        }
      }
    },
    "required": [
      "name",
      "capital",
      "languages"
    ]
  }
}'

produces a correct result:

{
  "model":"llama3.2:1b-instruct-q4_K_M",
  "created_at":"2025-02-06T13:21:51.954049417Z",
  "message":{
    "role":"assistant",
    "content":"{ \"capital\": \"Ottawa\", \"languages\": [\"English\", \"French\"], \"name\": \"Canada\" }"
  },
  "done_reason":"stop",
  "done":true,
  "total_duration":3687525616,
  "load_duration":2630099684,
  "prompt_eval_count":30,
  "prompt_eval_duration":580000000,
  "eval_count":33,
  "eval_duration":473000000
}
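
The same request can be scripted from Python; a minimal sketch using requests that mirrors the curl call above (the model name and schema are taken from that command):

# pip install requests
import requests

payload = {
    "model": "llama3.2:1b-instruct-q4_K_M",
    "messages": [{"role": "user", "content": "Tell me about Canada."}],
    "stream": False,
    "format": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "capital": {"type": "string"},
            "languages": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["name", "capital", "languages"],
    },
}

resp = requests.post("http://localhost:11434/api/chat", json=payload)
print(resp.json()["message"]["content"])  # should print the JSON object shown above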


sgwhat commented Feb 7, 2025

Hi @tkarna, may I ask what "standard ollama" refers to? Is it "ollama run" or the community version of ollama?


tkarna commented Feb 7, 2025

> Hi @tkarna, may I ask what "standard ollama" refers to? Is it "ollama run" or the community version of ollama?

Community ollama running on CPU. I have also compiled ollama 3.13 with Intel GPU support. The above test case works with both of these.
