Notes on how MLX models handle tool‐calling

Hermes 2 Theta Llama 3 8B

Tried: mlx-community/Hermes-2-Theta-Llama-3-8B-4bit, which worked quite well, as covered in the README.

Firefunction

Tried: mlx-community/firefunction-v2-4bit

Usain Bolt example gave me:

  File "<template>", line 21, in top-level template code
jinja2.exceptions.UndefinedError: 'functions' is undefined

The chat template references a 'functions' variable that never gets passed in. This looks like something we might be able to massage in schema_helper.py.

There is a code snippet in the original model card which should provide a clue.
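One possible direction, as a rough sketch only: since the error names a 'functions' template variable, a helper could flatten OpenAI-style tool definitions into bare function specs and hand them to the template under that name. The helper name tools_to_functions_var is hypothetical (it does not exist in schema_helper.py), and the choice to serialize as a JSON string is an assumption that should be checked against the model card snippet.

```python
import json

def tools_to_functions_var(tools):
    """Flatten OpenAI-style tool entries into bare function specs.

    Returns a JSON string. That format is an ASSUMPTION about what the
    firefunction chat template wants; verify against the model card.
    """
    specs = [t['function'] for t in tools if t.get('type') == 'function']
    return json.dumps(specs, indent=4)

# Illustrative tool definition in the OpenAI chat-completions style
tools = [{
    'type': 'function',
    'function': {
        'name': 'calculator',
        'description': 'Evaluate an arithmetic expression',
        'parameters': {
            'type': 'object',
            'properties': {'expr': {'type': 'string'}},
            'required': ['expr'],
        },
    },
}]

functions_var = tools_to_functions_var(tools)
print(functions_var)
```

The resulting value would then need to reach the template under the name 'functions', for instance as an extra keyword argument at the point where the chat template is rendered.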

dolphin-2.9.2 Phi 3 Medium

Tried: mlx-community/dolphin-2.9.2-Phi-3-Medium-4bit

Usain Bolt example worked fine:

toolio_request --apibase="http://127.0.0.1:8000" --tool=toolio.tool.math.calculator --trace \
--prompt='Usain Bolt ran the 100m race in 9.58s. What was his average velocity?'

Resulted in:

⚙️Calling tool calculator with args {'expr': '100 / 9.58'}
⚙️Tool call result: 10.438413361169102
Final response:

Usain Bolt's average velocity was 10.44 m/s.

But no luck with the birthday helper example.

toolio_request --apibase="http://127.0.0.1:8000" --trace \
--tool=toolio.tool.demo.birthday_lookup \
--tool=toolio.tool.demo.today_kfabe \
--sysprompt='You are a writer who reasons step by step and uses research tools in the correct order before writing' \
--prompt='Write a nice note for each employee who has a birthday today.'

Got a malformed tool call response: finish_reason is 'tool_calls', but the tool_calls list is empty:

{'choices': [{'index': 0, 'message': {'role': 'assistant', 'tool_calls': []}, 'finish_reason': 'tool_calls'}], 'usage': {'completion_tokens': 1, 'prompt_tokens': 332, 'total_tokens': 333}, 'object': 'chat.completion', 'id': 'chatcmpl-5902768144_1719778333', 'created': 1719778333, 'model': 'mlx-community/dolphin-2.9.2-Phi-3-Medium-4bit'}
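A client could catch this case before trying to dispatch any tools. The check below is a minimal sketch against the response shape shown above; has_empty_tool_calls is a made-up helper name, not part of Toolio.

```python
def has_empty_tool_calls(response):
    """True if any choice reports finish_reason 'tool_calls' but lists no calls."""
    for choice in response.get('choices', []):
        calls = choice.get('message', {}).get('tool_calls') or []
        if choice.get('finish_reason') == 'tool_calls' and not calls:
            return True
    return False

# Trimmed copy of the response above
resp = {'choices': [{'index': 0,
                     'message': {'role': 'assistant', 'tool_calls': []},
                     'finish_reason': 'tool_calls'}]}

print(has_empty_tool_calls(resp))
```

What to do once the case is detected (retry, fall back to a plain completion, or surface an error) is a separate design decision.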

Gemma-2 9b

Tried: mlx-community/gemma-2-9b-8bit

toolio_request --apibase="http://127.0.0.1:8000" --prompt='If I have three mangos and Obi has twice as many bananas, how many fruit items do we have altogether?' --tool=toolio.tool.math.calculator

I get the warning:

No chat template is set for this tokenizer, falling back to a default class-level template. This is very error-prone, because models are often trained with templates different from the class default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which point any code depending on them will stop working. We recommend setting a valid chat template before then to ensure that this model continues working without issues.

The model gets really confused on trip 1 and generates a tool call to calculator with the silly expression 'string', as if it mistook the parameter type for the value:

RESPONSE {'choices': [{'index': 0, 'message': {'role': 'assistant', 'tool_calls': [{'id': 'call_23014958736_1719690754_0', 'type': 'function', 'function': {'name': 'calculator', 'arguments': '{"expr": "string"}'}}]}, 'finish_reason': 'tool_calls'}], 'usage': {'completion_tokens': 19, 'prompt_tokens': 237, 'total_tokens': 256}, 'object': 'chat.completion', 'id': 'chatcmpl-23014958736_1719690754', 'created': 1719690754, 'model': 'mlx-community/gemma-2-9b-8bit'}
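This kind of failure is easy to spot mechanically: the argument value is literally the type name declared in the tool's schema. A minimal sketch follows, assuming the calculator schema declares expr as a string (which the behavior above suggests but the notes do not show); placeholder_args is a hypothetical helper name.

```python
import json

def placeholder_args(tool_call, schema):
    """Return argument names whose value is just the declared type name."""
    args = json.loads(tool_call['function']['arguments'])
    props = schema.get('properties', {})
    return [
        name for name, value in args.items()
        if isinstance(value, str) and props.get(name, {}).get('type') == value
    ]

# ASSUMED parameter schema for the calculator tool
calc_schema = {'type': 'object', 'properties': {'expr': {'type': 'string'}}}

# Tool call copied from the response above
call = {'id': 'call_23014958736_1719690754_0', 'type': 'function',
        'function': {'name': 'calculator', 'arguments': '{"expr": "string"}'}}

print(placeholder_args(call, calc_schema))
```

A non-empty result means the model echoed the schema rather than filling it in, so the call should not be executed.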
