[#432] Add Groq Provider - tool calls #630

Merged
merged 2 commits into from
Jan 14, 2025

Conversation

@aidando73 (Contributor) commented Dec 14, 2024

What does this PR do?

Contributes to issue #432

  • Adds tool calls to Groq provider
  • Enables tool call integration tests

PR Train

Test Plan

Environment:

export GROQ_API_KEY=<api-key>

# build.yaml and run.yaml files
wget https://raw.githubusercontent.com/aidando73/llama-stack/9165502582cd7cb178bc1dcf89955b45768ab6c1/build.yaml
wget https://raw.githubusercontent.com/aidando73/llama-stack/9165502582cd7cb178bc1dcf89955b45768ab6c1/run.yaml

# Create the environment if it doesn't already exist
conda create --prefix ./envs python=3.10
conda activate ./envs

# Build
pip install -e . && llama stack build --config ./build.yaml --image-type conda

# Activate built environment
conda activate llamastack-groq
Unit tests
# Setup
conda activate llamastack-groq
pytest llama_stack/providers/tests/inference/groq/test_groq_utils.py -vv -k groq -s

# Result
llama_stack/providers/tests/inference/groq/test_groq_utils.py .....................

======================================== 21 passed, 1 warning in 0.05s ========================================
Integration tests
# Run
conda activate llamastack-groq
pytest llama_stack/providers/tests/inference/test_text_inference.py -k groq -s

# Result
llama_stack/providers/tests/inference/test_text_inference.py .sss.s.ss.sss.s...

========================== 8 passed, 10 skipped, 180 deselected, 7 warnings in 2.73s ==========================
Manual
llama stack run ./run.yaml --port 5001

Via this Jupyter notebook: https://github.com/aidando73/llama-stack/blob/9165502582cd7cb178bc1dcf89955b45768ab6c1/hello.ipynb

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Ran pre-commit to handle lint / formatting issues.
  • Read the contributor guideline, Pull Request section?
  • Updated relevant documentation. (There seems to be no relevant documentation.)
  • Wrote necessary unit or integration tests.

@facebook-github-bot added the CLA Signed label Dec 14, 2024
@aidando73 changed the title from "[#432] Add tool calls to groq inference adapter" to "[#432] Add Groq Provider - tool calls" Dec 15, 2024
except groq.BadRequestError as e:
    if e.body.get("error", {}).get("code") == "tool_use_failed":
        # For smaller models, Groq may fail to call a tool even when the request is well formed
        raise ValueError("Groq failed to call a tool", e.body.get("error", {}))
Contributor Author

I couldn't find a better error class for this. RequestValidationError could work [1], but that's a FastAPI error, and I'm not sure it's a good idea to have FastAPI errors in adapter code.

So I'm just going to use ValueError for now. Maybe we could introduce a set of Llama Stack-specific error classes? E.g. llama_stack.BadRequestError, llama_stack.ServiceUnavailableError.
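A minimal sketch of what such error classes could look like. None of these names exist in Llama Stack: `LlamaStackError`, `BadRequestError`, `ServiceUnavailableError`, `translate_groq_error_body` and the HTTP status mapping are all hypothetical. The function takes the error body as a plain dict (the shape read via `e.body` above) so it runs without the `groq` package.

```python
from typing import Any, Dict, Optional


class LlamaStackError(Exception):
    """Hypothetical base class for provider-agnostic Llama Stack errors."""

    status_code: int = 500

    def __init__(self, message: str, detail: Optional[Dict[str, Any]] = None):
        super().__init__(message)
        self.detail = detail or {}


class BadRequestError(LlamaStackError):
    """Hypothetical: the provider rejected the request (mapped to HTTP 400)."""

    status_code = 400


class ServiceUnavailableError(LlamaStackError):
    """Hypothetical: the upstream provider is temporarily unavailable (mapped to HTTP 503)."""

    status_code = 503


def translate_groq_error_body(body: Dict[str, Any]) -> LlamaStackError:
    """Map a Groq-style error body (as read from e.body) to a hypothetical Llama Stack error."""
    error = body.get("error", {})
    if error.get("code") == "tool_use_failed":
        # Mirrors the adapter check above: smaller models may fail to produce a valid tool call.
        return BadRequestError("Groq failed to call a tool", error)
    return LlamaStackError(error.get("message", "Unknown provider error"), error)
```

The adapter would then `raise translate_groq_error_body(e.body) from e`, and the server layer could turn `status_code` into an HTTP response without the adapter importing FastAPI.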

Contributor

@aidando73 Does Groq support the raw completions API? If it does, I'd rather always use that instead of the chat completions API. Llama Stack can then format the tool definitions exactly as needed and re-parse the tool calls from Groq's output.

@aidando73 (Contributor Author) Dec 21, 2024

@ashwinb No, unfortunately not; they only support the chat completions API. These are all their endpoints:

[Screen recording: Screen.Recording.2024-12-21.at.19.30.55.mov]

Here's their client sdk: https://github.com/groq/groq-python/blob/main/api.md#chat

@ricklamers do you know if there's any plans to support the completions API?

No current plans for completions API afaict.

elif choice.delta.tool_calls:
    # We assume there is only one tool call per chunk, but emit a warning in case we're wrong
    if len(choice.delta.tool_calls) > 1:
        warnings.warn("Groq returned multiple tool calls in one chunk. Using the first one, ignoring the rest.")
Contributor Author

No explicit documentation by Groq on this, but based on my testing this seems to be the case.

For multiple tool calls, each one is a separate chunk:

Choice(delta=ChoiceDelta(content=None, function_call=None, role=None, tool_calls=[ChoiceDeltaToolCall(index=0, id='call_gjk5', function=ChoiceDeltaToolCallFunction(arguments='{"message": 10}', name='calculate_log'), type='function')]), finish_reason=None, index=0, logprobs=None)
Choice(delta=ChoiceDelta(content=None, function_call=None, role=None, tool_calls=[ChoiceDeltaToolCall(index=1, id='call_qw70', function=ChoiceDeltaToolCallFunction(arguments='{"a": 3.52, "b": 4.89}', name='add'), type='function')]), finish_reason=None, index=0, logprobs=None)
Choice(delta=ChoiceDelta(content=None, function_call=None, role=None, tool_calls=[ChoiceDeltaToolCall(index=2, id='call_f64g', function=ChoiceDeltaToolCallFunction(arguments='{"a": 3.52, "b": 4.89}', name='multiply'), type='function')]), finish_reason=None, index=0, logprobs=None)

At the moment we don't include more than 1 tool call in a completion chunk, but the format (https://platform.openai.com/docs/api-reference/chat/streaming) does support it and we might include multiple tool calls in a single chunk in the future (with different indexes).

# We assume Groq produces fully formed tool calls for each chunk
Contributor Author

No explicit documentation by Groq on this, but based on my testing this seems to be the case.

E.g., I made it run a tool call with a 10k-character document and Groq put the whole tool call into one chunk:

ChatCompletionResponseStreamChunk(event=ChatCompletionResponseStreamChunkEvent(delta=ChatCompletionResponseStreamChunkEventDeltaToolCallDelta(content=ToolCall(arguments={'text': 'Donec eu lorem eget quam accumsan iaculis. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. [... the same lorem ipsum paragraph repeated, ~10k characters in total ...] In nec lorem metus. Nunc molestie mollis enim, vitae volutpat elit blandit non.'}, call_id='call_676h', tool_name='count_characters'), parse_status='in_progress'), event_type='progress', logprobs=None, stop_reason=None))

Right now each chunk is a fully formed tool call but this might change to become incremental in the future like other OpenAI-compatible tool chunk streaming implementations.
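If Groq starts sending several tool calls per chunk, or streams arguments incrementally as the two comments above anticipate, the adapter would need to merge fragments by `index` before parsing. The sketch below assumes OpenAI-style streaming deltas; `ToolCallFragment` and `accumulate_tool_calls` are hypothetical stand-ins for the Groq SDK's `ChoiceDeltaToolCall` objects, not part of any library.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class ToolCallFragment:
    """Hypothetical stand-in for one entry of choice.delta.tool_calls."""

    index: int
    id: Optional[str] = None
    name: Optional[str] = None
    arguments: str = ""


@dataclass
class _PartialCall:
    id: Optional[str] = None
    name: Optional[str] = None
    arguments: List[str] = field(default_factory=list)


def accumulate_tool_calls(chunks: Iterable[List[ToolCallFragment]]) -> List[dict]:
    """Merge streamed tool-call fragments by index, then parse each call's arguments once."""
    partial: Dict[int, _PartialCall] = {}
    for fragments in chunks:
        # A single chunk may carry several fragments with different indexes.
        for frag in fragments:
            call = partial.setdefault(frag.index, _PartialCall())
            # id and name usually arrive on the first fragment for an index only.
            call.id = call.id or frag.id
            call.name = call.name or frag.name
            call.arguments.append(frag.arguments)
    calls = []
    for index in sorted(partial):
        c = partial[index]
        # Only parse once all fragments are joined; partial JSON is not valid on its own.
        calls.append({"id": c.id, "name": c.name, "arguments": json.loads("".join(c.arguments))})
    return calls
```

With Groq's current behavior (one complete call per chunk) each index receives a single fragment, so the result matches parsing each chunk directly.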

# Note that Groq may return a string that is not valid JSON here
# So this may raise a 500 error. Going to leave this as is to see
# how big of an issue this is and what we can do about it.
arguments=json.loads(tool_call.function.arguments),
@aidando73 (Contributor Author) Dec 15, 2024

Let me know if you want me to handle invalid JSON here. If so, which approach would you prefer?

  • Return an empty arguments dict
  • Pass the entire tool call through as a raw string

After that, I'm assuming we'd want to set:

parse_status=ToolCallParseStatus.failure,

Contributor

@aidando73 yes we should definitely handle invalid JSON and set parse_status appropriately. could you do this as a follow-up PR? I am going to merge this one now.
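One possible shape for that follow-up: parse without raising and report a status instead. `ParseStatus` is a hypothetical local stand-in for Llama Stack's `ToolCallParseStatus` (only `failure` and `in_progress` appear in this thread, so the member names here are assumptions), and returning the raw string on failure corresponds to the second option listed above.

```python
import json
from enum import Enum
from typing import Any, Dict, Tuple, Union


class ParseStatus(Enum):
    """Hypothetical stand-in for ToolCallParseStatus; member names are assumptions."""

    success = "success"
    failure = "failure"


def parse_tool_arguments(raw: str) -> Tuple[Union[Dict[str, Any], str], ParseStatus]:
    """Parse a provider's tool-call arguments string without raising on bad JSON."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        # Keep the raw string so the caller can log or surface it.
        return raw, ParseStatus.failure
    if not isinstance(parsed, dict):
        # Valid JSON, but tool arguments are expected to be a JSON object.
        return raw, ParseStatus.failure
    return parsed, ParseStatus.success
```

Returning a status rather than raising means a malformed tool call surfaces as a failed parse in the response instead of a 500 from the server.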

Contributor Author

Sure - give me one sec

Contributor Author

@ashwinb addressed here: #811

and "Llama-3.2" in inference_model
):
# TODO(aidand): Remove this skip once Groq's tool calling for Llama3.2 works better
pytest.skip("Groq's tool calling for Llama3.2 doesn't work very well")
Contributor Author

For 3.2-3B, Groq doesn't parse the tool call properly:

CompletionMessage(role='assistant', content='<function=get_weather>{"location": "San Francisco, CA"}', stop_reason=<StopReason.end_of_turn: 'end_of_turn'>, tool_calls=[])

3.2-3B doesn't always handle the format correctly. In this case it outputted a stop token instead of completing the response with </function>.
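For context on the raw-completions idea discussed earlier: if Llama Stack parsed the model's text itself, it could tolerate a missing `</function>`. This is a minimal sketch assuming the `<function=NAME>{JSON}</function>` format visible in the `CompletionMessage` above; `parse_function_tag` is hypothetical and is not how Groq or Llama Stack actually parse tool calls.

```python
import json
import re
from typing import Any, Dict, Optional, Tuple

# Matches <function=NAME>{JSON}</function>, treating the closing tag as optional.
_FUNCTION_CALL = re.compile(
    r"^\s*<function=(?P<name>[^>]+)>(?P<args>.*?)(?:</function>)?\s*$",
    re.DOTALL,
)


def parse_function_tag(content: str) -> Optional[Tuple[str, Dict[str, Any]]]:
    """Return (tool_name, arguments) if content is a single function-tag call, else None."""
    match = _FUNCTION_CALL.match(content)
    if match is None:
        return None
    try:
        args = json.loads(match.group("args"))
    except json.JSONDecodeError:
        return None
    if not isinstance(args, dict):
        return None
    return match.group("name"), args
```

Applied to the `content` string above, this would recover the `get_weather` call even though the closing tag is missing.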

Contributor

@aidando73 let's not have this conditional. I'd rather have the tests fail so we can mark it as such.

Contributor Author

Addressed in #811

and "Llama-3.2" in inference_model
):
# TODO(aidand): Remove this skip once Groq's tool calling for Llama3.2 works better
pytest.skip("Groq's tool calling for Llama3.2 doesn't work very well")
@aidando73 (Contributor Author) Dec 15, 2024

For 3.2-3B, Groq returns an error:

ChatCompletionResponseStreamChunk(event=ChatCompletionResponseEvent(event_type=<ChatCompletionResponseEventType.complete: 'complete'>, delta='Tool use failed: JSON does not match the expected schema for tool calls', logprobs=None, stop_reason=<StopReason.end_of_turn: 'end_of_turn'>))

@ricklamers Jan 6, 2025

We validate each generated tool call object against its JSON Schema. This error could be caused by a tool schema that is too complicated or ambiguous for Llama 3.2-3B.
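To illustrate the kind of check behind "JSON does not match the expected schema": the sketch below validates only top-level `required` keys and single primitive `type` names. It is a simplified, hypothetical stand-in (`schema_errors` is not a real API), not Groq's server-side validator.

```python
from typing import Any, Dict, List

# Map JSON Schema primitive type names to Python types (a subset of the spec).
_JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
    "null": type(None),
}


def schema_errors(arguments: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Check top-level `required` and `properties[*].type` only; not a full JSON Schema validator."""
    errors = []
    for name in schema.get("required", []):
        if name not in arguments:
            errors.append(f"missing required property '{name}'")
    for name, value in arguments.items():
        expected = schema.get("properties", {}).get(name, {}).get("type")
        if expected is None:
            continue
        # bool is a subclass of int in Python, so reject it explicitly for numeric types.
        if isinstance(value, bool) and expected in ("integer", "number"):
            errors.append(f"'{name}' should be {expected}, got boolean")
        elif not isinstance(value, _JSON_TYPES[expected]):
            errors.append(f"'{name}' should be {expected}, got {type(value).__name__}")
    return errors
```

The more properties, nesting levels, and loosely described fields a tool has, the more chances a small model has to emit a call that fails checks like these.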

stop_reason=stop_reason,
)
def _convert_groq_tool_call(tool_call: ChatCompletionMessageToolCall) -> ToolCall:
return ToolCall(
@aidando73 (Contributor Author) Dec 15, 2024

Note unrelated to this PR: ToolCall fails for more complex nested types:

E.g.,

# For a param definition:
"passengers": {
    "param_type": "array",
    "description": "The passengers",
},

# Groq wants to return something like:
res = [
  {'name': 'John', 'age': 35, 'class': 'Economy'},
  {'name': 'Jane', 'age': 32, 'class': 'Economy'},
  {'name': 'Tim', 'age': 5, 'class': 'Economy'}
]

# But we run into error:
pydantic_core._pydantic_core.ValidationError: 17 validation errors for ToolCall
arguments.passengers.str
  Input should be a valid string [type=string_type, input_value=[{'name': 'John', 'age': ... 5, 'class': 'Economy'}], input_type=list]
    For further information visit https://errors.pydantic.dev/2.10/v/string_type
arguments.passengers.int
  Input should be a valid integer [type=int_type, input_value=[{'name': 'John', 'age': ... 5, 'class': 'Economy'}], input_type=list]
    For further information visit https://errors.pydantic.dev/2.10/v/int_type
arguments.passengers.float
  Input should be a valid number [type=float_type, input_value=[{'name': 'John', 'age': ... 5, 'class': 'Economy'}], input_type=list]
    For further information visit https://errors.pydantic.dev/2.10/v/float_type
arguments.passengers.bool
  Input should be a valid boolean [type=bool_type, input_value=[{'name': 'John', 'age': ... 5, 'class': 'Economy'}], input_type=list]
    For further information visit https://errors.pydantic.dev/2.10/v/bool_type
arguments.passengers.list[nullable[union[str,int,float,bool]]].0.str
  Input should be a valid string [type=string_type, input_value={'name': 'John', 'age': 35, 'class': 'Economy'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/string_type
arguments.passengers.list[nullable[union[str,int,float,bool]]].0.int
  Input should be a valid integer [type=int_type, input_value={'name': 'John', 'age': 35, 'class': 'Economy'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/int_type
arguments.passengers.list[nullable[union[str,int,float,bool]]].0.float
  Input should be a valid number [type=float_type, input_value={'name': 'John', 'age': 35, 'class': 'Economy'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/float_type
arguments.passengers.list[nullable[union[str,int,float,bool]]].0.bool
  Input should be a valid boolean [type=bool_type, input_value={'name': 'John', 'age': 35, 'class': 'Economy'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/bool_type
...
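Code to reproduce:
from llama_stack_client import LlamaStackClient

# Assumes the stack is running locally (llama stack run ./run.yaml --port 5001)
client = LlamaStackClient(base_url="http://localhost:5001")

# Tool call with object and array parameters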
response = client.inference.chat_completion(
    model_id="Llama3.1-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant helping users book flights and hotels."},
        {"role": "user", "content": """
         When's the next flight from Adelaide to Sydney? (I only want direct flights and the flight should have wifi and a meal.)
         The flight should fit 2 adults and 1 child. Economy class.
         Also find a hotel in the Sydney area for 3 nights starting on the 1st of January 2024. (Should be smoking friendly.)
         """},
    ],
    # stream=True,
    tools=[
        {
            "tool_name": "get_flight_info",
            "description": "Get the flight information for a given origin and destination",
            "parameters": {
                "origin": {
                    "param_type": "string",
                    "description": "The origin airport code. E.g., 'ADL'",
                    "required": True,
                },
                "destination": {
                    "param_type": "string",
                    "description": "The destination airport code. E.g., 'LAX'",
                    "required": True,
                },
                "passengers": {
                    "param_type": "array",
                    "description": "The passengers",
                },
            }
        },
        {
            "tool_name": "get_hotel_info",
            "description": "Get the hotel information for a given destination",
            "parameters": {
                "address": {
                    "param_type": "object",
                    "description": "The address of the hotel. E.g., {'street_address': '123 Main St', 'city': 'Sydney', 'state': 'NSW', 'post_code': '2000'}",
                    "properties": {
                        "street_address": {
                            "param_type": "string",
                            "description": "The street address of the hotel. E.g., '123 Main St'",
                            "required": True,
                        },
                        "city": {
                            "param_type": "string",
                            "description": "The city of the hotel. E.g., 'Sydney'",
                            "required": True,
                        },
                        "state": {
                            "param_type": "string",
                            "description": "The state of the hotel. E.g., 'NSW'",
                            "required": True,
                        },
                        "post_code": {
                            "param_type": "string",
                            "description": "The post code of the hotel. E.g., '2000'",
                            "required": True,
                        },
                    },
                    "required": True,
                },
                "num_nights": {
                    "param_type": "integer",
                    "description": "The number of nights to stay. E.g., 3",
                    "required": True,
                },
                "date_from": {
                    "param_type": "string",
                    "description": "The date to start the stay formatted as YYYY-MM-DD. E.g., '2024-01-01'",
                    "required": True,
                },
                "smoking_friendly": {
                    "param_type": "boolean",
                    "description": "Whether the hotel is smoking friendly. E.g., True",
                    "required": False,
                },
            }
        },
    ]
)

print(response)

for tool_call in response.completion_message.tool_calls:
    print(tool_call)

Going to ignore for now. Would like to see whether our users actually want this before implementing.
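The validation errors above suggest that `ToolCall.arguments` accepts primitives and lists of primitives, but not lists of objects. A recursive value type would accept the nested `passengers` payload. The pure-Python sketch below only contrasts the two accepted shapes; both function names are hypothetical, and an actual fix would be a change to the pydantic type of `arguments`.

```python
from typing import Any

_PRIMITIVES = (str, int, float, bool, type(None))


def is_flat_argument(value: Any) -> bool:
    """Roughly the shape implied by the errors above: a primitive or a list of primitives."""
    if isinstance(value, _PRIMITIVES):
        return True
    return isinstance(value, list) and all(isinstance(v, _PRIMITIVES) for v in value)


def is_json_value(value: Any) -> bool:
    """Recursive check: primitives, plus lists and string-keyed dicts of JSON values at any depth."""
    if isinstance(value, _PRIMITIVES):
        return True
    if isinstance(value, list):
        return all(is_json_value(v) for v in value)
    if isinstance(value, dict):
        return all(isinstance(k, str) and is_json_value(v) for k, v in value.items())
    return False
```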

@aidando73 (Contributor Author)

I've rebased the PR train

ashwinb pushed a commit that referenced this pull request Jan 3, 2025
# What does this PR do?

Contributes towards issue (#432)

- Groq text chat completions
- Streaming
- All the sampling params that Groq supports

A lot of inspiration taken from @mattf's good work at
#355

**What this PR does not do**

- Tool calls (Future PR)
- Adding llama-guard model
- See if we can add embeddings

### PR Train

- #609 👈 
- #630


## Test Plan

<details>

<summary>Environment</summary>

```bash
export GROQ_API_KEY=<api_key>

wget https://raw.githubusercontent.com/aidando73/llama-stack/240e6e2a9c20450ffdcfbabd800a6c0291f19288/build.yaml
wget https://raw.githubusercontent.com/aidando73/llama-stack/92c9b5297f9eda6a6e901e1adbd894e169dbb278/run.yaml

# Build and run environment
pip install -e . \
&& llama stack build --config ./build.yaml --image-type conda \
&& llama stack run ./run.yaml \
  --port 5001
```

</details>

<details>

<summary>Manual tests</summary>

Using this Jupyter notebook to test manually:
https://github.com/aidando73/llama-stack/blob/2140976d76ee7ef46025c862b26ee87585381d2a/hello.ipynb

Use this code to test passing in the API key from provider_data:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(
    base_url="http://localhost:5001",
)

response = client.inference.chat_completion(
    model_id="Llama3.2-3B-Instruct",
    messages=[
        {"role": "user", "content": "Hello, world client!"},
    ],
    # Test passing in groq_api_key from the client
    # Need to comment out the groq_api_key in the run.yaml file
    x_llama_stack_provider_data='{"groq_api_key": "<api-key>"}',
    # stream=True,
)
response
```

</details>

<details>
<summary>Integration</summary>

`pytest llama_stack/providers/tests/inference/test_text_inference.py -v -k groq`

(run in same environment)

```
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_model_list[llama_3b-groq] PASSED                 [  6%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion[llama_3b-groq] SKIPPED (Other inf...) [ 12%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion_structured_output[llama_3b-groq] SKIPPED [ 18%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_non_streaming[llama_3b-groq] PASSED [ 25%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_structured_output[llama_3b-groq] SKIPPED (Ot...) [ 31%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming[llama_3b-groq] PASSED  [ 37%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_with_tool_calling[llama_3b-groq] SKIPPED [ 43%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_with_tool_calling_streaming[llama_3b-groq] SKIPPED [ 50%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_model_list[llama_8b-groq] PASSED                 [ 56%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion[llama_8b-groq] SKIPPED (Other inf...) [ 62%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion_structured_output[llama_8b-groq] SKIPPED [ 68%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_non_streaming[llama_8b-groq] PASSED [ 75%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_structured_output[llama_8b-groq] SKIPPED (Ot...) [ 81%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming[llama_8b-groq] PASSED  [ 87%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_with_tool_calling[llama_8b-groq] SKIPPED [ 93%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_with_tool_calling_streaming[llama_8b-groq] SKIPPED [100%]

======================================= 6 passed, 10 skipped, 160 deselected, 7 warnings in 2.05s ========================================
```
</details>

<details>
<summary>Unit tests</summary>

`pytest llama_stack/providers/tests/inference/groq/ -v`

```
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_sets_model PASSED            [  5%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_converts_user_message PASSED [ 10%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_converts_system_message PASSED [ 15%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_converts_completion_message PASSED [ 20%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_does_not_include_logprobs PASSED [ 25%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_does_not_include_response_format PASSED [ 30%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_does_not_include_repetition_penalty PASSED [ 35%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_includes_stream PASSED       [ 40%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_n_is_1 PASSED                [ 45%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_if_max_tokens_is_0_then_it_is_not_included PASSED [ 50%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_includes_max_tokens_if_set PASSED [ 55%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_includes_temperature PASSED  [ 60%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_includes_top_p PASSED        [ 65%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertNonStreamChatCompletionResponse::test_returns_response PASSED [ 70%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertNonStreamChatCompletionResponse::test_maps_stop_to_end_of_message PASSED [ 75%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertNonStreamChatCompletionResponse::test_maps_length_to_end_of_message PASSED [ 80%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertStreamChatCompletionResponse::test_returns_stream PASSED [ 85%]
llama_stack/providers/tests/inference/groq/test_init.py::TestGroqInit::test_raises_runtime_error_if_config_is_not_groq_config PASSED [ 90%]
llama_stack/providers/tests/inference/groq/test_init.py::TestGroqInit::test_returns_groq_adapter PASSED                            [ 95%]
llama_stack/providers/tests/inference/groq/test_init.py::TestGroqConfig::test_api_key_defaults_to_env_var PASSED                   [100%]

==================================================== 20 passed, 11 warnings in 0.08s =====================================================
```

</details>

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [x] Updated relevant documentation
- [x] Wrote necessary unit or integration tests.
@@ -140,7 +150,7 @@ async def embeddings(

     def _get_client(self) -> Groq:
         if self._config.api_key is not None:
-            return Groq(api_key=self.config.api_key)
+            return Groq(api_key=self._config.api_key)
Contributor Author

Also contains fix from: #719
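For context on `_get_client` and the provider-data test in the commit message above: a minimal sketch of the key-resolution order, assuming the run config's `api_key` takes precedence (which is why that test asks you to comment the key out of run.yaml) and that the provider data is the JSON string passed as `x_llama_stack_provider_data`. `resolve_groq_api_key` is a hypothetical helper, not the adapter's actual code.

```python
import json
from typing import Optional


def resolve_groq_api_key(config_api_key: Optional[str], provider_data_header: Optional[str]) -> str:
    """Hypothetical: prefer the run-config key, else fall back to per-request provider data."""
    if config_api_key is not None:
        return config_api_key
    if provider_data_header:
        provider_data = json.loads(provider_data_header)
        key = provider_data.get("groq_api_key")
        if key:
            return key
    raise ValueError("No Groq API key: set api_key in run.yaml or send groq_api_key in the provider data")
```

Reading the key through one attribute name in both branches is exactly what the `self.config` to `self._config` fix restores.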

@aidando73 (Contributor Author) commented Jan 3, 2025

Rebased and re-tested this one. Still works.

@cheesecake100201 (Contributor)

@ashwinb Could you take a look at this PR and merge it?

@raghotham raghotham added this to the v0.1 milestone Jan 10, 2025
@ashwinb ashwinb merged commit fdcc74f into meta-llama:main Jan 14, 2025
2 checks passed
Labels: CLA Signed

6 participants