We welcome your contributions to the Leaderboard! This guide provides step-by-step instructions for adding a new model to the leaderboard.
The repository is organized as follows:
```
berkeley-function-call-leaderboard/
├── bfcl/
│   ├── eval_checker/                 # Evaluation modules
│   │   ├── ast_eval/                 # AST-based evaluation
│   │   ├── executable_eval/          # Evaluation by execution
│   │   ├── multi_turn_eval/          # Multi-turn evaluation
│   ├── model_handler/                # All model-specific handlers
│   │   ├── local_inference/          # Handlers for locally-hosted models
│   │   │   ├── base_oss_handler.py   # Base handler for OSS models
│   │   │   ├── llama_fc.py           # Example: LLaMA (FC mode)
│   │   │   ├── deepseek_coder.py     # Example: DeepSeek Coder
│   │   │   ├── ...
│   │   ├── api_inference/            # Handlers for API-based models
│   │   │   ├── openai.py             # Example: OpenAI models
│   │   │   ├── claude.py             # Example: Claude models
│   │   │   ├── ...
│   │   ├── parser/                   # Parsing utilities for Java/JavaScript
│   │   ├── base_handler.py           # Base handler blueprint
│   │   ├── handler_map.py            # Maps model names to handler classes
├── data/                             # Datasets
├── result/                           # Model responses
├── score/                            # Evaluation results
├── utils/                            # Helper scripts
```
To add a new model, focus primarily on the `model_handler` directory. You do not need to modify the parsing utilities in `model_handler/parser` or any other directories.
- **Base Handler:** Start by reviewing `bfcl/model_handler/base_handler.py`. All model handlers inherit from this base class. The `inference_single_turn` and `inference_multi_turn` methods defined there are helpful for understanding the model response generation pipeline. `base_handler.py` contains many useful details in the docstrings of each abstract method, so be sure to review them.
  - If your model is hosted locally, you should also look at `bfcl/model_handler/local_inference/base_oss_handler.py`.
- **Reference Handlers:** Check out some of the existing model handlers (such as `openai.py`, `claude.py`, etc.); you can likely reuse some of the existing code if your new model outputs in a similar format.
  - If your model is OpenAI-compatible, the `openai.py` handler will be helpful, and you might be able to use it as is; see the sketch after this list.
  - If your model is locally hosted, the `llama_fc.py` handler or the `deepseek_coder.py` handler can be good starting points.
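For example, wiring up an OpenAI-compatible endpoint often amounts to subclassing the existing OpenAI handler and repointing its client. The class name, constructor signature, and `self.client` attribute below are assumptions for illustration; verify them against the actual interface in `openai.py`:

```python
# Hypothetical sketch: reuse the OpenAI handler for an OpenAI-compatible API.
# `OpenAIHandler`, its constructor signature, and `self.client` are assumptions
# here -- check bfcl/model_handler/api_inference/openai.py for the real names.
import os

from openai import OpenAI

from bfcl.model_handler.api_inference.openai import OpenAIHandler


class MyModelHandler(OpenAIHandler):
    def __init__(self, model_name, temperature) -> None:
        super().__init__(model_name, temperature)
        # Point the client at your own OpenAI-compatible endpoint.
        self.client = OpenAI(
            base_url="https://api.my-model-provider.com/v1",  # hypothetical URL
            api_key=os.getenv("MY_MODEL_API_KEY"),
        )
```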
We support models in two modes:

- **Function Calling (FC) Mode:** Models with native tool/function-calling capabilities. For example, OpenAI GPT in FC mode uses the `tools` section as documented in the OpenAI function calling guide.
- **Prompting Mode:** Models without native function-calling capabilities rely on traditional prompt-based interactions, and we supply the function definitions in the system prompt rather than in a dedicated `tools` section. Prompting Mode also serves as an alternative approach for models that support FC mode but do not fully leverage its function-calling ability (i.e., we only use their normal text generation capability). An illustration of the two modes follows this section.

For API-based models (such as OpenAI GPT), both FC and Prompting modes can be defined in the same handler. Methods related to FC mode end with `_FC`, while Prompting mode methods end with `_prompting`.

For locally-hosted models, we only implement prompting methods to maintain code readability. If a locally-hosted model has both FC and Prompting modes, you will typically create two separate handlers (e.g., `llama_fc.py` for FC mode and `llama.py` for Prompting mode).
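As a rough illustration of how the two modes differ, consider the same function schema sent both ways. The payloads below are simplified OpenAI-style requests, not the exact structures the handlers build:

```python
import json

# A simplified function schema, shared by both examples below.
weather_fn = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# FC Mode: the schema travels in the dedicated `tools` field of the request.
fc_request = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "What's the weather in Berkeley?"}],
    "tools": [{"type": "function", "function": weather_fn}],
}

# Prompting Mode: the schema is serialized into the system prompt instead,
# and the model answers with plain text that the handler must parse.
prompting_request = {
    "model": "my-prompt-only-model",  # hypothetical model name
    "messages": [
        {
            "role": "system",
            "content": "You have access to these functions:\n"
            + json.dumps([weather_fn], indent=2),
        },
        {"role": "user", "content": "What's the weather in Berkeley?"},
    ],
}
```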
**For API-based Models:**

- Implement all the methods marked as "not implemented" under the `FC Methods` or `Prompting Methods` sections in `base_handler.py`, depending on which mode(s) your model supports.
For Locally-Hosted Models:
- Implement the
_format_prompt
method in your handler. - Other methods from the
Prompting Methods
section inbase_oss_handler.py
are already implemented, but you may override them if necessary.
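For example, `_format_prompt` is essentially a chat-template renderer. The sketch below uses Llama-3-style header tokens purely as an illustration; use your model's actual chat template, and verify the method signature against the abstract method in `base_oss_handler.py`:

```python
# Hypothetical _format_prompt for a locally-hosted model. The (messages,
# function) signature and the Llama-3-style template are assumptions; match
# your model's tokenizer docs and the abstract method in base_oss_handler.py.
def _format_prompt(self, messages, function):
    """Render the chat history into the raw prompt string the model expects."""
    formatted_prompt = "<|begin_of_text|>"
    for message in messages:
        formatted_prompt += (
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content']}<|eot_id|>"
        )
    # Leave the assistant header open so generation continues from here.
    formatted_prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return formatted_prompt
```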
**Common Requirements for All Handlers:**

Regardless of mode or model type, you should implement the following methods to convert the raw model response (the output of `_parse_query_response_xxx`) into the standard formats expected by the evaluation pipeline:

- `decode_ast`: Converts the raw model response into a structured list of dictionaries, with each dictionary representing a function call: `[{"func1": {"param1": "val1", "param2": "val2"}}, {"func2": {"param1": "val1"}}]`. This helps the evaluation pipeline understand the model's intended function calls.
- `decode_execute`: Converts the raw model response into a list of strings representing callable functions: `["func1(param1=val1, param2=val2)", "func2(param1=val1)"]`.
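As a concrete illustration, suppose your model emits calls as a Python-style list literal such as `[func1(param1='val1'), func2(param1='val1')]`. A minimal pair of decoders for that (assumed) output format might look like this; many existing handlers instead reuse shared parsing helpers elsewhere in `model_handler`, so check those first:

```python
import ast


class MyModelHandler:  # in practice, this inherits from BaseHandler
    def decode_ast(self, result, language="Python"):
        """Parse "[func1(param1='val1'), func2(param1='val1')]" into
        [{"func1": {"param1": "val1"}}, {"func2": {"param1": "val1"}}]."""
        decoded_output = []
        parsed = ast.parse(result.strip(), mode="eval")
        for call in parsed.body.elts:  # each element is an ast.Call node
            func_name = call.func.id
            params = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
            decoded_output.append({func_name: params})
        return decoded_output

    def decode_execute(self, result):
        """Turn the same raw response into executable call strings, e.g.
        ["func1(param1='val1')", "func2(param1='val1')"]."""
        execution_list = []
        for item in self.decode_ast(result):
            for func_name, params in item.items():
                args = ", ".join(f"{k}={v!r}" for k, v in params.items())
                execution_list.append(f"{func_name}({args})")
        return execution_list
```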
Finally, register your model in the repository metadata (a combined sketch of these edits follows this list):

1. **Update `model_handler/handler_map.py`:** Add your new model's handler class and associate it with the model's name.
2. **Update `bfcl/eval_checker/model_metadata.py`:** Add entries in `MODEL_METADATA_MAPPING` to include:
   - Model display name (as shown in the leaderboard)
   - URL to the model's documentation or homepage
   - License details
   - Company name

   If your model is API-based and has usage costs, update `INPUT_PRICE_PER_MILLION_TOKEN` and `OUTPUT_PRICE_PER_MILLION_TOKEN` accordingly. If the model is API-based but free, add it to the `NO_COST_MODELS` list.
3. **Update `model_handler/constant.py`:** If your model is a Function Calling model and it does not support `.` in function names (such as GPT in FC mode), add the model name to the `UNDERSCORE_TO_DOT` list.
4. **Update `SUPPORTED_MODELS.md`:** Add your model to the list of supported models. Include the model name and type (FC or Prompting) in the table.
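Taken together, the registration edits typically look like the following. All values here ("my-model-v1", the prices, the shape of the metadata entry) are placeholders; match the neighboring entries in each file rather than this sketch:

```python
# In bfcl/model_handler/handler_map.py -- map the model name to its handler.
from bfcl.model_handler.api_inference.my_model import MyModelHandler  # hypothetical

handler_map = {
    # ... existing entries ...
    "my-model-v1": MyModelHandler,
}

# In bfcl/eval_checker/model_metadata.py -- leaderboard display metadata.
MODEL_METADATA_MAPPING = {
    # ... existing entries ...
    "my-model-v1": [
        "My Model v1 (FC)",              # display name on the leaderboard
        "https://example.com/my-model",  # documentation / homepage URL
        "Apache-2.0",                    # license details
        "My Company",                    # company name
    ],
}

# Pricing in USD per million tokens (use NO_COST_MODELS instead if free).
INPUT_PRICE_PER_MILLION_TOKEN = {
    # ... existing entries ...
    "my-model-v1": 0.50,
}
OUTPUT_PRICE_PER_MILLION_TOKEN = {
    # ... existing entries ...
    "my-model-v1": 1.50,
}

# In bfcl/model_handler/constant.py -- only if the model cannot emit "." in
# function names in FC mode.
UNDERSCORE_TO_DOT = [
    # ... existing entries ...
    "my-model-v1",
]
```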
- Raise a Pull Request with your new Model Handler and the necessary updates to the metadata.
- Ensure that the model you add is publicly accessible, either open-source or behind a publicly available API. While you may require authentication, billing, registration, or tokens, the general public should ultimately be able to access the endpoint.
- If your model is not publicly accessible, we would still welcome your contribution, but we unfortunately cannot include it in the public-facing leaderboard.
- Have questions or need help? Join the Gorilla Discord and visit the `#leaderboard` channel.
- Feel free to reach out if you have any questions, concerns, or would like guidance while adding your new model. We're happy to assist!
Thank you for contributing to the Berkeley Function Calling Leaderboard! We look forward to seeing your model added to the community.