Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)

friendliai/gorilla

Gorilla: Large Language Model Connected with Massive APIs

Latest Updates

📢 Check out our detailed Berkeley Function Calling Leaderboard changelog for the latest dataset / model updates to the Berkeley Function Calling Leaderboard!

  • 🎯 [10/04/2024] Introducing the Agent Arena by Gorilla X LMSYS Chatbot Arena! Compare different agents in tasks like search, finance, RAG, and beyond. Explore which models and tools work best for specific tasks through our novel ranking system and community-driven prompt hub. [Blog] [Arena] [Leaderboard] [Dataset] [Tweet]

  • 📣 [09/21/2024] Announcing BFCL V3 - Evaluating multi-turn and multi-step function calling capabilities! New state-based evaluation system tests models on handling complex workflows, sequential functions, and service states. [Blog] [Leaderboard] [Code] [Tweet]

  • 🚀 [08/20/2024] Released BFCL V2 • Live! The Berkeley Function-Calling Leaderboard now features enterprise-contributed data and real-world scenarios. [Blog] [Live Leaderboard] [V2 Categories Leaderboard] [Tweet]

  • ⚡️ [04/12/2024] Excited to release GoEx - a runtime for LLM-generated actions like code, API calls, and more. Featuring "post-facto validation" for assessing LLM actions after execution, "undo" and "damage confinement" abstractions to manage unintended actions & risks. This paves the way for fully autonomous LLM agents, enhancing interaction between apps & services with human-out-of-loop. [Blog] [Code] [Paper] [Tweet]

  • ⏰ [04/01/2024] Introducing cost and latency metrics into Berkeley function calling leaderboard!

  • 🚀 [03/15/2024] RAFT: Adapting Language Model to Domain Specific RAG is live! [MSFT-Meta blog] [Berkeley Blog]

  • 🏆 [02/26/2024] Berkeley Function Calling Leaderboard is live!

  • 🎯 [02/25/2024] OpenFunctions v2 sets new SoTA for open-source LLMs!

  • 🔥 [11/16/2023] Excited to release Gorilla OpenFunctions

  • 💻 [06/29/2023] Released gorilla-cli, LLMs for your CLI!

  • 🟢 [06/06/2023] Released commercially usable, Apache 2.0 licensed Gorilla models

  • 🚀 [05/30/2023] Provided the CLI interface to chat with Gorilla!

  • 🚀 [05/28/2023] Released Torch Hub and TensorFlow Hub Models!

  • 🚀 [05/27/2023] Released the first Gorilla model! Colab or 🤗!

  • 🔥 [05/27/2023] We released the APIZoo contribution guide for community API contributions!

  • 🔥 [05/25/2023] We release the APIBench dataset and the evaluation code of Gorilla!

About

Gorilla enables LLMs to use tools by invoking APIs. Given a natural language query, Gorilla comes up with the semantically and syntactically correct API to invoke.

With Gorilla, we are the first to demonstrate how to use LLMs to invoke 1,600+ (and growing) API calls accurately while reducing hallucination. This repository contains inference code for running Gorilla finetuned models, evaluation code for reproducing results from our paper, and APIBench, the largest collection of APIs, curated and easy to train on!
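As a rough illustration of what "syntactically correct" can mean for a generated API call, the sketch below parses a model output with Python's standard `ast` module and extracts the function name and literal arguments. The helper name `parse_api_call` and the sample output string are assumptions made for this example; this is not the evaluation code shipped in the repository.

```python
import ast


def parse_api_call(generated: str) -> dict:
    """Parse one generated Python call expression.

    Returns the dotted function name plus positional and keyword arguments.
    Raises ValueError if the text is not a single, syntactically valid call
    or if an argument is not a plain literal.
    """
    try:
        tree = ast.parse(generated.strip(), mode="eval")
    except SyntaxError as exc:
        raise ValueError(f"not valid Python: {exc.msg}") from exc
    call = tree.body
    if not isinstance(call, ast.Call):
        raise ValueError("expected a single function call")
    return {
        "name": ast.unparse(call.func),  # requires Python 3.9+
        "args": [ast.literal_eval(a) for a in call.args],
        "kwargs": {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords},
    }


# Hypothetical model output; the string is only parsed, never executed.
output = "torch.hub.load('pytorch/vision', 'resnet50', pretrained=True)"
print(parse_api_call(output))
```

A check like this only confirms that the call is well formed. Whether it names the right API for the query (semantic correctness) still has to be judged against reference documentation.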

Since our initial release, we've served ~500k requests and witnessed incredible adoption by developers worldwide. The project has expanded to include tools, evaluations, leaderboard, end-to-end finetuning recipes, infrastructure components, and the Gorilla API Store:

Project: Gorilla Paper
Type: 🤖 Model, 📝 Fine-tuning, 📚 Dataset, 📊 Evaluation, 🔧 Infra
Description: Large Language Model Connected with Massive APIs
  • Novel finetuning approach for API invocation
  • Evaluation on 1,600+ APIs (APIBench)
  • Retrieval-augmented training for test-time adaptation

Project: Gorilla OpenFunctions-V2
Type: 🤖 Model
Description: Drop-in alternative for function calling, supporting multiple complex data types and parallel execution
  • Multiple & parallel function execution with OpenAI-compatible endpoints
  • Native support for Python, Java, JavaScript, and REST APIs with expanded data types
  • Function relevance detection to reduce hallucinations
  • Enhanced RESTful API formatting capabilities
  • State-of-the-art performance among open-source models

Project: Berkeley Function Calling Leaderboard (BFCL)
Type: 📊 Evaluation, 🏆 Leaderboard, 🔧 Function Calling Infra, 📚 Dataset
Description: Comprehensive evaluation of function-calling capabilities
  • V1: Expert-curated dataset for evaluating single-turn function calling
  • V2: Enterprise-contributed data for real-world scenarios
  • V3: Multi-turn & multi-step function calling evaluation
  • Cost and latency metrics for all models
  • Interactive API explorer for testing
  • Community-driven benchmarking platform

Project: Agent Arena
Type: 📊 Evaluation, 🏆 Leaderboard
Description: Compare LLM agents across models, tools, and frameworks
  • Head-to-head agent comparisons with ELO rating system
  • Framework compatibility testing (LangChain, AutoGPT)
  • Community-driven evaluation platform
  • Real-world task performance metrics

Project: Gorilla Execution Engine (GoEx)
Type: 🔧 Infra
Description: Runtime for executing LLM-generated actions with safety guarantees
  • Post-facto validation for verifying LLM actions after execution
  • Undo capabilities and damage confinement for risk mitigation
  • OAuth2 and API key authentication for multiple services
  • Support for RESTful APIs, databases, and filesystem operations
  • Docker-based sandboxed execution environment

Project: Retrieval-Augmented Fine-tuning (RAFT)
Type: 📝 Fine-tuning, 🤖 Model
Description: Fine-tuning LLMs for robust domain-specific retrieval
  • Novel fine-tuning recipe for domain-specific RAG
  • Chain-of-thought answers with direct document quotes
  • Training with oracle and distractor documents
  • Improved performance on PubMed, HotpotQA, and Gorilla benchmarks
  • Efficient adaptation of smaller models for domain QA

Project: Gorilla CLI
Type: 🤖 Model, 🔧 Local CLI Infra
Description: LLMs for your command-line interface
  • User-friendly CLI tool supporting ~1500 APIs (Kubernetes, AWS, GCP, etc.)
  • Natural language command generation with multi-LLM fusion
  • Privacy-focused with explicit execution approval
  • Command history and interactive selection interface

Project: Gorilla API Zoo
Type: 📚 Dataset
Description: A community-maintained repository of up-to-date API documentation
  • Centralized, searchable index of APIs across domains
  • Structured documentation format with arguments, versioning, and examples
  • Community-driven updates to keep pace with API changes
  • Rich data source for model training and fine-tuning
  • Enables retrieval-augmented training and inference
  • Reduces hallucination through up-to-date documentation
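To make the idea of retrieval-augmented inference over structured API documentation more concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the entry fields (`api_name`, `description`, `example`), the two sample APIs, and the prompt wording are hypothetical, and the retriever is plain word overlap rather than whatever retriever Gorilla actually uses.

```python
import re

# Hypothetical API Zoo-style entries; field names and APIs are illustrative only.
API_DOCS = [
    {
        "api_name": "image_classifier.load",
        "description": "Load a pretrained image classification model",
        "example": "image_classifier.load(name='resnet50')",
    },
    {
        "api_name": "speech.transcribe",
        "description": "Transcribe speech audio files into text",
        "example": "speech.transcribe(path='clip.wav')",
    },
]


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, docs: list) -> dict:
    """Return the entry whose description shares the most words with the query."""
    query_words = tokenize(query)
    return max(docs, key=lambda d: len(query_words & tokenize(d["description"])))


def build_prompt(query: str, docs: list) -> str:
    """Append the best-matching documentation entry to the user query."""
    doc = retrieve(query, docs)
    return (
        f"{query}\n"
        "Use this API documentation for reference:\n"
        f"{doc['api_name']}: {doc['description']}\n"
        f"Example: {doc['example']}"
    )


print(build_prompt("Transcribe this audio recording into text", API_DOCS))
```

Because the documentation is looked up at query time, updating an entry changes what the model sees without retraining, which is the property the API Zoo bullets point at.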

Getting Started

Quick Start

Try Gorilla in your browser:

Installation Options

  1. Gorilla CLI - Fastest way to get started
pip install gorilla-cli
gorilla generate 100 random characters into a file called test.txt

Learn more about Gorilla CLI →

  2. Run Gorilla Locally
git clone https://github.com/ShishirPatil/gorilla.git
cd gorilla/inference

Detailed local setup instructions →

  3. Use OpenFunctions
# Uses the legacy openai<1.0 interface (openai.ChatCompletion)
import openai

openai.api_key = "EMPTY"
openai.api_base = "http://luigi.millennium.berkeley.edu:8000/v1"

# Define your functions
functions = [{
    "name": "get_current_weather",
    "description": "Get weather in a location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
        },
        "required": ["location"]
    }
}]

# Make API call
completion = openai.ChatCompletion.create(
    model="gorilla-openfunctions-v2",
    messages=[{"role": "user", "content": "What's the weather in San Francisco?"}],
    functions=functions
)

# Inspect the model's reply, including any function call it chose
print(completion.choices[0])

OpenFunctions documentation →
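Once the model has chosen a function, the caller still has to run it. The sketch below shows one way to do that, assuming the call arrives in the OpenAI-style shape of a function name plus a JSON string of arguments. That shape, the `REGISTRY` mapping, the `dispatch` helper, and the stub `get_current_weather` implementation are all assumptions for illustration; check the OpenFunctions documentation for the exact response format.

```python
import json


def get_current_weather(location: str, unit: str = "fahrenheit") -> dict:
    """Stand-in implementation; a real one would query a weather service."""
    return {"location": location, "unit": unit, "forecast": "unknown"}


# Map function names the model may emit to local callables.
REGISTRY = {"get_current_weather": get_current_weather}


def dispatch(function_call: dict) -> dict:
    """Run a call of the assumed form {"name": str, "arguments": JSON string}."""
    name = function_call["name"]
    if name not in REGISTRY:
        # Never execute a function the application did not register.
        raise KeyError(f"model requested unknown function: {name}")
    arguments = json.loads(function_call["arguments"])
    return REGISTRY[name](**arguments)


# Hard-coded example call so the sketch runs without contacting the endpoint.
call = {
    "name": "get_current_weather",
    "arguments": '{"location": "San Francisco", "unit": "celsius"}',
}
print(dispatch(call))
```

Looking functions up in an explicit registry, rather than evaluating the model's text, keeps the set of callable functions under the application's control.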

🔧 Other Quick Starts

Frequently Asked Questions

  1. I would like to use Gorilla commercially. Is there going to be an Apache 2.0 licensed version?

Yes! We now have models that you can use commercially without any obligations.

  2. Can we use Gorilla with other tools such as LangChain?

Absolutely! Gorilla is an end-to-end model, specifically tailored to serve correct API calls (tools) without requiring any additional coding. It is designed to work as part of a wider ecosystem and can be integrated into agentic frameworks and other tools.

LangChain is a versatile developer tool. Its "agents" can swap in any LLM, Gorilla included, which makes it adaptable to a variety of needs.

These tools work best in combination, complementing each other's strengths. This is where your contribution can make a difference: we welcome any input that helps refine and improve them.

Check out our blog on How to Use Gorilla: A Step-by-Step Walkthrough to see all the different ways you can integrate Gorilla in your projects.

Project Roadmap

In the immediate future, we plan to release the following:

  • Multimodal function-calling leaderboard
  • Agentic function-calling leaderboard
  • New batch of user-contributed live function calling evals
  • BFCL metrics to evaluate contamination
  • OpenFunctions-v3 model to support more languages and multi-turn capability

Already released (release date in brackets where recorded):

  • Agent Arena to compare LLM agents across models, tools, and frameworks [10/04/2024]
  • Multi-turn and multi-step function calling evaluation [09/21/2024]
  • User-contributed Live Function Calling Leaderboard [08/20/2024]
  • BFCL systems metrics including cost and latency [04/01/2024]
  • Gorilla Execution Engine (GoEx) - Runtime for executing LLM-generated actions with safety guarantees [04/12/2024]
  • Berkeley Function Calling leaderboard (BFCL) for evaluating tool-calling/function-calling models [02/26/2024]
  • OpenFunctions-v2 with more languages (Java, JS, Python), relevance detection [02/26/2024]
  • API Zoo Index for easy access to all APIs [02/16/2024]
  • OpenFunctions-v1, Apache 2.0, with parallel and multiple function calling [11/16/2023]
  • OpenFunctions-v0, Apache 2.0 function calling model [11/16/2023]
  • Release a commercially usable, Apache 2.0 licensed Gorilla model [06/05/2023]
  • Release weights for all APIs from APIBench [05/28/2023]
  • Run Gorilla LLM locally [05/28/2023]
  • Release weights for HF model APIs [05/27/2023]
  • Hosted Gorilla LLM chat for HF model APIs [05/27/2023]
  • Opening up the APIZoo for contributions from community
  • Dataset and Eval Code

License

Gorilla is Apache 2.0 licensed, making it suitable for both academic and commercial use.

Contact

Citation

@article{patil2023gorilla,
  title={Gorilla: Large Language Model Connected with Massive APIs},
  author={Shishir G. Patil and Tianjun Zhang and Xin Wang and Joseph E. Gonzalez},
  year={2023},
  journal={arXiv preprint arXiv:2305.15334},
} 
