
Bacardi: Fixing Breaking Dependency Updates with Large Language Models

Fix breaking dependency updates

🔧 Tool Architecture Overview

Bacardi automates the repair of breaking dependency updates through a three-step pipeline:

  1. Extract Failed Build Information
  2. Use LLM to Generate Updated Code
  3. Validate Updated Code

The following diagram and module breakdown illustrate how the components interact across the system.


📌 Architectural Diagram

The diagram shows how build failures and API diffs flow through the system, how LLMs are invoked with prompt templates, and how fixes are validated in Docker builds.
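
In outline (a simplified plain-text rendering of that flow, not a verbatim reproduction of the diagram):

    build logs + api-diff files
              |
              v
    extractor + breaking-classifier
              |
              v
    llm (prompt templates from prompts)
              |
              v
    docker-build (validates the fix)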


πŸ—‚οΈ Module Structure

Step 1: Extract Failed Build Information

  • git-manager: manages repositories and creates a new branch for each failure category.
  • extractor: extracts contextual information from the client's repository and API diffs.
  • breaking-classifier: classifies dependency-related failures and parses build logs (see the illustrative log excerpt after this list).
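
As a hedged illustration of the input breaking-classifier parses, a Maven compilation failure caused by a dependency update typically looks like the excerpt below (the path, class, and method names are invented, not taken from the BUMP dataset):

      [ERROR] /client/src/main/java/org/example/Client.java:[42,17] cannot find symbol
      [ERROR]   symbol:   method setTimeout(int)
      [ERROR]   location: class org.example.http.HttpClient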

Step 2: Use LLM to Generate Updated Code

  • llm: connects to LLM providers (OpenAI, Google, Anthropic, OpenRouter); the model and provider are resolved from .env (LLM, LLM_TYPE).

      Example

      LLM=gpt-4o-mini
      LLM_TYPE=openai

  • prompts: stores reusable prompt templates tailored to different failure scenarios (a hypothetical template skeleton follows this list).
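
The repository's actual templates are not reproduced here; the skeleton below is a hypothetical sketch of what such a template could contain, with placeholder names of our own choosing:

      You are repairing a breaking dependency update in a Java client.

      API diff of the updated dependency:
      {{api_diff}}

      Failing client code:
      {{client_code}}

      Maven build error:
      {{build_log}}

      Return only the fixed source file.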

Step 3: Validate Updated Code

  • docker-build: builds reproducible Docker environments to test and validate proposed fixes (see the sketch after this list).
  • core: manages the full workflow and its configuration, and writes logs and results to results/.
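
A minimal sketch of how such a reproducible check might look, assuming a stock Maven base image (this is not the tool's actual Dockerfile):

      FROM maven:3.9-eclipse-temurin-17
      # copy the patched client project into the image
      COPY . /project
      WORKDIR /project
      # a failing build means the proposed fix is rejected
      RUN mvn -B clean verify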

Setup and Installation

To replicate the experiments, you need to set up the environment and build the project. Follow these steps:

  1. Clone the repository:
    git clone [email protected]:chains-project/bacardi.git
    cd bacardi
  2. Build the project:
    mvn clean install
  3. Set up the environment variables:
    cp .env.example .env
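
Then edit .env to select a model and supply the matching API key. The variable names come from the "LLM configuration and selection" section below; we assume API_KEY is the OpenAI key, and the key value is a placeholder:

      LLM=gpt-4o-mini
      LLM_TYPE=openai
      API_KEY=your-api-key-here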

Usage

Run the application using the following command:

    java -jar ./core/target/Bump.jar

To execute a single breaking dependency update:

Bacardi can target and fix one breaking dependency update at a time. Each case in the BUMP dataset is represented by a JSON record containing, among other fields, a breaking_commit hash. To repair just that one update:

  1. Open the .env file.
  2. Add or replace the line below, substituting <breaking_commit_hash> with the exact value from the dataset's JSON entry (it can also be found in the corresponding api-diff.txt file):
    SPECIFIC_FILE=<breaking_commit_hash>
  3. Run the application:
    java -jar ./core/target/Bump.jar

Only the specified commit will be processed.
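
For instance, with a made-up commit hash (purely illustrative; use a real breaking_commit value from the dataset):

      SPECIFIC_FILE=9f8e7d6c5b4a39281706f5e4d3c2b1a098765432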


LLM configuration and selection

Bacardi is designed to work flexibly with multiple Large Language Models (LLMs), but it does not run all of them by default: a single model, selected via environment variables, is used per run.

🔧 How to Select an LLM

You can control which LLM is used via these environment variables in your .env file:

  • LLM: name of the model to use. Examples: gpt-4o-mini, gemini-2.0-flash-001, o3-mini-2025-01-31, deepseek-deepseek-chat, qwen-qwen2.5-32b-instruct
  • LLM_TYPE: type of inference provider. Examples: openai, openrouter, google
  • OPENROUTER_PROVIDER (optional): if using OpenRouter, the specific provider. Examples: open-inference/bf16, chutes/fp8
  • API_KEY / GOOGLE_API_KEY / OPENROUTER_API_KEY: API key for the selected provider. Example: your-api-key-here

Bacardi currently supports these five models, each running with its respective inference provider:

  • gpt-4o-mini (OpenAI)
  • o3-mini-2025-01-31 (OpenAI)
  • gemini-2.0-flash-001 (Google)
  • deepseek-deepseek-chat (via OpenRouter)
  • qwen-qwen2.5-32b-instruct (via OpenRouter)
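
For example, to run the Qwen model through OpenRouter (the provider value is one of the examples listed above; the key is a placeholder):

      LLM=qwen-qwen2.5-32b-instruct
      LLM_TYPE=openrouter
      OPENROUTER_PROVIDER=chutes/fp8
      OPENROUTER_API_KEY=your-api-key-here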

❗ Important:

Bacardi does not automatically run on all models: you must configure the desired one before execution, and only that LLM is used by the repair pipeline.


Results

The results of the experiments are stored in the results directory. Each experiment is organized into subdirectories:

  • results_<prompt_name>: contains the results of the experiment for a specific prompt.
  • <model_n>: contains the results for a specific model.

    📁 results
    ├── 📁 results_<prompt_name>    each prompt's results
    │   ├── 📁 <model_n>            model results
    │   ├── ...
    │   └── 📁 <model_n>            model results

License

This project is licensed under the MIT License.


Thank you for using Bacardi! If you encounter any issues or have feedback, feel free to create an issue.
