Fix breaking dependency updates
Bacardi automates the repair of breaking dependency updates through a three-step pipeline:
- Extract Failed Build Information
- Use LLM to Generate Updated Code
- Validate Updated Code
The following diagram and module breakdown illustrate how the components interact across the system.
The diagram shows how build failures and API diffs flow through the system, how LLMs are invoked with prompt templates, and how fixes are validated in Docker builds.
- git-manager: Manages repositories and creates a new branch for each failure category.
- extractor: Extracts contextual information from the client's repository and API diffs.
- breaking-classifier: Classifies dependency-related failures and parses build logs.
- llm: Connects to LLM providers (OpenAI, Google, Anthropic, OpenRouter), resolved from .env (LLM, LLM_TYPE); for example, LLM=gpt-4o-mini LLM_TYPE=openai.
- prompts: Stores reusable prompt templates tailored to different failure scenarios.
- docker-build: Builds reproducible Docker environments to test and validate proposed fixes.
- core: Manages the full workflow and its configuration, and writes logs and results to results/.
To replicate the experiments, you need to set up the environment and build the project. Follow these steps:
- Clone the repository:
git clone git@github.com:chains-project/bacardi.git
cd bacardi
- Build the project:
mvn clean install
- Set up environment variables:
cp .env.example .env
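Then fill in the model and key settings described in the configuration section below. A minimal sketch, assuming an OpenAI setup (the key value is a placeholder):

```
LLM=gpt-4o-mini
LLM_TYPE=openai
API_KEY=your-api-key-here
```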
Run the application using the following command:
java -jar ./core/target/Bump.jar
Bacardi can target and fix one breaking dependency update at a time. Each case in the BUMP dataset is represented by a JSON record containing, among other fields, a breaking_commit hash. To repair just that one update:
- Open the .env file.
- Add or replace the line below, replacing <breaking_commit_hash> with the exact value from the dataset's JSON entry:
SPECIFIC_FILE=<breaking_commit_hash>
You can also find the breaking_commit_hash in the corresponding api-diff.txt file.
- Run the application using the following command:
java -jar ./core/target/Bump.jar
Only the specified commit will be processed.
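For illustration, with a hypothetical hash (not a real dataset value) the .env entry would look like:

```
# Hypothetical hash -- copy the real breaking_commit value from the BUMP JSON entry
SPECIFIC_FILE=0a1b2c3d4e5f60718293a4b5c6d7e8f901234567
```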
Bacardi is designed to work flexibly with multiple Large Language Models (LLMs); it does not run on all of them by default. You can control which LLM is used via these environment variables in your .env file:
| Variable | Description | Example |
|---|---|---|
| `LLM` | Name of the model to use | `gpt-4o-mini`, `gemini-2.0-flash-001`, `o3-mini-2025-01-31`, `deepseek-deepseek-chat`, `qwen-qwen2.5-32b-instruct` |
| `LLM_TYPE` | Type of inference provider | `openai`, `openrouter`, `google` |
| `OPENROUTER_PROVIDER` (optional) | If using OpenRouter, the specific provider to route to | `open-inference/bf16`, `chutes/fp8` |
| `API_KEY` / `GOOGLE_API_KEY` / `OPENROUTER_API_KEY` | API key for the selected provider | `your-api-key-here` |
Bacardi currently supports these five models, each running with its respective inference provider:
- gpt-4o-mini (OpenAI)
- o3-mini-2025-01-31 (OpenAI)
- gemini-2.0-flash-001 (Google)
- deepseek-deepseek-chat (via OpenRouter)
- qwen-qwen2.5-32b-instruct (via OpenRouter)
⚠️ Important:
Bacardi does not automatically run on all models; you must configure the desired one before execution. Only the configured LLM is used when running the repair pipeline.
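As a sketch, a .env configuration for each provider type could look like the following (API keys are placeholders, and the OpenRouter provider string is optional):

```
# OpenAI
LLM=gpt-4o-mini
LLM_TYPE=openai
API_KEY=your-api-key-here

# Google
LLM=gemini-2.0-flash-001
LLM_TYPE=google
GOOGLE_API_KEY=your-api-key-here

# OpenRouter
LLM=deepseek-deepseek-chat
LLM_TYPE=openrouter
OPENROUTER_API_KEY=your-api-key-here
OPENROUTER_PROVIDER=chutes/fp8
```

Set only one model/provider combination at a time, since a single configured LLM is used per run.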
The results of the experiments are stored in the results/ directory, organized into subdirectories as follows:
- results_<prompt_name>: contains the results of the experiment for a specific prompt.
- <model_n>: contains the results for a specific model.
📁 results
├── 📁 results_<prompt_name>    # each prompt's results
│   ├── 📁 <model_n>            # model results
│   ├── -----
│   └── 📁 <model_n>            # model results
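To browse the output from the command line (directory and model names below are placeholders):

```
ls results/                          # one results_<prompt_name> directory per prompt
ls results/results_<prompt_name>/    # one <model_n> directory per model
```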
This project is licensed under the MIT License.
Thank you for using Bacardi! If you encounter any issues or have feedback, feel free to create an issue.