Buggy DS-1000

The use of GenAI for coding can be separated into two types of tasks: proposal and refinement. The initial request yields a proposal; subsequent calls refine that solution. Most GenAI coding calls are refinement calls. The refinement cycle is as follows: the developer runs the proposed solution, encounters a bug, pastes the relevant output back in, gets a new suggestion, tries the solution, and iterates until the program runs.

Given the importance of refinement calls, we produce a novel benchmark dataset of buggy code, Buggy DS-1000, to test the ability of LLMs to fix bugs. To create the dataset, we start with DS-1000, a popular benchmark for data science tasks with a comprehensive evaluation suite and relatively low false positive and false negative rates (~5.7% for both). We then introduce a variety of non-trivial errors into the code. Our current method introduces bugs deterministically.

In the near future, we plan to support an agentic LLM flow in which one LLM proposes bugs, another fixes them, and only bug proposals that cannot be fixed in one shot are kept.

Installation

# Basic installation
pip install -e .

# With OpenAI support
pip install -e ".[openai]"

# With Ollama support
pip install -e ".[ollama]"

# With all features
pip install -e ".[openai,ollama]"

Usage

python run.py --output-dir outputs/ --model ollama --model-name qwen2.5-coder:14b --num-samples 1 --bugs-per-problem 3

Dataset

The tool loads the DS-1000 dataset directly from Hugging Face (xlangai/DS-1000) and caches it locally for faster subsequent runs.
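The load-then-cache pattern described above can be sketched as follows. This is a simplified illustration, not the tool's actual implementation: `load_ds1000` and its `loader` parameter are hypothetical names, and the caller would typically pass something like `lambda: list(load_dataset("xlangai/DS-1000")["test"])` from the Hugging Face `datasets` library as the loader.

```python
import json
from pathlib import Path

def load_ds1000(cache_dir="~/.cache/buggy_ds1000", loader=None):
    """Return DS-1000 problems, preferring a local JSON cache.

    `loader` is any zero-argument callable that fetches the raw problems,
    e.g. from Hugging Face: lambda: list(load_dataset("xlangai/DS-1000")["test"]).
    """
    cache = Path(cache_dir).expanduser() / "ds1000.json"
    if cache.exists():
        # Cache hit: skip the network entirely on subsequent runs.
        return json.loads(cache.read_text())
    problems = loader()
    cache.parent.mkdir(parents=True, exist_ok=True)
    cache.write_text(json.dumps(problems))
    return problems
```

On the first call the loader is invoked and the result is written to disk; every later call reads the cached JSON, which is what makes subsequent runs faster.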

Bug Types

The generator can introduce the following types of bugs:

  • Logic errors in conditionals or calculations
  • Off-by-one errors in loops or indexing
  • Incorrect usage of library APIs
  • Wrong parameters in function calls
  • Missing checks for edge cases
  • Type errors or conversions
  • Variable scope issues
  • Incorrect or conflicting imports
  • Memory leaks or unbounded resource usage
  • Race conditions or concurrency issues
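To make the deterministic injection concrete, here is a minimal sketch of one such mutation: flipping a comparison operator to create a logic or off-by-one error. The function name, the specific operator swaps, and the metadata fields are illustrative assumptions, not the repository's actual code.

```python
# Hypothetical deterministic bug injector (a sketch, not the repo's method).
# Flips the first comparison operator it finds, e.g. "<=" becomes "<".
SWAPS = {"<=": "<", ">=": ">", "==": "!="}

def inject_comparison_bug(code: str):
    """Return (buggy_code, metadata) or (code, None) if no operator is found."""
    for lineno, line in enumerate(code.splitlines(), start=1):
        for op, repl in SWAPS.items():
            if op in line:
                buggy_line = line.replace(op, repl, 1)
                # Simplification: assumes the matched line is unique in `code`.
                buggy = code.replace(line, buggy_line, 1)
                meta = {
                    "bug_type": "logic_error",
                    "description": f"changed {op!r} to {repl!r}",
                    "lines": [lineno],
                }
                return buggy, meta
    return code, None
```

A real injector would parse the code (e.g. with Python's `ast` module) rather than rewrite strings, so that operators inside string literals or comments are never touched.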

Output Format

For each problem in the dataset, the generator:

  1. Saves the original problem as JSON
  2. Saves the original solution code
  3. Creates a buggy version of the solution
  4. Creates a bug_metadata.json file with information about the introduced bugs

The metadata file contains:

  • A list of introduced bugs, each with:
    • Bug type
    • Description
    • Line numbers where changes were made
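A bug_metadata.json file matching the fields above might look like the following; the exact key names and values here are a hypothetical illustration, not guaranteed to match the generator's output verbatim.

```json
{
  "bugs": [
    {
      "bug_type": "off_by_one",
      "description": "changed range(n) to range(n - 1) in the accumulation loop",
      "lines": [12]
    }
  ]
}
```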
