The use of GenAI for coding can be separated into two types of tasks: proposal and refinement. The initial request yields a proposal, and the subsequent calls refine that solution. Most calls to GenAI for coding are refinement calls. The refinement cycle is as follows: the developer runs the proposed solution, encounters a bug, pastes the relevant output back to the model, gets a new suggestion, tries the new solution, and iterates until the program runs.
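A minimal sketch of this cycle, where `ask_llm` is a hypothetical callable (prompt in, code out) standing in for whatever model client is used:

```python
import subprocess
import sys
import tempfile

def run_code(code: str) -> tuple[bool, str]:
    """Run a candidate solution; report success and any error output."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    proc = subprocess.run([sys.executable, path], capture_output=True, text=True)
    return proc.returncode == 0, proc.stderr

def refine(ask_llm, task: str, max_rounds: int = 5) -> str:
    """`ask_llm` is a hypothetical prompt -> code callable."""
    code = ask_llm(task)  # proposal call
    for _ in range(max_rounds):
        ok, err = run_code(code)  # try the suggestion
        if ok:
            return code  # the program runs; stop iterating
        # refinement call: paste the error output back to the model
        code = ask_llm(f"{task}\n\nThis attempt failed with:\n{err}\nPlease fix it.")
    return code
```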
Given the importance of refinement calls, we produce a novel benchmark dataset of buggy code, Buggy DS-1000, to test the ability of LLMs to fix bugs. To create the dataset, we start with DS-1000, a popular benchmark for data science tasks with a comprehensive evaluation suite and relatively low false positive and false negative rates (~5.7% each). We then introduce a variety of non-trivial errors into the code. Our current method introduces the bugs deterministically.
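One way to inject bugs deterministically is to rewrite the solution's AST; the sketch below flips `<` to `<=` in comparisons. This particular rule is illustrative and not necessarily one of the mutations the generator applies:

```python
import ast

class LtToLte(ast.NodeTransformer):
    """Deterministically rewrite `<` to `<=` in every comparison."""
    def visit_Compare(self, node):
        self.generic_visit(node)
        node.ops = [ast.LtE() if isinstance(op, ast.Lt) else op
                    for op in node.ops]
        return node

src = "valid = [x for x in data if x < threshold]"
buggy = ast.unparse(LtToLte().visit(ast.parse(src)))
print(buggy)  # valid = [x for x in data if x <= threshold]
```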
In the near future, we plan to support an agentic LLM flow in which one LLM proposes bugs, another attempts to fix them, and we keep only the bug proposals that cannot be fixed in one shot.
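A rough sketch of that planned flow, assuming "one shot" means a single fix attempt; `propose_bug`, `fix_bug`, and `passes_tests` are hypothetical callables, not part of the current tool:

```python
def harvest_hard_bugs(solutions, propose_bug, fix_bug, passes_tests):
    """Keep only the bug proposals the fixer LLM cannot repair in one shot.
    All three callables are hypothetical placeholders."""
    kept = []
    for solution in solutions:
        buggy = propose_bug(solution)   # proposer LLM injects a bug
        attempt = fix_bug(buggy)        # fixer LLM gets one attempt
        if not passes_tests(attempt):   # the one-shot fix failed
            kept.append(buggy)          # so the bug is hard enough to keep
    return kept
```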
```bash
# Basic installation
pip install -e .

# With OpenAI support
pip install -e ".[openai]"

# With Ollama support
pip install -e ".[ollama]"

# With all features
pip install -e ".[openai,ollama]"
```
```bash
python run.py --output-dir outputs/ --model ollama --model-name qwen2.5-coder:14b --num-samples 1 --bugs-per-problem 3
```
The tool loads the DS-1000 dataset directly from Hugging Face (`xlangai/DS-1000`) and caches it locally for faster subsequent runs.
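To inspect the dataset outside the tool, it can be loaded the same way with the `datasets` library (assuming the upstream release's `test` split):

```python
from datasets import load_dataset

# Downloaded once, then served from the local Hugging Face cache
# (~/.cache/huggingface by default).
ds = load_dataset("xlangai/DS-1000", split="test")
print(len(ds), ds.column_names)  # problem count and available fields
```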
The generator can introduce the following types of bugs (an illustrative example follows the list):
- Logic errors in conditionals or calculations
- Off-by-one errors in loops or indexing
- Incorrect usage of library APIs
- Wrong parameters in function calls
- Missing checks for edge cases
- Type errors or conversions
- Variable scope issues
- Incorrect or conflicting imports
- Memory leaks or unbounded resource usage
- Race conditions or concurrency issues
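For example, an off-by-one error of the kind listed above might look like this (a hand-written illustration, not actual generator output):

```python
import numpy as np

def rolling_mean(a: np.ndarray, w: int) -> np.ndarray:
    # Correct bound: range(len(a) - w + 1)
    # Off-by-one bug: the final window is silently dropped
    return np.array([a[i:i + w].mean() for i in range(len(a) - w)])
```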
For each problem in the dataset, the generator does the following (an example output layout is sketched after the list):
- Saves the original problem as JSON
- Saves the original solution code
- Creates a buggy version of the solution
- Creates a `bug_metadata.json` file with information about the introduced bugs
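A hypothetical layout for one problem; apart from `bug_metadata.json`, the directory and file names are illustrative:

```
outputs/
└── problem_0/
    ├── problem.json         # original problem as JSON
    ├── solution.py          # original solution code
    ├── buggy_solution.py    # buggy version of the solution
    └── bug_metadata.json    # details of the introduced bugs
```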
The metadata file contains:
- A list of introduced bugs, each with (see the reading sketch after this list):
  - Bug type
  - Description
  - Line numbers where changes were made
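The metadata might be consumed like this; the field names (`bugs`, `type`, `description`, `lines`) are assumptions based on the list above, so check a generated file for the exact schema:

```python
import json

with open("outputs/problem_0/bug_metadata.json") as f:
    meta = json.load(f)

# Field names are assumed from the list above, not a guaranteed schema.
for bug in meta["bugs"]:
    print(f"{bug['type']} at lines {bug['lines']}: {bug['description']}")
```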