An autonomous LLM-agent for large-scale, repository-level code auditing

PurCL/RepoAudit

RepoAudit

RepoAudit is a repository-level bug detector for general bugs. It currently supports three bug types: Null Pointer Dereference (NPD), Memory Leak (MLK), and Use After Free (UAF). It leverages LLMSCAN to parse the codebase and uses an LLM to simulate the program's execution, analyzing data-flow facts that start from the designated source points.

Features

  • Compilation Free Analysis
  • Multi-Linguistic Support
  • Multiple Bug Type Detection
  • Detailed Bug Reports
  • Convenient WebUI Interface

Installation

  1. Clone the repository:

    git clone git@github.com:PurCL/RepoAudit.git --recursive
    cd RepoAudit
  2. Install the required dependencies:

    pip install -r requirements.txt
  3. Ensure you have the Tree-sitter library and language bindings installed:

    cd lib
    python build.py
  4. Configure the LLM API keys. For Claude 3.5, we use the model hosted on Amazon Bedrock.

    echo 'export OPENAI_API_KEY=xxxxxx' >> ~/.bashrc
    echo 'export DEEPSEEK_API_KEY=xxxxxx' >> ~/.bashrc
    source ~/.bashrc
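
Before launching an analysis, it can help to confirm that the keys are actually visible to a Python process. The following is a minimal sketch and not part of RepoAudit: the helper name `missing_api_keys` is hypothetical, and the key names are simply the two variables shown in step 4.

```python
import os

# Key names taken from the commands in step 4.
REQUIRED_KEYS = ("OPENAI_API_KEY", "DEEPSEEK_API_KEY")


def missing_api_keys(env=None, required=REQUIRED_KEYS):
    """Return the names of required keys that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        print("Missing API keys:", ", ".join(missing))
    else:
        print("All API keys are set.")
```

If you only plan to use one model provider, you presumably only need that provider's key; in that case pass a shorter `required` tuple.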

Quick Start

  1. Prepare the project that you want to analyze and store it in the directory benchmark. We have provided several projects in that directory to help you get started quickly.

  2. Command Line: Run run_repoaudit.sh.

    > cd src
    
    # For NPD detection on the directory `benchmark/Java/toy/NPD`.
    > sh run_repoaudit.sh

Result

Buggy Trace Report Format

Key: Buggy trace
Each key represents a unique buggy trace propagation path, detailing how a bug propagates through function calls and code execution. The structure is as follows:

{
    Explanation: Detailed explanation of the propagation path.
    Path: Array of steps along the trace, where each step is an object with:
          {
              source:      The source code fragment or operation (e.g., "return NULL;"),
              src_line:    The corresponding line number in the source file,
              function_name: The name of the function where the step occurs,
              function_code: The full code of the function (providing context),
              file_name:   The file path where the function is located
          }
    Vali_LLM:    The validation result produced by the LLM (e.g., "True" or "False"),
    Vali_human:  The human validation result (typically empty until reviewed)
}
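
A report in this format can be processed programmatically. The sketch below assumes the report has already been loaded into a Python dict (for example with `json.load`) whose keys identify traces and whose values follow the structure above. The function name `summarize_traces` and the sample data are hypothetical; the name and location of the actual output file are not specified here.

```python
def summarize_traces(report, only_llm_confirmed=True):
    """Return one summary line per buggy trace in the report dict."""
    lines = []
    for trace_id, trace in report.items():
        # Vali_LLM is described above as the string "True" or "False".
        if only_llm_confirmed and str(trace.get("Vali_LLM")) != "True":
            continue
        steps = [
            f'{step["file_name"]}:{step["src_line"]} ({step["function_name"]})'
            for step in trace.get("Path", [])
        ]
        lines.append(f"{trace_id}: " + " -> ".join(steps))
    return lines


# Hypothetical sample data, shaped like the format described above.
sample_report = {
    "trace_1": {
        "Explanation": "NULL returned by get_buf is dereferenced in use_buf.",
        "Path": [
            {"source": "return NULL;", "src_line": 12, "function_name": "get_buf",
             "function_code": "...", "file_name": "buf.c"},
            {"source": "buf->len", "src_line": 30, "function_name": "use_buf",
             "function_code": "...", "file_name": "main.c"},
        ],
        "Vali_LLM": "True",
        "Vali_human": "",
    },
    "trace_2": {"Explanation": "...", "Path": [], "Vali_LLM": "False", "Vali_human": ""},
}

for line in summarize_traces(sample_report):
    print(line)
```

Passing `only_llm_confirmed=False` lists every trace, which is useful when reviewing the ones the LLM rejected and filling in `Vali_human` by hand.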

WebUI

You can also use our WebUI to quickly check the detection results.

cd src/webUI
streamlit run home.py

More

For more information, please refer to the paper: RepoAudit: An Autonomous LLM-Agent for Repository-Level Code Auditing. Detailed documentation will be available soon.

License

This project is licensed under the GNU General Public License v2.0 (GPLv2). You are free to use, modify, and distribute the software under the terms of this license, provided that derivative works are also distributed under the same license.

For full details, see the LICENSE file or visit the official license page: https://www.gnu.org/licenses/old-licenses/gpl-2.0.html