LLM Agent and Evaluation Framework for Autonomous Penetration Testing
A framework for evaluating the performance of Large Language Models (LLMs) in autonomous penetration testing.
Overview
This framework provides an evaluation environment for assessing the capabilities of LLMs in autonomous penetration testing. It includes a modular architecture for integrating different LLMs, a simulation environment for testing, and a set of metrics for measuring agent performance.
Requirements
- Python 3.8+
- Docker
- NVIDIA GPU (optional)
- Python packages (see requirements.txt): transformers, torch, neptune, docker, paramiko, pyelftools, pwntools
Installation
- Clone the repository and enter it: git clone https://github.com/your-username/llm-agent.git && cd llm-agent
- Install required packages: pip install -r requirements.txt
- Build the Docker image: docker build -t llm-agent .
- Run the Docker container: docker run -it llm-agent
Usage
- Configure the framework by modifying the config.json file (a sketch of the assumed structure is shown after this list).
- Run the evaluation script: python evaluate.py
- View the results in the Neptune dashboard.
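The structure of config.json and the internals of evaluate.py are not documented here, so the following is only a minimal sketch of how the script might load the configuration and log results to Neptune. The key names (model_name, targets, max_steps) and the metric path are illustrative assumptions, and Neptune credentials are assumed to come from the NEPTUNE_PROJECT and NEPTUNE_API_TOKEN environment variables.

```python
# Hypothetical sketch of how evaluate.py could load config.json and log to Neptune.
# The key names below are illustrative assumptions, not the repository's schema.
import json

import neptune  # reads NEPTUNE_PROJECT and NEPTUNE_API_TOKEN from the environment


def main() -> None:
    with open("config.json") as f:
        config = json.load(f)

    # Assumed structure, for illustration only:
    # {
    #   "model_name": "bert-base-uncased",
    #   "targets": ["10.0.0.5", "10.0.0.6"],
    #   "max_steps": 50
    # }
    run = neptune.init_run()  # picks up project and API token from the environment
    run["config"] = config

    # ... run the evaluation loop here and record the resulting metrics ...
    run["metrics/success_rate"] = 0.0  # placeholder value

    run.stop()


if __name__ == "__main__":
    main()
```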
Evaluation Metrics
- Success rate
- Average time to exploit
- Number of failed attempts
- Coverage of vulnerabilities
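The README does not define these metrics precisely; the sketch below shows one plausible way to compute them, assuming success rate is the fraction of targets exploited, average time to exploit is taken over successful runs only, and coverage is the fraction of known vulnerabilities found. The EpisodeResult structure is hypothetical.

```python
# Hypothetical metric aggregation; the EpisodeResult fields are assumptions,
# not the repository's actual data model.
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class EpisodeResult:
    target: str
    success: bool
    time_to_exploit: float          # seconds; meaningful only when success is True
    failed_attempts: int
    vulns_found: Set[str] = field(default_factory=set)


def summarize(results: List[EpisodeResult], known_vulns: Set[str]) -> Dict[str, float]:
    """Aggregate the four metrics listed above from per-target results."""
    successes = [r for r in results if r.success]
    found = set().union(*(r.vulns_found for r in results)) if results else set()
    return {
        "success_rate": len(successes) / len(results) if results else 0.0,
        "avg_time_to_exploit": (
            sum(r.time_to_exploit for r in successes) / len(successes)
            if successes else float("nan")
        ),
        "failed_attempts": float(sum(r.failed_attempts for r in results)),
        "vulnerability_coverage": (
            len(found & known_vulns) / len(known_vulns) if known_vulns else 0.0
        ),
    }
```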
LLM Integration
The framework provides a modular architecture for integrating different LLMs. It currently supports the following models (a sketch of a possible adapter interface follows the list):
- BERT
- RoBERTa
- XLNet
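The adapter interface itself is not shown in this README; the following sketch illustrates how such a modular integration could be structured on top of Hugging Face transformers, so that BERT, RoBERTa, or XLNet checkpoints can be swapped by name. The TransformerAdapter class and its encode method are hypothetical names, not the framework's actual API.

```python
# Hypothetical adapter showing how different transformer models could be plugged
# into the framework behind a common interface. Names are illustrative only.
import torch
from transformers import AutoModel, AutoTokenizer


class TransformerAdapter:
    """Wraps a Hugging Face encoder (e.g. BERT, RoBERTa, XLNet) behind one API."""

    def __init__(self, model_name: str = "bert-base-uncased"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModel.from_pretrained(model_name)
        self.model.eval()

    @torch.no_grad()
    def encode(self, text: str) -> torch.Tensor:
        """Return a single embedding vector for the given observation text."""
        inputs = self.tokenizer(text, return_tensors="pt", truncation=True)
        outputs = self.model(**inputs)
        # Mean-pool the last hidden state as a simple sentence representation.
        return outputs.last_hidden_state.mean(dim=1).squeeze(0)


# Example: swap models by name without touching the rest of the framework.
# agent = TransformerAdapter("roberta-base")
# embedding = agent.encode("nmap scan found port 22 open (OpenSSH 7.2)")
```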
Simulation Environment
The framework includes a simulation environment for testing the LLMs. The environment consists of a set of virtual machines with different vulnerabilities (see the sketch below for how a single containerized target could be brought up).
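How the vulnerable targets are provisioned is not described here. As a stand-in, the sketch below starts a single containerized target with the docker Python SDK (already listed in the requirements); the image name and port mapping are illustrative, and the framework's own environment may use full virtual machines instead.

```python
# Hypothetical target setup using the docker SDK; the image and port mapping are
# illustrative stand-ins for the framework's own vulnerable machines.
import docker


def start_target(image: str = "vulnerables/web-dvwa", host_port: int = 8080):
    """Start a deliberately vulnerable container and return its handle."""
    client = docker.from_env()
    container = client.containers.run(
        image,
        detach=True,
        ports={"80/tcp": host_port},  # expose the target's web service on the host
    )
    return container


def stop_target(container) -> None:
    """Tear the target down once the evaluation episode is finished."""
    container.stop()
    container.remove()


# Example:
# target = start_target()
# ... let the agent attack http://localhost:8080 ...
# stop_target(target)
```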
Acknowledgments
- Thanks to RamkotiRK for creating this project.