Enhancing Vulnerability Management With Autonomous LLM Agents
AutoPentest is an experimental framework for conducting autonomous black-box penetration tests using Large Language Models (LLMs). Integrated with the most popular LLM providers and the LangChain agent framework, AutoPentest can autonomously perform complex, multi-step security assessments—augmented by external tools and knowledge bases.
This project is based on this paper: https://arxiv.org/abs/2505.10321
📄 Abstract
A recent area of increasing research is the use of Large Language Models (LLMs) in penetration testing, which promises to reduce costs and thus allow for higher frequency. We conduct a review of related work, identifying best practices and common evaluation issues. We then present AutoPentest, an application for performing black-box penetration tests with a high degree of autonomy. AutoPentest is based on the LLM GPT-4o from OpenAI and the LLM agent framework LangChain. It can perform complex multi-step tasks, augmented by external tools and knowledge bases. We conduct a study on three capture-the-flag style Hack The Box (HTB) machines, comparing our implementation AutoPentest with the baseline approach of manually using the ChatGPT-4o user interface. Both approaches are able to complete 15-25 % of the subtasks on the HTB machines, with AutoPentest slightly outperforming ChatGPT. We measure a total cost of $96.20 US when using AutoPentest across all experiments, while a one-month subscription to ChatGPT Plus costs $20. The results show that further implementation efforts and the use of more powerful LLMs released in the future are likely to make this a viable part of vulnerability management.
Academic inquiries: [email protected]
This project is intended for educational and research purposes only.
- ❌ Do not use AutoPentest for unauthorized or illegal penetration testing.
- 🛡️ The authors and contributors do not provide any warranty or guarantee of functionality, accuracy, or security.
- 🧪 This is an experimental tool and may produce unpredictable or incorrect results.
- 📜 By using this software, you agree to the terms of the license.
Currently, installation is only possible directly from the source code.
- Choose a VM to run autopentest on. Kali Linux is recommended as it comes with many penetration testing tools preinstalled.
- Clone this repo and navigate to the root directory:
git clone https://github.com/JuliusHenke/autopentest.git
cd autopentest
- In the repo root directory, copy .env.example to .env and fill in the necessary environment variables.
Refer to https://python.langchain.com/docs/integrations/chat/ for details on which environment variables each chat model provider requires.
cp .env.example .env
- Install Poetry, then use it to install all required Python packages. It will automatically create a virtual environment for you.
poetry install
By default, only dependencies for the providers openai and azure-openai are installed. Optionally, you can add a specific LLM provider. Have a look at ./pyproject.toml for the available extras.
poetry install --extras "anthropic"
Or install all providers:
poetry install --extras "all-providers"
- Use Playwright to install the browser binaries:
playwright install
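Before the first run, it can help to confirm that the .env file from the step above actually contains the keys your provider needs. The following is a minimal sketch and not part of AutoPentest: the helper names (`parse_env`, `missing_keys`, `check_env_file`) are hypothetical, and `OPENAI_API_KEY` is only an assumption based on what LangChain's OpenAI chat integration reads. Check .env.example and the LangChain provider docs for the variables that apply to your setup.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_keys(env: dict[str, str], required: list[str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not env.get(key)]


def check_env_file(path: str, required: list[str]) -> list[str]:
    """Read a .env-style file and list the required keys it lacks."""
    env_file = Path(path)
    env = parse_env(env_file.read_text()) if env_file.exists() else {}
    return missing_keys(env, required)
```

For example, `check_env_file(".env", ["OPENAI_API_KEY"])` returns an empty list when the key is set and `["OPENAI_API_KEY"]` when it is missing or empty. The parser only handles plain `KEY=VALUE` lines, which is an intentional simplification.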
To run AutoPentest, use the following command in any directory of the AutoPentest project (e.g. ./experiments). Always ensure that you have permission to test the target system!
poetry run autopentest <TARGET_IP_OR_DOMAIN>
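One way to make the permission check harder to skip is to launch the command through a small wrapper that refuses targets outside an explicit allowlist. This is a sketch under stated assumptions, not a feature of AutoPentest: the function names and the allowlist entries used in the example call are hypothetical, and the only project-specific detail is the `poetry run autopentest <TARGET_IP_OR_DOMAIN>` command shown above.

```python
import ipaddress
import subprocess


def is_authorized(target: str, allowed: list[str]) -> bool:
    """True if target is an allowed hostname or an IP inside an allowed network."""
    try:
        addr = ipaddress.ip_address(target)
    except ValueError:
        # Not an IP literal: compare as a hostname, case-insensitively.
        return target.lower() in {entry.lower() for entry in allowed}
    for entry in allowed:
        try:
            if addr in ipaddress.ip_network(entry, strict=False):
                return True
        except ValueError:
            continue  # This entry is a hostname, not a network.
    return False


def build_command(target: str) -> list[str]:
    """Build the documented AutoPentest invocation as an argument list."""
    return ["poetry", "run", "autopentest", target]


def run_if_authorized(target: str, allowed: list[str]) -> None:
    """Run AutoPentest only when the target is on the allowlist."""
    if not is_authorized(target, allowed):
        raise PermissionError(f"{target} is not on the authorized target list")
    # An argument list (no shell) avoids shell injection via the target string.
    subprocess.run(build_command(target), check=True)
```

A call such as `run_if_authorized("10.10.10.5", ["10.10.10.0/24", "lab.example.test"])` would start the scan, while a target outside the list raises `PermissionError` before any process is spawned. An allowlist is only a guard against typos; it does not replace written authorization from the system owner.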
Contributions are welcome! Please open issues or submit pull requests to help improve AutoPentest.
This product uses the NVD API but is not endorsed or certified by the NVD.