lallouslab/ctf-agent

CTF Agent

Autonomous CTF (Capture The Flag) solver that races multiple AI models against challenges in parallel. We built it in a weekend and used it to solve all 52 challenges and win 1st place at BSidesSF 2026 CTF.

Built by Veria Labs, founded by members of .;,;. (smiley), the #1 US CTF team on CTFtime in 2024 and 2025. We build AI agents that find and exploit real security vulnerabilities for large enterprises.

Results

| Competition   | Challenges Solved | Result              |
|---------------|-------------------|---------------------|
| BSidesSF 2026 | 52/52 (100%)      | 1st place ($1,500)  |

The agent solves challenges across all categories — pwn, rev, crypto, forensics, web, and misc.

How It Works

A coordinator LLM manages the competition while solver swarms attack individual challenges. Each swarm runs multiple models simultaneously — the first to find the flag wins.

                        +-----------------+
                        |  CTFd Platform  |
                        +--------+--------+
                                 |
                        +--------v--------+
                        |  Poller (5s)    |
                        +--------+--------+
                                 |
                        +--------v--------+
                        | Coordinator LLM |
                        | (Claude/Codex)  |
                        +--------+--------+
                                 |
              +------------------+------------------+
              |                  |                  |
     +--------v--------+ +------v---------+ +------v---------+
     | Swarm:          | | Swarm:         | | Swarm:         |
     | challenge-1     | | challenge-2    | | challenge-N    |
     |                 | |                | |                |
     |  Opus (med)     | |  Opus (med)    | |                |
     |  Opus (max)     | |  Opus (max)    | |     ...        |
     |  GPT-5.4        | |  GPT-5.4       | |                |
     |  GPT-5.4-mini   | |  GPT-5.4-mini  | |                |
     |  GPT-5.3-codex  | |  GPT-5.3-codex | |                |
     +--------+--------+ +--------+-------+ +----------------+
              |                    |
     +--------v--------+  +-------v--------+
     | Docker Sandbox  |  | Docker Sandbox |
     | (isolated)      |  | (isolated)     |
     |                 |  |                |
     | pwntools, r2,   |  | pwntools, r2,  |
     | gdb, python...  |  | gdb, python... |
     +-----------------+  +----------------+

Each solver runs in an isolated Docker container with CTF tools pre-installed. Solvers never give up — they keep trying different approaches until the flag is found.

Quick Start

# Install
uv sync

# Build sandbox image
docker build -f sandbox/Dockerfile.sandbox -t ctf-sandbox .

# Configure credentials
cp .env.example .env
# Edit .env with your API keys and CTFd token

# Run against a CTFd instance
uv run ctf-solve \
  --ctfd-url https://ctf.example.com \
  --ctfd-token ctfd_your_token \
  --challenges-dir challenges \
  --max-challenges 10 \
  -v

Coordinator Backends

# Claude SDK coordinator (default)
uv run ctf-solve --coordinator claude ...

# Codex coordinator (GPT-5.4 via JSON-RPC)
uv run ctf-solve --coordinator codex ...

Solver Models

Default model lineup (configurable in backend/models.py):

| Model                    | Provider   | Notes                          |
|--------------------------|------------|--------------------------------|
| Claude Opus 4.6 (medium) | Claude SDK | Balanced speed/quality         |
| Claude Opus 4.6 (max)    | Claude SDK | Deep reasoning                 |
| GPT-5.4                  | Codex      | Best overall solver            |
| GPT-5.4-mini             | Codex      | Fast, good for easy challenges |
| GPT-5.3-codex            | Codex      | Reasoning model (xhigh effort) |

Sandbox Tooling

Each solver gets an isolated Docker container pre-loaded with CTF tools:

| Category  | Tools                                                          |
|-----------|----------------------------------------------------------------|
| Binary    | radare2, GDB, objdump, binwalk, strings, readelf               |
| Pwn       | pwntools, ROPgadget, angr, unicorn, capstone                   |
| Crypto    | SageMath, RsaCtfTool, z3, gmpy2, pycryptodome, cado-nfs        |
| Forensics | volatility3, Sleuthkit (mmls/fls/icat), foremost, exiftool     |
| Stego     | steghide, stegseek, zsteg, ImageMagick, tesseract OCR          |
| Web       | curl, nmap, Python requests, flask                             |
| Misc      | ffmpeg, sox, Pillow, numpy, scipy, PyTorch, podman             |
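
As a rough idea of what launching one sandbox per solver might look like, the sketch below builds a `docker run` command line for the `ctf-sandbox` image from the Quick Start. The function name, the `/challenge` mount point, and the resource limits are assumptions for illustration; the project's real invocation may differ.

```python
import shlex


def sandbox_command(challenge_dir: str, image: str = "ctf-sandbox") -> list[str]:
    """Build an argv list for a throwaway sandbox container (illustrative only)."""
    return [
        "docker", "run",
        "--rm",                      # remove the container when it exits
        "--interactive",             # keep stdin open so a solver can drive a shell
        "--memory", "4g",            # assumed limit, not taken from the project
        "--cpus", "2",               # assumed limit, not taken from the project
        "--volume", f"{challenge_dir}:/challenge:ro",  # challenge files, read-only
        "--workdir", "/challenge",
        image,
        "bash",
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so this works without Docker.
    print(shlex.join(sandbox_command("/tmp/challenges/example")))
```

Returning an argv list rather than a shell string keeps challenge paths with spaces or special characters from being reinterpreted by a shell when the list is passed to `subprocess.run`.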

Features

  • Multi-model racing — multiple AI models attack each challenge simultaneously
  • Auto-spawn — new challenges detected and attacked automatically
  • Coordinator LLM — reads solver traces, crafts targeted technical guidance
  • Cross-solver insights — findings shared between models via message bus
  • Docker sandboxes — isolated containers with full CTF tooling
  • Operator messaging — send hints to running solvers mid-competition

Configuration

Copy .env.example to .env and fill in your keys:

cp .env.example .env
CTFD_URL=https://ctf.example.com
CTFD_TOKEN=ctfd_your_token
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=...

All settings can also be passed as environment variables or CLI flags.

Requirements

  • Python 3.14+
  • Docker
  • API keys for at least one provider (Anthropic, OpenAI, Google)
  • codex CLI (for Codex solver/coordinator)
  • claude CLI (bundled with claude-agent-sdk)
