CTF Agent

Autonomous CTF (Capture The Flag) solver that races multiple AI models against challenges in parallel. We built it in a weekend and used it to solve all 52 challenges and win 1st place at BSidesSF 2026 CTF.

Built by Veria Labs, founded by members of .;,;. (smiley), the #1 US CTF team on CTFTime in 2024 and 2025. We build AI agents that find and exploit real security vulnerabilities for large enterprises.

Results

Competition	Challenges Solved	Result
BSidesSF 2026	52/52 (100%)	1st place ($1,500)

The agent solves challenges across all categories — pwn, rev, crypto, forensics, web, and misc.

How It Works

A coordinator LLM manages the competition while solver swarms attack individual challenges. Each swarm runs multiple models simultaneously — the first to find the flag wins.

                        +-----------------+
                        |  CTFd Platform  |
                        +--------+--------+
                                 |
                        +--------v--------+
                        |  Poller (5s)    |
                        +--------+--------+
                                 |
                        +--------v--------+
                        | Coordinator LLM |
                        | (Claude/Codex)  |
                        +--------+--------+
                                 |
              +------------------+------------------+
              |                  |                  |
     +--------v--------+ +------v---------+ +------v---------+
     | Swarm:          | | Swarm:         | | Swarm:         |
     | challenge-1     | | challenge-2    | | challenge-N    |
     |                 | |                | |                |
     |  Opus (med)     | |  Opus (med)    | |                |
     |  Opus (max)     | |  Opus (max)    | |     ...        |
     |  GPT-5.4        | |  GPT-5.4       | |                |
     |  GPT-5.4-mini   | |  GPT-5.4-mini  | |                |
     |  GPT-5.3-codex  | |  GPT-5.3-codex | |                |
     +--------+--------+ +--------+-------+ +----------------+
              |                    |
     +--------v--------+  +-------v--------+
     | Docker Sandbox  |  | Docker Sandbox |
     | (isolated)      |  | (isolated)     |
     |                 |  |                |
     | pwntools, r2,   |  | pwntools, r2,  |
     | gdb, python...  |  | gdb, python... |
     +-----------------+  +----------------+

Each solver runs in an isolated Docker container with CTF tools pre-installed. Solvers never give up — they keep trying different approaches until the flag is found.
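
The "first to find the flag wins" behaviour can be illustrated with a small asyncio sketch. This is not the project's actual code: `race_solvers` and the stub solvers are hypothetical names, and a real solver would drive a model inside its sandbox instead of sleeping.

```python
import asyncio


async def race_solvers(solvers, challenge):
    """Run all solver coroutines at once; return (model, flag) from the first that yields a flag."""
    tasks = {asyncio.create_task(fn(challenge)): name for name, fn in solvers.items()}
    pending = set(tasks)
    try:
        while pending:
            done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
            for task in done:
                # A crashed solver or an empty result does not end the race.
                if task.exception() is None and task.result():
                    return tasks[task], task.result()
        return None  # every solver finished without a flag
    finally:
        # Stop the losers so they do not keep consuming API budget.
        for task in pending:
            task.cancel()
        await asyncio.gather(*pending, return_exceptions=True)


# Hypothetical stand-ins for real model-backed solvers.
async def fast_solver(challenge):
    await asyncio.sleep(0.01)
    return "flag{demo}"


async def slow_solver(challenge):
    await asyncio.sleep(5)
    return "flag{too_late}"


async def broken_solver(challenge):
    raise RuntimeError("tool crashed")
```

Calling `asyncio.run(race_solvers({"fast": fast_solver, "slow": slow_solver, "broken": broken_solver}, "challenge-1"))` returns the fast solver's result and cancels the slow one; the crashed solver is ignored rather than aborting the swarm.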

Quick Start

# Install
uv sync

# Build sandbox image
docker build -f sandbox/Dockerfile.sandbox -t ctf-sandbox .

# Configure credentials
cp .env.example .env
# Edit .env with your API keys and CTFd token

# Run against a CTFd instance
uv run ctf-solve \
  --ctfd-url https://ctf.example.com \
  --ctfd-token ctfd_your_token \
  --challenges-dir challenges \
  --max-challenges 10 \
  -v

Coordinator Backends

# Claude SDK coordinator (default)
uv run ctf-solve --coordinator claude ...

# Codex coordinator (GPT-5.4 via JSON-RPC)
uv run ctf-solve --coordinator codex ...

Solver Models

Default model lineup (configurable in backend/models.py):

Model	Provider	Notes
Claude Opus 4.6 (medium)	Claude SDK	Balanced speed/quality
Claude Opus 4.6 (max)	Claude SDK	Deep reasoning
GPT-5.4	Codex	Best overall solver
GPT-5.4-mini	Codex	Fast, good for easy challenges
GPT-5.3-codex	Codex	Reasoning model (xhigh effort)
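
As a rough illustration of what a configurable lineup could look like, the sketch below encodes the table above as data. It does not reproduce the contents of backend/models.py: the `SolverModel` class, the identifier strings, the provider labels, and `lineup_for` are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SolverModel:
    name: str                      # hypothetical identifier, not an official model ID
    provider: str                  # "claude-sdk" or "codex" (labels assumed here)
    effort: Optional[str] = None   # reasoning effort, where the table names one


DEFAULT_LINEUP = [
    SolverModel("claude-opus-4.6", "claude-sdk", "medium"),
    SolverModel("claude-opus-4.6", "claude-sdk", "max"),
    SolverModel("gpt-5.4", "codex"),
    SolverModel("gpt-5.4-mini", "codex"),
    SolverModel("gpt-5.3-codex", "codex", "xhigh"),
]


def lineup_for(available_providers):
    """Keep only the models whose provider has credentials configured."""
    return [m for m in DEFAULT_LINEUP if m.provider in available_providers]
```

With only OpenAI credentials, `lineup_for({"codex"})` would leave the three Codex-backed models in the swarm.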

Sandbox Tooling

Each solver gets an isolated Docker container pre-loaded with CTF tools:

Category	Tools
Binary	radare2, GDB, objdump, binwalk, strings, readelf
Pwn	pwntools, ROPgadget, angr, unicorn, capstone
Crypto	SageMath, RsaCtfTool, z3, gmpy2, pycryptodome, cado-nfs
Forensics	volatility3, Sleuthkit (mmls/fls/icat), foremost, exiftool
Stego	steghide, stegseek, zsteg, ImageMagick, tesseract OCR
Web	curl, nmap, Python requests, flask
Misc	ffmpeg, sox, Pillow, numpy, scipy, PyTorch, podman
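
One way to give each solver its own container is to build a separate `docker run` command per (challenge, model) pair. The sketch below only assembles the argument list. The image name `ctf-sandbox` comes from the Quick Start; the resource limits, the `/challenge` mount point, and the `sleep infinity` keep-alive command are assumptions, not the project's real settings.

```python
import re


def sandbox_command(challenge, model, workdir, image="ctf-sandbox"):
    """Return a `docker run` argv for one solver's throwaway container."""
    # Docker container names allow only [a-zA-Z0-9_.-], so sanitise the inputs.
    name = re.sub(r"[^a-z0-9_.-]+", "-", f"solver-{challenge}-{model}".lower())
    return [
        "docker", "run", "--rm", "--detach",
        "--name", name,
        "--memory", "4g",        # assumed limit
        "--cpus", "2",           # assumed limit
        "--pids-limit", "512",   # assumed limit; guards against fork bombs
        "-v", f"{workdir}:/challenge:ro",  # challenge files mounted read-only
        image,
        "sleep", "infinity",     # keep the container alive for later `docker exec` calls
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; passing an argument list instead of a shell string avoids quoting problems with unusual challenge names.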

Features

Multi-model racing — multiple AI models attack each challenge simultaneously
Auto-spawn — new challenges detected and attacked automatically
Coordinator LLM — reads solver traces, crafts targeted technical guidance
Cross-solver insights — findings shared between models via message bus
Docker sandboxes — isolated containers with full CTF tooling
Operator messaging — send hints to running solvers mid-competition
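
The cross-solver insights and operator messaging features both amount to a per-challenge message log that each solver reads from. A minimal in-process sketch, assuming a simple cursor-per-reader design (the `InsightBus` class and its methods are hypothetical and not taken from the repository):

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Insight:
    challenge: str
    sender: str   # a model name, or "operator" for human hints
    text: str


class InsightBus:
    def __init__(self):
        self._log = defaultdict(list)    # challenge -> ordered list of Insight
        self._cursor = defaultdict(int)  # (challenge, reader) -> next unread index

    def publish(self, challenge, sender, text):
        """Append a finding or hint to the challenge's log."""
        self._log[challenge].append(Insight(challenge, sender, text))

    def unread(self, challenge, reader):
        """Return messages the reader has not seen yet, skipping its own."""
        log = self._log[challenge]
        start = self._cursor[(challenge, reader)]
        self._cursor[(challenge, reader)] = len(log)
        return [msg for msg in log[start:] if msg.sender != reader]
```

A solver would call `unread` between steps and fold the returned text into its next prompt; an operator hint is just a `publish` call with `sender="operator"`.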

Configuration

Copy .env.example to .env and fill in your keys:

cp .env.example .env

CTFD_URL=https://ctf.example.com
CTFD_TOKEN=ctfd_your_token
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=...

All settings can also be passed as environment variables or CLI flags.
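
A common way to support both sources is to use the environment value as the default for each CLI flag, so an explicit flag wins. The sketch below assumes that precedence; the flag and variable names are the ones shown in this README, while `parse_settings` itself is a hypothetical helper and does not read the .env file.

```python
import argparse
import os


def parse_settings(argv, env=None):
    """Parse CLI flags, falling back to environment variables for anything omitted."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(prog="ctf-solve")
    parser.add_argument("--ctfd-url", default=env.get("CTFD_URL"))
    parser.add_argument("--ctfd-token", default=env.get("CTFD_TOKEN"))
    parser.add_argument("--coordinator", choices=["claude", "codex"], default="claude")
    args = parser.parse_args(argv)
    if not args.ctfd_url or not args.ctfd_token:
        parser.error("CTFd URL and token are required (flag or environment variable)")
    return args
```

For example, `parse_settings(["--ctfd-url", "https://ctf.example.com"])` would take the URL from the flag and the token from `CTFD_TOKEN`.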

Requirements

Python 3.14+
Docker
API keys for at least one provider (Anthropic, OpenAI, Google)
codex CLI (for Codex solver/coordinator)
claude CLI (bundled with claude-agent-sdk)
