Autonomous CTF (Capture The Flag) solver that races multiple AI models against challenges in parallel. We built it in a weekend and used it to solve all 52 challenges and win 1st place at the BSidesSF 2026 CTF.
Built by Veria Labs, founded by members of .;,;. (smiley), the #1 US CTF team on CTFTime in 2024 and 2025. We build AI agents that find and exploit real security vulnerabilities for large enterprises.
| Competition | Challenges Solved | Result |
|---|---|---|
| BSidesSF 2026 | 52/52 (100%) | 1st place ($1,500) |
The agent solves challenges across all categories — pwn, rev, crypto, forensics, web, and misc.
A coordinator LLM manages the competition while solver swarms attack individual challenges. Each swarm runs multiple models simultaneously, and the first to find the flag wins.

                        +-----------------+
                        |  CTFd Platform  |
                        +--------+--------+
                                 |
                        +--------v--------+
                        |   Poller (5s)   |
                        +--------+--------+
                                 |
                        +--------v--------+
                        | Coordinator LLM |
                        | (Claude/Codex)  |
                        +--------+--------+
                                 |
             +-------------------+-------------------+
             |                   |                   |
    +--------v--------+ +--------v--------+ +--------v--------+
    | Swarm:          | | Swarm:          | | Swarm:          |
    | challenge-1     | | challenge-2     | | challenge-N     |
    |                 | |                 | |                 |
    |  Opus (med)     | |  Opus (med)     | |                 |
    |  Opus (max)     | |  Opus (max)     | |       ...       |
    |  GPT-5.4        | |  GPT-5.4        | |                 |
    |  GPT-5.4-mini   | |  GPT-5.4-mini   | |                 |
    |  GPT-5.3-codex  | |  GPT-5.3-codex  | |                 |
    +--------+--------+ +--------+--------+ +-----------------+
             |                   |
    +--------v--------+ +--------v--------+
    | Docker Sandbox  | | Docker Sandbox  |
    | (isolated)      | | (isolated)      |
    |                 | |                 |
    | pwntools, r2,   | | pwntools, r2,   |
    | gdb, python...  | | gdb, python...  |
    +-----------------+ +-----------------+

Each solver runs in an isolated Docker container with CTF tools pre-installed. Solvers never give up — they keep trying different approaches until the flag is found.
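The "first to find the flag wins" race can be sketched with plain asyncio. This is a minimal illustration, not the project's actual implementation: `race_solvers` and the `Solver` signature are hypothetical names, and each solver is assumed to be a coroutine that returns a flag string or `None`.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Hypothetical solver signature: takes a challenge name, returns a flag or None.
Solver = Callable[[str], Awaitable[Optional[str]]]


async def race_solvers(
    challenge: str, solvers: dict[str, Solver]
) -> Optional[tuple[str, str]]:
    """Run every solver concurrently and return (model_name, flag) for the
    first one that produces a flag. Returns None if all finish without one."""
    tasks = {
        asyncio.create_task(solver(challenge)): name
        for name, solver in solvers.items()
    }
    pending = set(tasks)
    try:
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                # A crashed solver is ignored; the others keep racing.
                if task.exception() is None and task.result():
                    return tasks[task], task.result()
        return None
    finally:
        # Stop the losers so they do not keep spending tokens.
        for task in pending:
            task.cancel()
        if pending:
            await asyncio.gather(*pending, return_exceptions=True)
```

Cancelling the remaining tasks as soon as one solver returns a flag is a design choice of this sketch; a real swarm might instead let the others finish a step to record what they learned.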
# Install
uv sync
# Build sandbox image
docker build -f sandbox/Dockerfile.sandbox -t ctf-sandbox .
# Configure credentials
cp .env.example .env
# Edit .env with your API keys and CTFd token
# Run against a CTFd instance
uv run ctf-solve \
--ctfd-url https://ctf.example.com \
--ctfd-token ctfd_your_token \
--challenges-dir challenges \
--max-challenges 10 \
-v
# Claude SDK coordinator (default)
uv run ctf-solve --coordinator claude ...
# Codex coordinator (GPT-5.4 via JSON-RPC)
uv run ctf-solve --coordinator codex ...
Default model lineup (configurable in backend/models.py):
| Model | Provider | Notes |
|---|---|---|
| Claude Opus 4.6 (medium) | Claude SDK | Balanced speed/quality |
| Claude Opus 4.6 (max) | Claude SDK | Deep reasoning |
| GPT-5.4 | Codex | Best overall solver |
| GPT-5.4-mini | Codex | Fast, good for easy challenges |
| GPT-5.3-codex | Codex | Reasoning model (xhigh effort) |
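For orientation, a lineup like the one in the table could be expressed as data along these lines. This is only a sketch: the real layout of backend/models.py is not shown here, and `ModelSpec`, `DEFAULT_LINEUP`, `lineup_for`, and the identifier strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelSpec:
    name: str                     # hypothetical model identifier
    provider: str                 # hypothetical provider key: "claude-sdk" or "codex"
    effort: Optional[str] = None  # reasoning effort, where the table lists one


# Mirrors the five rows of the table above; identifier strings are assumptions.
DEFAULT_LINEUP = [
    ModelSpec("claude-opus-4.6", "claude-sdk", effort="medium"),
    ModelSpec("claude-opus-4.6", "claude-sdk", effort="max"),
    ModelSpec("gpt-5.4", "codex"),
    ModelSpec("gpt-5.4-mini", "codex"),
    ModelSpec("gpt-5.3-codex", "codex", effort="xhigh"),
]


def lineup_for(providers: set[str]) -> list[ModelSpec]:
    """Keep only the models whose provider has credentials configured."""
    return [m for m in DEFAULT_LINEUP if m.provider in providers]
```

With only an OpenAI key available, `lineup_for({"codex"})` would leave the three Codex-backed models in the swarm.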
Each solver gets an isolated Docker container pre-loaded with CTF tools:
| Category | Tools |
|---|---|
| Binary | radare2, GDB, objdump, binwalk, strings, readelf |
| Pwn | pwntools, ROPgadget, angr, unicorn, capstone |
| Crypto | SageMath, RsaCtfTool, z3, gmpy2, pycryptodome, cado-nfs |
| Forensics | volatility3, Sleuthkit (mmls/fls/icat), foremost, exiftool |
| Stego | steghide, stegseek, zsteg, ImageMagick, tesseract OCR |
| Web | curl, nmap, Python requests, flask |
| Misc | ffmpeg, sox, Pillow, numpy, scipy, PyTorch, podman |
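As a rough idea of how a solver might run one of these tools inside its container, the sketch below builds a `docker run` command for the `ctf-sandbox` image from the build step. The helper names, the `/challenge` mount point, the read-only mount, and the optional network cut-off are assumptions made for illustration, not the project's actual sandbox code.

```python
import os
import subprocess


def sandbox_argv(
    challenge_dir: str,
    command: str,
    image: str = "ctf-sandbox",
    network: bool = True,
) -> list[str]:
    """Build a `docker run` argv that executes `command` in a throwaway container."""
    host_dir = os.path.abspath(challenge_dir)  # bind mounts need an absolute path
    argv = [
        "docker", "run", "--rm",
        "-v", f"{host_dir}:/challenge:ro",  # challenge files, read-only (assumed layout)
        "-w", "/challenge",
    ]
    if not network:
        argv += ["--network", "none"]  # offline challenges need no network access
    argv += [image, "sh", "-c", command]
    return argv


def run_in_sandbox(
    challenge_dir: str, command: str, timeout: int = 60
) -> subprocess.CompletedProcess:
    """Run the command and capture stdout/stderr as text."""
    return subprocess.run(
        sandbox_argv(challenge_dir, command),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
```

For example, `run_in_sandbox("challenges/warmup", "strings ./binary | head")` would list printable strings from a challenge binary without touching the host (the challenge path here is made up).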
- Multi-model racing — multiple AI models attack each challenge simultaneously
- Auto-spawn — new challenges detected and attacked automatically
- Coordinator LLM — reads solver traces, crafts targeted technical guidance
- Cross-solver insights — findings shared between models via message bus
- Docker sandboxes — isolated containers with full CTF tooling
- Operator messaging — send hints to running solvers mid-competition
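Auto-spawn comes down to polling the challenge list and diffing it against what has already been seen. A minimal sketch, assuming the standard CTFd REST endpoint `/api/v1/challenges` with token authentication; the function names are hypothetical and the project's own poller may work differently.

```python
import json
import urllib.request


def fetch_challenges(base_url: str, token: str) -> list[dict]:
    """Fetch the visible challenge list from a CTFd instance."""
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/challenges",
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["data"]


def new_challenges(seen: set[int], listing: list[dict]) -> list[dict]:
    """Return challenges whose id has not been seen yet, and mark them as seen."""
    fresh = [c for c in listing if c["id"] not in seen]
    seen.update(c["id"] for c in fresh)
    return fresh
```

A poller loop would call `fetch_challenges` every few seconds and start one swarm for each entry that `new_challenges` returns.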
Copy .env.example to .env and fill in your keys:
cp .env.example .env
CTFD_URL=https://ctf.example.com
CTFD_TOKEN=ctfd_your_token
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=...
All settings can also be passed as environment variables or CLI flags.
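One common way to combine the two sources is to let environment variables supply defaults that CLI flags override. The sketch below assumes that precedence, which is not confirmed for this project; `load_settings` is a hypothetical helper, and the `.env` file is assumed to have been loaded into the environment already.

```python
import argparse
import os
from typing import Mapping, Optional


def load_settings(
    argv: list[str], env: Optional[Mapping[str, str]] = None
) -> argparse.Namespace:
    """Resolve settings: a CLI flag wins, otherwise the environment variable is used."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(prog="ctf-solve")
    parser.add_argument("--ctfd-url", default=env.get("CTFD_URL"))
    parser.add_argument("--ctfd-token", default=env.get("CTFD_TOKEN"))
    parser.add_argument(
        "--coordinator", choices=["claude", "codex"], default="claude"
    )
    args = parser.parse_args(argv)
    if not args.ctfd_url or not args.ctfd_token:
        parser.error("set CTFD_URL and CTFD_TOKEN in the environment or pass the flags")
    return args
```

With `CTFD_URL` set in `.env`, running with `--ctfd-url https://other.example.com` would target the other instance for that run only.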
- Python 3.14+
- Docker
- API keys for at least one provider (Anthropic, OpenAI, Google)
- codex CLI (for Codex solver/coordinator)
- claude CLI (bundled with claude-agent-sdk)
- es3n1n/Eruditus: CTFd interaction and HTML helpers in pull_challenges.py