Awesome AI Security

Curated resources, research, and tools for securing AI systems.

Best Practices and Security Standards
- Governance & Management Frameworks
- Standards, Controls & Top 10s
- Testing & Red Teaming
- Implementation Guides & Best Practices
- Agentic Systems — Governance, Standards & Guides
Tools
- Prompt-Injection Detection & Mitigation
- Jailbreak & Policy Enforcement (Guardrails)
- Model Artifact Scanners
- Agent Tooling and MCP Security
- Execution Sandboxing for Agent Code
- Gateways & Policy Proxies
- Code Review
- Red-Teaming Harnesses & Automated Security Testing
- Supply Chain: AI/ML BOM and Attestation
- Vector/Memory Store Security
- Data/Model Poisoning Defenses
- Sensitive Data Leak Prevention (DLP for AI)
- Monitoring, Logging & Anomaly Detection
Attack & Defense Matrices
- Attack
- Defense
Checklists
Foundations: Glossary, SoK/Surveys & Taxonomies
- Glossary
- SoK & Surveys
- Taxonomy
Datasets
Courses
Certifications
Learning Resources
Research Working Groups
Communities & Social Groups
Benchmarking
Incident Response
- Incident Repositories, Trackers & Monitors
- Guides & Playbooks
Supply Chain Security
- Standards & Specs
- Third-Party Assessment
Newsletter
Conferences and Events
Reports and Research
- Research Feed
- Reports
CTFs & Challenges
Podcasts
Market Landscape
Startups Blogs
- Scope & Plan
- Develop & Experiment
- Augment & Fine-Tune Data
- Test & Evaluate
- Release
- Deploy
- Operate
- Monitor
- Govern
Related Awesome Lists
Common Acronyms

Best Practices and Security Standards

Governance & Management Frameworks

Standards, Controls & Top 10s

Controls & Verification Standards

OWASP — LLM Security Verification Standard (LLMSVS)
OWASP — Artificial Intelligence Security Verification Standard (AISVS)
CSA — AI Controls Matrix (AICM) — The AICM contains 243 control objectives across 18 domains and maps to ISO 42001, ISO 27001, NIST AI RMF 1.0, and BSI AIC4. Freely downloadable.

Top 10s

Scoring & Rating Systems

OWASP — Artificial Intelligence Vulnerability Scoring System

Testing & Red Teaming

Implementation Guides & Best Practices

Agentic Systems — Governance, Standards & Guides

Tools

Inclusion criteria (open-source tools): must have 220+ GitHub stars, active maintenance in the last 12 months, and ≥3 contributors.

Prompt-Injection Detection & Mitigation

Detect and stop prompt-injection (direct/indirect) across inputs, context, and outputs; filter hostile content before it reaches tools or models.

(none from your current list yet)

Jailbreak & Policy Enforcement (Guardrails)

Enforce safety policies and block jailbreaks at runtime via rules/validators/DSLs, with optional human-in-the-loop for sensitive actions.

NeMo Guardrails
LLM Guard
Llama Guard
LlamaFirewall
Code Shield
Guardrails — Runtime policy enforcement for LLM apps: compose input/output validators (PII, toxicity, jailbreak/PI, regex, competitor checks), then block/redact/rewrite/retry on fail; optional server mode; also supports structured outputs (Pydantic/function-calling).

Model Artifact Scanners

Analyze serialized model files for unsafe deserialization and embedded code; verify integrity/metadata and block or quarantine on fail.

Agent Tooling and MCP Security

Scan/audit MCP servers & client configs; detect tool poisoning, unsafe flows; constrain tool access with least-privilege and audit trails.

Tool manifest/metadata validators

MCP Inspector
mcp-scan

Servers & Dev tooling

PortSwigger — MCP Server

Execution Sandboxing for Agent Code

Run untrusted or LLM-triggered code in isolated sandboxes (FS/network/process limits) to contain RCE and reduce blast radius.

E2B — SDK + self-hostable infra to run untrusted, LLM-generated code in isolated cloud sandboxes (Firecracker microVMs).

Gateways & Policy Proxies

Centralize auth, quotas/rate limits, cost caps, egress/DLP filters, and guardrail orchestration across all model/providers.

(none from your current list yet)

Code Review

Claude Code Security Reviewer - An AI-powered security review GitHub Action using Claude to analyze code changes for security vulnerabilities.
Vulnhuntr - Vulnhuntr leverages the power of LLMs to automatically create and analyze entire code call chains starting from remote user input and ending at server output for detection of complex, multi-step, security-bypassing vulnerabilities that go far beyond what traditional static code analysis tools are capable of performing.

Red-Teaming Harnesses & Automated Security Testing

Automate attack suites (prompt-injection, leakage, jailbreak, goal-based tasks) in CI; score results and produce regression evidence.

Goal-directed agent attack tasks

CI pipelines & regression gates

promptfoo
Agentic Radar
DeepTeam
Buttercup — Trail of Bits’ AIxCC Cyber Reasoning System: runs OSS-Fuzz–style campaigns to find vulns, then uses a multi-agent LLM patcher to generate & validate fixes for C/Java repos; ships SigNoz observability; requires at least one LLM API key.

Scoring/leaderboards & evidence reports

(none from your current list yet)

Supply Chain: AI/ML BOM and Attestation

Generate and verify AI/ML BOMs, signatures, and provenance for models/datasets/dependencies; enforce allow/deny policies.

(none from your current list yet)

Vector/Memory Store Security

Harden RAG memory: isolate namespaces, sanitize queries/content, detect poisoning/outliers, and prevent secret/PII retention.

(none from your current list yet)

Data/Model Poisoning Defenses

Detect and mitigate dataset/model poisoning and backdoors; validate training/fine-tuning integrity and prune suspicious behaviors.

Adversarial Robustness Toolbox (ART)

Sensitive Data Leak Prevention (DLP for AI)

Prevent secret/PII exfiltration in prompts/outputs via detection, redaction, and policy checks at I/O boundaries.

Presidio — PII/PHI detection & redaction for text, images, and structured data; use as a pre/post-LLM DLP filter and for dataset sanitization.

Monitoring, Logging & Anomaly Detection

Collect AI-specific security logs/signals; detect abuse patterns (PI/jailbreak/leakage), enrich alerts, and support forensics.

LangKit — LLM observability metrics toolkit (whylogs-compatible): prompt-injection/jailbreak similarity, PII patterns, hallucination/consistency, relevance, sentiment/toxicity, readability.
Alibi Detect — Production drift/outlier/adversarial detection for tabular, text, images, and time series; online/offline detectors with TF/PyTorch backends; returns scores, thresholds, and flags for alerting.

Attack & Defense Matrices

Matrix-style resources covering adversarial TTPs and curated defensive techniques for AI systems.

Attack

MITRE ATLAS – Adversarial TTP matrix and knowledge base for threats to AI systems.
GenAI Attacks Matrix – Matrix of TTPs targeting GenAI apps, copilots, and agents.
MCP Security Tactics, Techniques, and Procedures (TTPs)

Defense

AIDEFEND — AI Defense Framework — Interactive defensive countermeasures knowledge base with Tactics / Pillars / Phases views; maps mitigations to MITRE ATLAS, MAESTRO, and OWASP LLM risks. • Live demo: https://edward-playground.github.io/aidefense-framework/

Checklists

Supply Chain Security

Guidance and standards for securing the AI/ML software supply chain (models, datasets, code, pipelines). Primarily specs and frameworks; includes vetted TPRM templates.

Standards & Specs

Normative formats and specifications for transparency and traceability across AI components and dependencies.

OWASP — AI Bill of Materials (AIBOM) — Bill of materials format for AI components, datasets, and model dependencies.

Third-Party Assessment

Questionnaires and templates to assess external vendors, model providers, and integrators for security, privacy, and compliance.

FS-ISAC — Generative AI Vendor Evaluation & Qualitative Risk Assessment — Assessment Tool XLSX • Guide PDF — Vendor due-diligence toolkit for GenAI: risk tiering by use case, integration and data sensitivity; questionnaires across privacy, security, model development and validation, integration, legal and compliance; auto-generated reporting.

Foundations: Glossary, SoK/Surveys & Taxonomies

(Core references and syntheses for orientation and shared language.)

Glossary

(Authoritative definitions for AI/ML security, governance, and risk—use to align terminology across docs and reviews.)

NIST — “The Language of Trustworthy AI: An In-Depth Glossary of Terms.” - Authoritative cross-org terminology aligned to NIST AI RMF; useful for standardizing terms across teams.
ISO/IEC 22989:2022 — Artificial intelligence — Concepts and terminology - International standard that formalizes core AI concepts and vocabulary used in policy and engineering.

SoK & Surveys

(Systematizations of Knowledge (SoK), surveys, systematic reviews, and mapping studies.)

Taxonomy

(Reusable classification schemes—clear dimensions, categories, and labeling rules for attacks, defenses, datasets, and risks.)

CSA — Large Language Model (LLM) Threats Taxonomy - Community taxonomy of LLM-specific threats; clarifies categories/definitions for risk discussion and control mapping.
ARC — PI (Prompt Injection) Taxonomy - Focused taxonomy for prompt-injection behaviors/variants with practical labeling guidance for detection and defense.

Datasets

Cybersecurity Skills

Interactive CTFs and self-contained labs for hands-on security skills (web, pwn, crypto, forensics, reversing). Used to assess practical reasoning, tool use, and end-to-end task execution.

NYU CTF Bench

Cybersecurity Knowledge

Structured Q&A datasets assessing security knowledge and terminology. Used to evaluate factual recall and conceptual understanding.

CyberMetric

Secure Coding & Vulnerability Detection

Code snippet datasets labeled as vulnerable or secure, often tied to CWEs (Common Weakness Enumeration). Used to evaluate the model’s ability to recognize insecure code patterns and suggest secure fixes.

LLMSecEval
SecCodePLT

Jailbreak & Guardrail Evaluation

Adversarial prompt datasets—both text-only and multimodal—designed to bypass safety mechanisms or test refusal logic. Used to test how effectively a model resists jailbreaks and enforces policy-based refusal.

Prompt Injection & Malicious Prompt Detection

Datasets labeled with whether prompts are benign or malicious (i.e., injection attempts). Used to evaluate an LLM’s ability to detect and neutralize prompt-injection style attacks.

Malicious Prompt Detection
LLMail-Inject

Courses

Microsoft AI Security Learning Path – Free training modules on AI security, covering secure AI model development, risk management, and threat mitigation.
AWS AI Security Training – Free AWS courses on securing AI applications, risk management, and implementing security best practices in AI/ML environments.

Certifications

Governance & Audit

ISACA — Advanced in AI Security Management (AAISM™) — AI-centric security management certification to manage AI-related risk, implement policy, and ensure responsible/effective use across the organization
ISACA — Advanced in AI Audit (AAIA™) — Validates ability to audit complex systems and mitigate AI-related risks; domains include AI governance & risk, AI operations, and AI tools/techniques.
IAPP — Artificial Intelligence Governance Professional (AIGP) — Demonstrates capability to ensure safety and trust in the development, deployment, and ongoing management of ethical AI; aligned training focuses on building trustworthy AI in line with emerging laws/policies.
NIST AI RMF 1.0 Architect — Certified Information Security — Credential offered by Certified Information Security aligned to NIST AI RMF 1.0 (listed on NICCS); prepares professionals to design and lead AI risk-management programs.
ISO/IEC 42001 — AI Management System (Lead Implementer, PECB) — Prepares professionals to implement an artificial intelligence management system (AIMS) in accordance with ISO/IEC 42001.
ISO/IEC 42001 — AI Management System (Lead Auditor, PECB) — Develops expertise to audit artificial intelligence management systems (AIMS) using recognized audit principles, procedures, and techniques.
ISO/IEC 23894 — AI Risk Management (AI Risk Manager, PECB) — Training & certification focused on identifying, assessing, and mitigating AI-related risks; aligned to frameworks such as ISO/IEC 23894 and NIST AI RMF.

Learning Resources

Nightfall AI Security 101 – A centralized learning hub for AI security, offering an evolving library of concepts, emerging risks, and foundational principles in securing AI systems.

Foundations

SANS — AI Cybersecurity Careers — Career pathways poster + training map; helpful baseline skills that transfer to AI security (IR, DFIR, detection, threat hunting).

Research Working Groups

Cloud Security Alliance (CSA) AI Security Working Groups – Collaborative research groups focused on AI security, cloud security, and emerging threats in AI-driven systems.
OWASP Top 10 for LLM & Generative AI Security Risks Project – An open-source initiative addressing critical security risks in Large Language Models (LLMs) and Generative AI applications, offering resources and guidelines to mitigate emerging threats.
CWE Artificial Intelligence Working Group (AI WG) – The AI WG was established by CWE™ and CVE® community stakeholders to identify and address gaps in the CWE corpus where AI-related weaknesses are not adequately covered, and work collaboratively to fix them.

📌 (More working groups to be added.)

Communities & Social Groups

AI Security Hub (LinkedIn Group)

Benchmarking

Benchmarks

Purple Llama — CyberSecEval
JailbreakBench

Categories of AI Security Benchmarks

Robustness & Adversarial Resilience

Purpose: Evaluates how AI systems withstand adversarial attacks, including evasion, poisoning, and model extraction. Ensures AI remains functional under manipulation.
NIST AI RMF Alignment: Measure, Manage

Measure: Identify risks related to adversarial attacks.
Manage: Implement mitigation strategies to ensure resilience.

Model & Data Integrity

Purpose: Assesses AI models for unauthorized modifications, including backdoors and dataset poisoning. Supports trustworthiness and security of model outputs.
NIST AI RMF Alignment: Map, Measure

Map: Understand and identify risks to model/data integrity.
Measure: Evaluate and mitigate risks through validation techniques.
CVE-Bench — @uiuc-kang-lab — How well AI agents can exploit real-world software vulnerabilities that are listed in the CVE database.

Governance & Compliance

Purpose: Ensures AI security aligns with governance frameworks, industry regulations, and security policies. Supports auditability and risk management.
NIST AI RMF Alignment: Govern

Govern: Establish policies, accountability structures, and compliance controls.

Privacy & Data Protection

Purpose: Evaluates AI for risks like data leakage, membership inference, and model inversion. Helps ensure privacy preservation and compliance.
NIST AI RMF Alignment: Measure, Manage

Measure: Identify and assess AI-related privacy risks.
Manage: Implement security controls to mitigate privacy threats.

Explainability & Trustworthiness

Purpose: Assesses AI for transparency, fairness, and bias mitigation. Ensures AI operates in an interpretable and ethical manner.
NIST AI RMF Alignment: Govern, Map, Measure

Govern: Establish policies for fairness, bias mitigation, and transparency.
Map: Identify potential explainability risks in AI decision-making.
Measure: Evaluate AI outputs for fairness, bias, and interpretability.

Incident Response

Incident Repositories, Trackers & Monitors

Guides & Playbooks

Newsletter

Adversarial AI Digest - A digest of AI security research, threats, governance challenges, and best practices for securing AI systems.

Conferences and Events

Reports and Research

Research Feed

AI Security Research Feed – Continuously updated feed of AI security–related academic papers, preprints, and research indexed from arXiv.
AI Security Portal – Literature Database – Categorized database of AI security literature, taxonomy, and related resources.

Reports

CSA — The State of AI and Security Survey Report
CSA — Principles to Practice: Responsible AI in a Dynamic Regulatory Environment
CSA — AI Resilience: A Revolutionary Benchmarking Model for AI Safety – Governance & compliance benchmarking model.
CSA — Using AI for Offensive Security

📌 (More to be added – A collection of AI security reports, white papers, and academic studies.)

CTFs & Challenges

AI GOAT
Gandalf CTF
Damn Vulnerable LLM Agent
AI Red Teaming Playground Labs — Microsoft — Self-hostable lab environment with 12 challenges (direct/indirect prompt injection, metaprompt extraction, Crescendo multi-turn, guardrail bypass).

Podcasts

The MLSecOps Podcast – Insightful conversations with industry leaders and AI experts, exploring the fascinating world of machine learning security operations.

Market Landscape

Curated market maps of tools and vendors for securing LLM and agentic AI applications across the lifecycle.

OWASP — LLM and Generative AI Security Solutions Landscape
OWASP — AI Security Solutions Landscape for Agentic AI
Latio — 2025 AI Security Report – Market trends and vendor landscape snapshot for AI security.
Woodside Capital Partners — Cybersecurity Sector — A snapshot with vendor breakdowns and landscape view.

Startups Blogs

A curated list of startups securing agentic AI applications, organized by the OWASP Agentic AI lifecycle (Scope & Plan → Govern). Each company appears once in its best-fit stage based on public positioning, and links point to blog/insights for deeper context. Some startups span multiple stages; placements reflect primary focus.

Inclusion criteria

Startup has not been acquired
Has an active blog
Has an active GitHub organization/repository

Scope & Plan

Design-time security: non-human identities, agent threat modeling, privilege boundaries/authn, and memory scoping/isolation.

no startups here with active blog and active GitHub account

Develop & Experiment

Secure agent loops and tool use; validate I/O contracts; embed policy hooks; test resilience during co-engineering.

no startups here with active blog and active GitHub account

Augment & Fine-Tune Data

Sanitize/trace data and reasoning; validate alignment; protect sensitive memory with privacy controls before deployment.

Skyflow

Test & Evaluate

Adversarial testing for goal drift, prompt injection, and tool misuse; red-team sims; sandboxed calls; decision validation.

Release

Sign models/plugins/memory; verify SBOMs; enforce cryptographically validated policies; register agents/capabilities.

no startups here with active blog and active GitHub account

Deploy

Zero-trust activation: rotate ephemeral creds, apply allowlists/LLM firewalls, and fine-grained least-privilege authorization.

Pomerium

Operate

Monitor memory mutations for drift/poisoning, detect abnormal loops/misuse, enforce HITL overrides, and scan plugins—continuous, real-time vigilance for resilient operations as systems scale and self-orchestrate.

Monitor

Correlate agent steps/tools/comms; detect anomalies (e.g., goal reversal); keep immutable logs for auditability.

Govern

Enforce role/task policies, version/retire agents, prevent privilege creep, and align evidence with AI regulations.

Related Awesome Lists

Common Acronyms

Acronym	Full Form
AI	Artificial Intelligence
AGI	Artificial General Intelligence
ALBERT	A Lite BERT
BERT	Bidirectional Encoder Representations from Transformers
BGMAttack	Black-box Generative Model-based Attack
CBA	Composite Backdoor Attack
CCPA	California Consumer Privacy Act
DAN	Do Anything Now
DNN	Deep Neural Network
DP	Differential Privacy
FL	Federated Learning
GDPR	General Data Protection Regulation
GA	Genetic Algorithm
GPT	Generative Pre-trained Transformer
HIPAA	Health Insurance Portability and Accountability Act
LM	Language Model
LLM	Large Language Model
Llama	Large Language Model Meta AI
MIA	Membership Inference Attack
MDP	Masking-Differential Prompting
MLM	Masked Language Model
NLP	Natural Language Processing
OOD	Out Of Distribution
PI	Prompt Injection
PII	Personally Identifiable Information
PAIR	Prompt Automatic Iterative Refinement
PLM	pre-trained Language Model
RL	Reinforcement Learning
RLHF	Reinforcement Learning from Human Feedback
RoBERTa	Robustly optimized BERT approach
SGD	Stochastic Gradient Descent
TAG	Gradient Attack on Transformer-based Language Models
XLNet	Transformer-XL with autoregressive and autoencoding pre-training

Contributing

Contributions are welcome! If you have new resources, tools, or insights to add, feel free to submit a pull request.

This repository follows the Awesome Manifesto guidelines.

Name		Name	Last commit message	Last commit date
Latest commit History 305 Commits
AGENTS.md		AGENTS.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md

License

TalEliyahu/Awesome-AI-Security

Folders and files

Latest commit

History

Repository files navigation

Awesome AI Security

Table of Contents

Best Practices and Security Standards

Governance & Management Frameworks

Standards, Controls & Top 10s

Controls & Verification Standards

Top 10s

Scoring & Rating Systems

Testing & Red Teaming

Implementation Guides & Best Practices

Agentic Systems — Governance, Standards & Guides

Tools

Prompt-Injection Detection & Mitigation

Jailbreak & Policy Enforcement (Guardrails)

Model Artifact Scanners

Agent Tooling and MCP Security

Tool manifest/metadata validators

Servers & Dev tooling

Execution Sandboxing for Agent Code

Gateways & Policy Proxies

Code Review

Red-Teaming Harnesses & Automated Security Testing

Prompt-injection test suites

Data-leakage/secret-exfil test suites

Jailbreak catalogs & adversarial prompts

Adversarial-robustness (evasion) toolkits

Goal-directed agent attack tasks

CI pipelines & regression gates

Scoring/leaderboards & evidence reports

Supply Chain: AI/ML BOM and Attestation

Vector/Memory Store Security

Data/Model Poisoning Defenses

Sensitive Data Leak Prevention (DLP for AI)

Monitoring, Logging & Anomaly Detection

Attack & Defense Matrices

Attack

Defense

Checklists

Supply Chain Security

Standards & Specs

Third-Party Assessment

Foundations: Glossary, SoK/Surveys & Taxonomies

Glossary

SoK & Surveys

Taxonomy

Datasets

Cybersecurity Skills

Cybersecurity Knowledge

Secure Coding & Vulnerability Detection

Jailbreak & Guardrail Evaluation

Prompt Injection & Malicious Prompt Detection

Courses

Certifications

Governance & Audit

Learning Resources

Foundations

Research Working Groups

Communities & Social Groups

Benchmarking

Benchmarks

Categories of AI Security Benchmarks

Robustness & Adversarial Resilience

Model & Data Integrity

Governance & Compliance

Privacy & Data Protection

Explainability & Trustworthiness

Incident Response

Incident Repositories, Trackers & Monitors

Guides & Playbooks

Newsletter

Conferences and Events

Reports and Research

Research Feed

Reports

Packages