Curated resources, research, and tools for securing AI systems.
- Best Practices and Security Standards
- Tools
- Prompt-Injection Detection & Mitigation
- Jailbreak & Policy Enforcement (Guardrails)
- Model Artifact Scanners
- Agent Tooling and MCP Security
- Execution Sandboxing for Agent Code
- Gateways & Policy Proxies
- Code Review
- Red-Teaming Harnesses & Automated Security Testing
- Supply Chain: AI/ML BOM and Attestation
- Vector/Memory Store Security
- Data/Model Poisoning Defenses
- Sensitive Data Leak Prevention (DLP for AI)
- Monitoring, Logging & Anomaly Detection
- Attack & Defense Matrices
- Checklists
- Foundations: Glossary, SoK/Surveys & Taxonomies
- Datasets
- Courses
- Certifications
- Learning Resources
- Research Working Groups
- Communities & Social Groups
- Benchmarking
- Incident Response
- Supply Chain Security
- Newsletter
- Conferences and Events
- Reports and Research
- CTFs & Challenges
- Podcasts
- Market Landscape
- Startups Blogs
- Related Awesome Lists
- Common Acronyms
- NIST — AI Risk Management Framework (AI RMF)
- ISO/IEC 42001 (AI Management System)
- OWASP — AI Maturity Assessment (AIMA)
- Google — Secure AI Framework (SAIF)
- OWASP — LLM & GenAI Security Center of Excellence (CoE) Guide
- CSA — AI Model Risk Management Framework
- NIST — Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile
- OWASP — LLM Security Verification Standard (LLMSVS)
- OWASP — Artificial Intelligence Security Verification Standard (AISVS)
- CSA — AI Controls Matrix (AICM) — The AICM contains 243 control objectives across 18 domains and maps to ISO 42001, ISO 27001, NIST AI RMF 1.0, and BSI AIC4. Freely downloadable.
- OWASP — Top 10 for Large Language Model Applications
- CSA - MCP Client Top 10
- CSA - MCP Server Top 10
- OWASP — AI Testing Guide
- OWASP — Red Teaming Guide
- OWASP — LLM Exploit Generation
- CSA — Agentic AI Red Teaming Guide
- OWASP — AI Security and Privacy Guide
- OWASP — LLM and Gen AI Data Security Best Practices
- OWASP — GenAI Security Project
- CSA — Secure LLM Systems: Essential Authorization Practices
- NIST — Four Principles of Explainable Artificial Intelligence
- OASIS CoSAI — Preparing Defenders of AI Systems
- CISA — AI Data Security: Best Practices for Securing Data Used to Train & Operate AI Systems
- OWASP — Agent Observability Standard (AOS)
- OWASP — Agent Name Service (ANS) for Secure AI Agent Discovery
- OWASP — Agentic AI - Threats and Mitigations
- OWASP — Securing Agentic Applications Guide
- OWASP — Multi-Agentic System Threat Modeling Guide
- OWASP — State of Agentic AI Security and Governance
- CSA — Secure Agentic System Design: A Trait-Based Approach
- CSA — Agentic AI Identity & Access Management — 08/25
Inclusion criteria (open-source tools): must have 220+ GitHub stars, active maintenance in the last 12 months, and ≥3 contributors.
Detect and stop prompt-injection (direct/indirect) across inputs, context, and outputs; filter hostile content before it reaches tools or models.
- (none from your current list yet)
Enforce safety policies and block jailbreaks at runtime via rules/validators/DSLs, with optional human-in-the-loop for sensitive actions.
- NeMo Guardrails
- LLM Guard
- Llama Guard
- LlamaFirewall
- Code Shield
- Guardrails
— Runtime policy enforcement for LLM apps: compose input/output validators (PII, toxicity, jailbreak/PI, regex, competitor checks), then block/redact/rewrite/retry on fail; optional server mode; also supports structured outputs (Pydantic/function-calling).
Analyze serialized model files for unsafe deserialization and embedded code; verify integrity/metadata and block or quarantine on fail.
Scan/audit MCP servers & client configs; detect tool poisoning, unsafe flows; constrain tool access with least-privilege and audit trails.
Run untrusted or LLM-triggered code in isolated sandboxes (FS/network/process limits) to contain RCE and reduce blast radius.
- E2B
— SDK + self-hostable infra to run untrusted, LLM-generated code in isolated cloud sandboxes (Firecracker microVMs).
Centralize auth, quotas/rate limits, cost caps, egress/DLP filters, and guardrail orchestration across all model/providers.
- (none from your current list yet)
- Claude Code Security Reviewer
- An AI-powered security review GitHub Action using Claude to analyze code changes for security vulnerabilities.
- Vulnhuntr
- Vulnhuntr leverages the power of LLMs to automatically create and analyze entire code call chains starting from remote user input and ending at server output for detection of complex, multi-step, security-bypassing vulnerabilities that go far beyond what traditional static code analysis tools are capable of performing.
Automate attack suites (prompt-injection, leakage, jailbreak, goal-based tasks) in CI; score results and produce regression evidence.
- promptfoo
- Agentic Radar
- DeepTeam
- Buttercup
— Trail of Bits’ AIxCC Cyber Reasoning System: runs OSS-Fuzz–style campaigns to find vulns, then uses a multi-agent LLM patcher to generate & validate fixes for C/Java repos; ships SigNoz observability; requires at least one LLM API key.
- (none from your current list yet)
Generate and verify AI/ML BOMs, signatures, and provenance for models/datasets/dependencies; enforce allow/deny policies.
- (none from your current list yet)
Harden RAG memory: isolate namespaces, sanitize queries/content, detect poisoning/outliers, and prevent secret/PII retention.
- (none from your current list yet)
Detect and mitigate dataset/model poisoning and backdoors; validate training/fine-tuning integrity and prune suspicious behaviors.
Prevent secret/PII exfiltration in prompts/outputs via detection, redaction, and policy checks at I/O boundaries.
- Presidio
— PII/PHI detection & redaction for text, images, and structured data; use as a pre/post-LLM DLP filter and for dataset sanitization.
Collect AI-specific security logs/signals; detect abuse patterns (PI/jailbreak/leakage), enrich alerts, and support forensics.
-
LangKit
— LLM observability metrics toolkit (whylogs-compatible): prompt-injection/jailbreak similarity, PII patterns, hallucination/consistency, relevance, sentiment/toxicity, readability.
-
Alibi Detect
— Production drift/outlier/adversarial detection for tabular, text, images, and time series; online/offline detectors with TF/PyTorch backends; returns scores, thresholds, and flags for alerting.
Matrix-style resources covering adversarial TTPs and curated defensive techniques for AI systems.
- MITRE ATLAS – Adversarial TTP matrix and knowledge base for threats to AI systems.
- GenAI Attacks Matrix – Matrix of TTPs targeting GenAI apps, copilots, and agents.
- MCP Security Tactics, Techniques, and Procedures (TTPs)
- AIDEFEND — AI Defense Framework
— Interactive defensive countermeasures knowledge base with Tactics / Pillars / Phases views; maps mitigations to MITRE ATLAS, MAESTRO, and OWASP LLM risks. • Live demo: https://edward-playground.github.io/aidefense-framework/
Guidance and standards for securing the AI/ML software supply chain (models, datasets, code, pipelines). Primarily specs and frameworks; includes vetted TPRM templates.
Normative formats and specifications for transparency and traceability across AI components and dependencies.
- OWASP — AI Bill of Materials (AIBOM)
— Bill of materials format for AI components, datasets, and model dependencies.
Questionnaires and templates to assess external vendors, model providers, and integrators for security, privacy, and compliance.
- FS-ISAC — Generative AI Vendor Evaluation & Qualitative Risk Assessment — Assessment Tool XLSX • Guide PDF — Vendor due-diligence toolkit for GenAI: risk tiering by use case, integration and data sensitivity; questionnaires across privacy, security, model development and validation, integration, legal and compliance; auto-generated reporting.
(Core references and syntheses for orientation and shared language.)
(Authoritative definitions for AI/ML security, governance, and risk—use to align terminology across docs and reviews.)
- NIST — “The Language of Trustworthy AI: An In-Depth Glossary of Terms.” - Authoritative cross-org terminology aligned to NIST AI RMF; useful for standardizing terms across teams.
- ISO/IEC 22989:2022 — Artificial intelligence — Concepts and terminology - International standard that formalizes core AI concepts and vocabulary used in policy and engineering.
(Systematizations of Knowledge (SoK), surveys, systematic reviews, and mapping studies.)
(Reusable classification schemes—clear dimensions, categories, and labeling rules for attacks, defenses, datasets, and risks.)
- CSA — Large Language Model (LLM) Threats Taxonomy - Community taxonomy of LLM-specific threats; clarifies categories/definitions for risk discussion and control mapping.
- ARC — PI (Prompt Injection) Taxonomy - Focused taxonomy for prompt-injection behaviors/variants with practical labeling guidance for detection and defense.
Interactive CTFs and self-contained labs for hands-on security skills (web, pwn, crypto, forensics, reversing). Used to assess practical reasoning, tool use, and end-to-end task execution.
Structured Q&A datasets assessing security knowledge and terminology. Used to evaluate factual recall and conceptual understanding.
Code snippet datasets labeled as vulnerable or secure, often tied to CWEs (Common Weakness Enumeration). Used to evaluate the model’s ability to recognize insecure code patterns and suggest secure fixes.
Adversarial prompt datasets—both text-only and multimodal—designed to bypass safety mechanisms or test refusal logic. Used to test how effectively a model resists jailbreaks and enforces policy-based refusal.
Datasets labeled with whether prompts are benign or malicious (i.e., injection attempts). Used to evaluate an LLM’s ability to detect and neutralize prompt-injection style attacks.
- Microsoft AI Security Learning Path – Free training modules on AI security, covering secure AI model development, risk management, and threat mitigation.
- AWS AI Security Training – Free AWS courses on securing AI applications, risk management, and implementing security best practices in AI/ML environments.
- ISACA — Advanced in AI Security Management (AAISM™) — AI-centric security management certification to manage AI-related risk, implement policy, and ensure responsible/effective use across the organization
- ISACA — Advanced in AI Audit (AAIA™) — Validates ability to audit complex systems and mitigate AI-related risks; domains include AI governance & risk, AI operations, and AI tools/techniques.
- IAPP — Artificial Intelligence Governance Professional (AIGP) — Demonstrates capability to ensure safety and trust in the development, deployment, and ongoing management of ethical AI; aligned training focuses on building trustworthy AI in line with emerging laws/policies.
- NIST AI RMF 1.0 Architect — Certified Information Security — Credential offered by Certified Information Security aligned to NIST AI RMF 1.0 (listed on NICCS); prepares professionals to design and lead AI risk-management programs.
- ISO/IEC 42001 — AI Management System (Lead Implementer, PECB) — Prepares professionals to implement an artificial intelligence management system (AIMS) in accordance with ISO/IEC 42001.
- ISO/IEC 42001 — AI Management System (Lead Auditor, PECB) — Develops expertise to audit artificial intelligence management systems (AIMS) using recognized audit principles, procedures, and techniques.
- ISO/IEC 23894 — AI Risk Management (AI Risk Manager, PECB) — Training & certification focused on identifying, assessing, and mitigating AI-related risks; aligned to frameworks such as ISO/IEC 23894 and NIST AI RMF.
- Nightfall AI Security 101 – A centralized learning hub for AI security, offering an evolving library of concepts, emerging risks, and foundational principles in securing AI systems.
- SANS — AI Cybersecurity Careers — Career pathways poster + training map; helpful baseline skills that transfer to AI security (IR, DFIR, detection, threat hunting).
- Cloud Security Alliance (CSA) AI Security Working Groups – Collaborative research groups focused on AI security, cloud security, and emerging threats in AI-driven systems.
- OWASP Top 10 for LLM & Generative AI Security Risks Project – An open-source initiative addressing critical security risks in Large Language Models (LLMs) and Generative AI applications, offering resources and guidelines to mitigate emerging threats.
- CWE Artificial Intelligence Working Group (AI WG) – The AI WG was established by CWE™ and CVE® community stakeholders to identify and address gaps in the CWE corpus where AI-related weaknesses are not adequately covered, and work collaboratively to fix them.
📌 (More working groups to be added.)
Purpose: Evaluates how AI systems withstand adversarial attacks, including evasion, poisoning, and model extraction. Ensures AI remains functional under manipulation.
NIST AI RMF Alignment: Measure, Manage
- Measure: Identify risks related to adversarial attacks.
- Manage: Implement mitigation strategies to ensure resilience.
Purpose: Assesses AI models for unauthorized modifications, including backdoors and dataset poisoning. Supports trustworthiness and security of model outputs.
NIST AI RMF Alignment: Map, Measure
-
Map: Understand and identify risks to model/data integrity.
-
Measure: Evaluate and mitigate risks through validation techniques.
-
CVE-Bench — @uiuc-kang-lab
— How well AI agents can exploit real-world software vulnerabilities that are listed in the CVE database.
Purpose: Ensures AI security aligns with governance frameworks, industry regulations, and security policies. Supports auditability and risk management.
NIST AI RMF Alignment: Govern
- Govern: Establish policies, accountability structures, and compliance controls.
Purpose: Evaluates AI for risks like data leakage, membership inference, and model inversion. Helps ensure privacy preservation and compliance.
NIST AI RMF Alignment: Measure, Manage
- Measure: Identify and assess AI-related privacy risks.
- Manage: Implement security controls to mitigate privacy threats.
Purpose: Assesses AI for transparency, fairness, and bias mitigation. Ensures AI operates in an interpretable and ethical manner.
NIST AI RMF Alignment: Govern, Map, Measure
- Govern: Establish policies for fairness, bias mitigation, and transparency.
- Map: Identify potential explainability risks in AI decision-making.
- Measure: Evaluate AI outputs for fairness, bias, and interpretability.
- AI Incident Database (AIID)
- MIT AI Risk Repository — Incident Tracker
- AIAAIC Repository
- OECD.AI — AIM: AI Incidents and Hazards Monitor
- Adversarial AI Digest - A digest of AI security research, threats, governance challenges, and best practices for securing AI systems.
- AI Security Research Feed – Continuously updated feed of AI security–related academic papers, preprints, and research indexed from arXiv.
- AI Security Portal – Literature Database – Categorized database of AI security literature, taxonomy, and related resources.
- CSA — The State of AI and Security Survey Report
- CSA — Principles to Practice: Responsible AI in a Dynamic Regulatory Environment
- CSA — AI Resilience: A Revolutionary Benchmarking Model for AI Safety – Governance & compliance benchmarking model.
- CSA — Using AI for Offensive Security
📌 (More to be added – A collection of AI security reports, white papers, and academic studies.)
- AI GOAT
- Gandalf CTF
- Damn Vulnerable LLM Agent
- AI Red Teaming Playground Labs — Microsoft
— Self-hostable lab environment with 12 challenges (direct/indirect prompt injection, metaprompt extraction, Crescendo multi-turn, guardrail bypass).
- The MLSecOps Podcast – Insightful conversations with industry leaders and AI experts, exploring the fascinating world of machine learning security operations.
Curated market maps of tools and vendors for securing LLM and agentic AI applications across the lifecycle.
- OWASP — LLM and Generative AI Security Solutions Landscape
- OWASP — AI Security Solutions Landscape for Agentic AI
- Latio — 2025 AI Security Report – Market trends and vendor landscape snapshot for AI security.
- Woodside Capital Partners — Cybersecurity Sector — A snapshot with vendor breakdowns and landscape view.
A curated list of startups securing agentic AI applications, organized by the OWASP Agentic AI lifecycle (Scope & Plan → Govern). Each company appears once in its best-fit stage based on public positioning, and links point to blog/insights for deeper context. Some startups span multiple stages; placements reflect primary focus.
Inclusion criteria
- Startup has not been acquired
- Has an active blog
- Has an active GitHub organization/repository
Design-time security: non-human identities, agent threat modeling, privilege boundaries/authn, and memory scoping/isolation.
no startups here with active blog and active GitHub account
Secure agent loops and tool use; validate I/O contracts; embed policy hooks; test resilience during co-engineering.
no startups here with active blog and active GitHub account
Sanitize/trace data and reasoning; validate alignment; protect sensitive memory with privacy controls before deployment.
Adversarial testing for goal drift, prompt injection, and tool misuse; red-team sims; sandboxed calls; decision validation.
Sign models/plugins/memory; verify SBOMs; enforce cryptographically validated policies; register agents/capabilities.
no startups here with active blog and active GitHub account
Zero-trust activation: rotate ephemeral creds, apply allowlists/LLM firewalls, and fine-grained least-privilege authorization.
Monitor memory mutations for drift/poisoning, detect abnormal loops/misuse, enforce HITL overrides, and scan plugins—continuous, real-time vigilance for resilient operations as systems scale and self-orchestrate.
Correlate agent steps/tools/comms; detect anomalies (e.g., goal reversal); keep immutable logs for auditability.
Enforce role/task policies, version/retire agents, prevent privilege creep, and align evidence with AI regulations.
- Awesome LLMSecOps — wearetyomsmnv
- OSS LLM Security — kaplanlior
- Awesome LLM Security — corca-ai
- Security for AI — zmre
- Awesome AI Security — DeepSpaceHarbor
- Awesome AI for Cybersecurity — Billy1900
- Awesome ML Security — Trail of Bits
- Awesome MLSecOps — RiccardoBiosas
- MLSecOps References — disesdi
- Awesome ML Privacy Attacks — StratosphereIPS
- Awesome LLM Supply Chain Security — ShenaoW
- Awesome Prompt Injection — FonduAI
- Awesome Jailbreak on LLMs — yueliu1999
- Awesome LM-SSP (Large Model Security, Safety & Privacy) — ThuCCSLab
- Security & Privacy for LLMs (llm-sp) — chawins
- Awesome LVLM Attack — liudaizong
- Awesome ML/SP Papers — gnipping
- Awesome LLM JailBreak Papers — WhileBug
- Awesome Adversarial Machine Learning — man3kin3ko
- LLM Security & Privacy — briland
- Awesome GenAI Security — jassics
- Awesome GenAI CyberHub — Ashfaaq98
- Awesome AI for Security — AmanPriyanshu
- Awesome ML for Cybersecurity — jivoi
- Awesome AI Security — ottosulin
- Awesome AI4DevSecOps — awsm-research
- Prompt Hacking Resources — PromptLabs
- Awesome LALMs Jailbreak — WangCheng0116
- Awesome LRMs Safety — WangCheng0116
- Awesome LLM Safety — ydyjya
- Awesome MCP Security — Puliczek
Acronym | Full Form |
---|---|
AI | Artificial Intelligence |
AGI | Artificial General Intelligence |
ALBERT | A Lite BERT |
BERT | Bidirectional Encoder Representations from Transformers |
BGMAttack | Black-box Generative Model-based Attack |
CBA | Composite Backdoor Attack |
CCPA | California Consumer Privacy Act |
DAN | Do Anything Now |
DNN | Deep Neural Network |
DP | Differential Privacy |
FL | Federated Learning |
GDPR | General Data Protection Regulation |
GA | Genetic Algorithm |
GPT | Generative Pre-trained Transformer |
HIPAA | Health Insurance Portability and Accountability Act |
LM | Language Model |
LLM | Large Language Model |
Llama | Large Language Model Meta AI |
MIA | Membership Inference Attack |
MDP | Masking-Differential Prompting |
MLM | Masked Language Model |
NLP | Natural Language Processing |
OOD | Out Of Distribution |
PI | Prompt Injection |
PII | Personally Identifiable Information |
PAIR | Prompt Automatic Iterative Refinement |
PLM | pre-trained Language Model |
RL | Reinforcement Learning |
RLHF | Reinforcement Learning from Human Feedback |
RoBERTa | Robustly optimized BERT approach |
SGD | Stochastic Gradient Descent |
TAG | Gradient Attack on Transformer-based Language Models |
XLNet | Transformer-XL with autoregressive and autoencoding pre-training |
Contributions are welcome! If you have new resources, tools, or insights to add, feel free to submit a pull request.
This repository follows the Awesome Manifesto guidelines.
© 2025 Tal Eliyahu. Licensed under the MIT License. See LICENSE
.