Skip to content

TalEliyahu/Awesome-AI-Security

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 

Repository files navigation

Awesome AI Security Awesome

Curated resources, research, and tools for securing AI systems.


Table of Contents


Best Practices and Security Standards

Governance & Management Frameworks

Standards, Controls & Top 10s

Controls & Verification Standards

Top 10s

Scoring & Rating Systems

Testing & Red Teaming

Implementation Guides & Best Practices

Agentic Systems — Governance, Standards & Guides


Tools

Inclusion criteria (open-source tools): must have 220+ GitHub stars, active maintenance in the last 12 months, and ≥3 contributors.

Prompt-Injection Detection & Mitigation

Detect and stop prompt-injection (direct/indirect) across inputs, context, and outputs; filter hostile content before it reaches tools or models.

  • (none from your current list yet)

Jailbreak & Policy Enforcement (Guardrails)

Enforce safety policies and block jailbreaks at runtime via rules/validators/DSLs, with optional human-in-the-loop for sensitive actions.

Model Artifact Scanners

Analyze serialized model files for unsafe deserialization and embedded code; verify integrity/metadata and block or quarantine on fail.

Agent Tooling and MCP Security

Scan/audit MCP servers & client configs; detect tool poisoning, unsafe flows; constrain tool access with least-privilege and audit trails.

Tool manifest/metadata validators

Servers & Dev tooling

Execution Sandboxing for Agent Code

Run untrusted or LLM-triggered code in isolated sandboxes (FS/network/process limits) to contain RCE and reduce blast radius.

  • E2B GitHub Repo stars — SDK + self-hostable infra to run untrusted, LLM-generated code in isolated cloud sandboxes (Firecracker microVMs).

Gateways & Policy Proxies

Centralize auth, quotas/rate limits, cost caps, egress/DLP filters, and guardrail orchestration across all model/providers.

  • (none from your current list yet)

Code Review

  • Claude Code Security Reviewer GitHub Repo stars - An AI-powered security review GitHub Action using Claude to analyze code changes for security vulnerabilities.
  • Vulnhuntr GitHub Repo stars - Vulnhuntr leverages the power of LLMs to automatically create and analyze entire code call chains starting from remote user input and ending at server output for detection of complex, multi-step, security-bypassing vulnerabilities that go far beyond what traditional static code analysis tools are capable of performing.

Red-Teaming Harnesses & Automated Security Testing

Automate attack suites (prompt-injection, leakage, jailbreak, goal-based tasks) in CI; score results and produce regression evidence.

Prompt-injection test suites

Data-leakage/secret-exfil test suites

Jailbreak catalogs & adversarial prompts

Adversarial-robustness (evasion) toolkits

Goal-directed agent attack tasks

CI pipelines & regression gates

  • promptfoo GitHub Repo stars
  • Agentic Radar GitHub Repo stars
  • DeepTeam GitHub Repo stars
  • Buttercup GitHub Repo stars — Trail of Bits’ AIxCC Cyber Reasoning System: runs OSS-Fuzz–style campaigns to find vulns, then uses a multi-agent LLM patcher to generate & validate fixes for C/Java repos; ships SigNoz observability; requires at least one LLM API key.

Scoring/leaderboards & evidence reports

  • (none from your current list yet)

Supply Chain: AI/ML BOM and Attestation

Generate and verify AI/ML BOMs, signatures, and provenance for models/datasets/dependencies; enforce allow/deny policies.

  • (none from your current list yet)

Vector/Memory Store Security

Harden RAG memory: isolate namespaces, sanitize queries/content, detect poisoning/outliers, and prevent secret/PII retention.

  • (none from your current list yet)

Data/Model Poisoning Defenses

Detect and mitigate dataset/model poisoning and backdoors; validate training/fine-tuning integrity and prune suspicious behaviors.

Sensitive Data Leak Prevention (DLP for AI)

Prevent secret/PII exfiltration in prompts/outputs via detection, redaction, and policy checks at I/O boundaries.

  • Presidio GitHub Repo stars — PII/PHI detection & redaction for text, images, and structured data; use as a pre/post-LLM DLP filter and for dataset sanitization.

Monitoring, Logging & Anomaly Detection

Collect AI-specific security logs/signals; detect abuse patterns (PI/jailbreak/leakage), enrich alerts, and support forensics.

  • LangKit GitHub Repo stars — LLM observability metrics toolkit (whylogs-compatible): prompt-injection/jailbreak similarity, PII patterns, hallucination/consistency, relevance, sentiment/toxicity, readability.

  • Alibi Detect GitHub Repo stars — Production drift/outlier/adversarial detection for tabular, text, images, and time series; online/offline detectors with TF/PyTorch backends; returns scores, thresholds, and flags for alerting.


Attack & Defense Matrices

Matrix-style resources covering adversarial TTPs and curated defensive techniques for AI systems.

Attack

Defense


Checklists


Supply Chain Security

Guidance and standards for securing the AI/ML software supply chain (models, datasets, code, pipelines). Primarily specs and frameworks; includes vetted TPRM templates.

Standards & Specs

Normative formats and specifications for transparency and traceability across AI components and dependencies.

  • OWASP — AI Bill of Materials (AIBOM) GitHub Repo stars — Bill of materials format for AI components, datasets, and model dependencies.

Third-Party Assessment

Questionnaires and templates to assess external vendors, model providers, and integrators for security, privacy, and compliance.

  • FS-ISAC — Generative AI Vendor Evaluation & Qualitative Risk AssessmentAssessment Tool XLSXGuide PDF — Vendor due-diligence toolkit for GenAI: risk tiering by use case, integration and data sensitivity; questionnaires across privacy, security, model development and validation, integration, legal and compliance; auto-generated reporting.

Foundations: Glossary, SoK/Surveys & Taxonomies

(Core references and syntheses for orientation and shared language.)

Glossary

(Authoritative definitions for AI/ML security, governance, and risk—use to align terminology across docs and reviews.)

SoK & Surveys

(Systematizations of Knowledge (SoK), surveys, systematic reviews, and mapping studies.)

Taxonomy

(Reusable classification schemes—clear dimensions, categories, and labeling rules for attacks, defenses, datasets, and risks.)


Datasets

Cybersecurity Skills

Interactive CTFs and self-contained labs for hands-on security skills (web, pwn, crypto, forensics, reversing). Used to assess practical reasoning, tool use, and end-to-end task execution.

Cybersecurity Knowledge

Structured Q&A datasets assessing security knowledge and terminology. Used to evaluate factual recall and conceptual understanding.

Secure Coding & Vulnerability Detection

Code snippet datasets labeled as vulnerable or secure, often tied to CWEs (Common Weakness Enumeration). Used to evaluate the model’s ability to recognize insecure code patterns and suggest secure fixes.

Jailbreak & Guardrail Evaluation

Adversarial prompt datasets—both text-only and multimodal—designed to bypass safety mechanisms or test refusal logic. Used to test how effectively a model resists jailbreaks and enforces policy-based refusal.

Prompt Injection & Malicious Prompt Detection

Datasets labeled with whether prompts are benign or malicious (i.e., injection attempts). Used to evaluate an LLM’s ability to detect and neutralize prompt-injection style attacks.


Courses

  • Microsoft AI Security Learning Path – Free training modules on AI security, covering secure AI model development, risk management, and threat mitigation.
  • AWS AI Security Training – Free AWS courses on securing AI applications, risk management, and implementing security best practices in AI/ML environments.

Certifications

Governance & Audit


Learning Resources

  • Nightfall AI Security 101 – A centralized learning hub for AI security, offering an evolving library of concepts, emerging risks, and foundational principles in securing AI systems.

Foundations

  • SANS — AI Cybersecurity Careers — Career pathways poster + training map; helpful baseline skills that transfer to AI security (IR, DFIR, detection, threat hunting).

Research Working Groups

📌 (More working groups to be added.)


Communities & Social Groups


Benchmarking

Benchmarks

Categories of AI Security Benchmarks

Robustness & Adversarial Resilience

Purpose: Evaluates how AI systems withstand adversarial attacks, including evasion, poisoning, and model extraction. Ensures AI remains functional under manipulation.
NIST AI RMF Alignment: Measure, Manage

  • Measure: Identify risks related to adversarial attacks.
  • Manage: Implement mitigation strategies to ensure resilience.

Model & Data Integrity

Purpose: Assesses AI models for unauthorized modifications, including backdoors and dataset poisoning. Supports trustworthiness and security of model outputs.
NIST AI RMF Alignment: Map, Measure

  • Map: Understand and identify risks to model/data integrity.

  • Measure: Evaluate and mitigate risks through validation techniques.

  • CVE-Bench — @uiuc-kang-lab GitHub Repo stars — How well AI agents can exploit real-world software vulnerabilities that are listed in the CVE database.

Governance & Compliance

Purpose: Ensures AI security aligns with governance frameworks, industry regulations, and security policies. Supports auditability and risk management.
NIST AI RMF Alignment: Govern

  • Govern: Establish policies, accountability structures, and compliance controls.

Privacy & Data Protection

Purpose: Evaluates AI for risks like data leakage, membership inference, and model inversion. Helps ensure privacy preservation and compliance.
NIST AI RMF Alignment: Measure, Manage

  • Measure: Identify and assess AI-related privacy risks.
  • Manage: Implement security controls to mitigate privacy threats.

Explainability & Trustworthiness

Purpose: Assesses AI for transparency, fairness, and bias mitigation. Ensures AI operates in an interpretable and ethical manner.
NIST AI RMF Alignment: Govern, Map, Measure

  • Govern: Establish policies for fairness, bias mitigation, and transparency.
  • Map: Identify potential explainability risks in AI decision-making.
  • Measure: Evaluate AI outputs for fairness, bias, and interpretability.

Incident Response

Incident Repositories, Trackers & Monitors

Guides & Playbooks


Newsletter

  • Adversarial AI Digest - A digest of AI security research, threats, governance challenges, and best practices for securing AI systems.

Conferences and Events


Reports and Research

Research Feed

Reports

📌 (More to be added – A collection of AI security reports, white papers, and academic studies.)


CTFs & Challenges


Podcasts

  • The MLSecOps Podcast – Insightful conversations with industry leaders and AI experts, exploring the fascinating world of machine learning security operations.

Market Landscape

Curated market maps of tools and vendors for securing LLM and agentic AI applications across the lifecycle.


Startups Blogs

A curated list of startups securing agentic AI applications, organized by the OWASP Agentic AI lifecycle (Scope & Plan → Govern). Each company appears once in its best-fit stage based on public positioning, and links point to blog/insights for deeper context. Some startups span multiple stages; placements reflect primary focus.

Inclusion criteria

  1. Startup has not been acquired
  2. Has an active blog
  3. Has an active GitHub organization/repository

Scope & Plan

Design-time security: non-human identities, agent threat modeling, privilege boundaries/authn, and memory scoping/isolation.

no startups here with active blog and active GitHub account

Develop & Experiment

Secure agent loops and tool use; validate I/O contracts; embed policy hooks; test resilience during co-engineering.

no startups here with active blog and active GitHub account

Augment & Fine-Tune Data

Sanitize/trace data and reasoning; validate alignment; protect sensitive memory with privacy controls before deployment.

Test & Evaluate

Adversarial testing for goal drift, prompt injection, and tool misuse; red-team sims; sandboxed calls; decision validation.

Release

Sign models/plugins/memory; verify SBOMs; enforce cryptographically validated policies; register agents/capabilities.

no startups here with active blog and active GitHub account

Deploy

Zero-trust activation: rotate ephemeral creds, apply allowlists/LLM firewalls, and fine-grained least-privilege authorization.

Operate

Monitor memory mutations for drift/poisoning, detect abnormal loops/misuse, enforce HITL overrides, and scan plugins—continuous, real-time vigilance for resilient operations as systems scale and self-orchestrate.

Monitor

Correlate agent steps/tools/comms; detect anomalies (e.g., goal reversal); keep immutable logs for auditability.

Govern

Enforce role/task policies, version/retire agents, prevent privilege creep, and align evidence with AI regulations.


Related Awesome Lists


Common Acronyms

Acronym Full Form
AI Artificial Intelligence
AGI Artificial General Intelligence
ALBERT A Lite BERT
BERT Bidirectional Encoder Representations from Transformers
BGMAttack Black-box Generative Model-based Attack
CBA Composite Backdoor Attack
CCPA California Consumer Privacy Act
DAN Do Anything Now
DNN Deep Neural Network
DP Differential Privacy
FL Federated Learning
GDPR General Data Protection Regulation
GA Genetic Algorithm
GPT Generative Pre-trained Transformer
HIPAA Health Insurance Portability and Accountability Act
LM Language Model
LLM Large Language Model
Llama Large Language Model Meta AI
MIA Membership Inference Attack
MDP Masking-Differential Prompting
MLM Masked Language Model
NLP Natural Language Processing
OOD Out Of Distribution
PI Prompt Injection
PII Personally Identifiable Information
PAIR Prompt Automatic Iterative Refinement
PLM pre-trained Language Model
RL Reinforcement Learning
RLHF Reinforcement Learning from Human Feedback
RoBERTa Robustly optimized BERT approach
SGD Stochastic Gradient Descent
TAG Gradient Attack on Transformer-based Language Models
XLNet Transformer-XL with autoregressive and autoencoding pre-training

Contributing

Contributions are welcome! If you have new resources, tools, or insights to add, feel free to submit a pull request.

This repository follows the Awesome Manifesto guidelines.


License

License: MIT

© 2025 Tal Eliyahu. Licensed under the MIT License. See LICENSE.

About

Curated resources, research, and tools for securing AI systems

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published