A curated list of awesome Chaos Engineering resources.
"Chaos Engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production." - Principles Of Chaos Engineering website
Please take a look at the contribution guidelines first. Contributions are always welcome!
- Culture
- Books
- Education
- Notable Tools
- Papers
- Blogs & Newsletters
- Conferences & Meetups
- Forums
- Principles Of Chaos Engineering
- Chaos Community
- Chaos Engineering
- O'Reilly Velocity San Jose 2017: Precision Chaos
- The Discipline of Chaos Engineering
- Chaos Monkey for Fun and Profit
- Fault Injection in Production: Making the case for resilience testing
- Lord of Chaos - Becoming a Chaos Engineer
- Chaos testing - Preventing failure by instigation
- Orchestrated Chaos
- Choose your own adventure: Chaos Engineering - Video & Slides
- AMA Chaos Engineering + DiRT
- SRECON17: Principles of Chaos Engineering
- Chaos & Intuition Engineering at Netflix
- Mastering Chaos - A Netflix Guide to Microservices
- Too big to test: Breaking a production brokerage platform without causing financial devastation
- Inside Azure Search: Chaos Engineering
- Netflix, the Simian Army, and the culture of freedom and responsibility
- FIT: Failure Injection Testing
- The Netflix Simian Army
- Automated Failure Testing
- The Verification of a Distributed System by Caitie McCaffrey
- The Journey to Chaos Engineering begins with a single step - Bruce Wong and James Burns (Twilio)
- Chaos Engineering by Lorin Hochstein
- Chaos Engineering - Casey Rosenthal
- The Road to Chaos - Velocity 2017 - Video & Slides
- How Netflix DDoS’d Itself To Help Protect the Entire Internet
- 10 Years of Crashing Google
- Weathering the Unexpected
- SRECON17: Breaking Things on Purpose
- PuppetConf 2016: Chaos Patterns - Architecting for Failure in Distributed Systems
- Ship More, Sink Less - Changing Chaos Engineering and Distributed Tracing
- Cloudcast - Discipline of Chaos Engineering
- Software Engineering Daily - Failure Injection with Kolton Andrus (podcast)
- Responding to Failures in Playback Features with Haley Tucker (podcast)
- Distributed Chaos Operations
- "Antics, drift, and chaos" by Lorin Hochstein
- Chaos Engineering: Building Confidence in System Behavior through Experiment
- Site Reliability Engineering: How Google Runs Production Systems
- The Practice Of Cloud System Administration: Designing and Operating Large Distributed Systems
- Antifragile Systems and Teams
- A Chaos Engineering Bootcamp for O'Reilly Velocity 2017 - Slides & Source code
- Your First Chaos Experiment
- Chaos Engineering 101
- A Primer on Automating Chaos
- Chaos Monkey - A resiliency tool that helps applications tolerate random instance failures.
- The Simian Army - A suite of tools for keeping your cloud operating in top form.
- orchestrator - MySQL replication topology management and HA.
- kube-monkey - An implementation of Netflix's Chaos Monkey for Kubernetes clusters.
- Gremlin Inc. - Failure as a Service.
- Pumba - Chaos testing and network emulation for Docker containers (and clusters).
- Chaos Toolkit - A chaos engineering toolkit to help you build confidence in your software system.
- WireMock - API mocking (service virtualization) that enables modeling real-world faults and delays.
- MockLab - API mocking (service virtualization) as a service that enables modeling real-world faults and delays.
- Simple Testing Can Prevent Most Critical Failures: An Analysis of Production Failures in Distributed Data-Intensive Systems
- Lineage-driven Fault Injection
- Automating Failure Testing Research at Internet Scale
- Principles of Antifragile Software
- Netflix Technology Blog - Learn more about how Netflix designs, builds, and operates its systems and engineering organizations.
- Production Ready - A mailing list about building resilient infrastructure and tools.
- SRE Weekly - Weekly Site Reliability Newsletter.
- Site Reliability Engineering resources - A curated list of awesome Site Reliability and Production Engineering resources.
- SysAdvent - One article for each day of December, ending on the 25th.
- Gremlin Blog - Blogs on Chaos Engineering from Gremlin Inc.
- O’Reilly Systems Engineering and Operations Newsletter - Weekly systems engineering and operations news and insights from industry insiders.
- SRECon Conferences - The Official SRE Conference.
- LISA Conferences - Prominent Conference About SysAdmin/DevOps/SRE.
- O'Reilly Velocity Conference
- Chaos Engineering Community Meetup Group - Bay Area Meetup group for Chaos Engineers.
- London Chaos Engineering Community