Drawing on hands-on experience of automatically injecting failures into systems to find problems proactively in a high-availability environment, this talk shows how to design a chaos engineering experiment.
Do you know whether your application or service will behave as expected during events you assume should never happen? If you are asked to deliver high availability, does your application design meet those service level objectives? What if your environment is hostile and everything seems to be falling apart: is there a pattern to how your service responds? Chaos engineering is a disciplined approach to identifying failures before they become outages.
By proactively testing how a system responds under stress, you can identify and fix failures before they end up in the news. You “break things on purpose” to learn how to build more resilient systems and to compare what you think will happen with what actually happens in your systems.
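As a rough illustration of comparing what you think will happen with what actually happens, the sketch below injects failures into a simulated dependency and checks a stated hypothesis against the observed success rate. It is a toy example only: `flaky_dependency`, `call_with_retry`, `run_experiment`, the retry count, and the thresholds are all hypothetical names and values chosen for illustration, not a real tool or the speaker's method.

```python
import random


def flaky_dependency(failure_rate, rng):
    """Hypothetical downstream call that fails with the injected probability."""
    if rng.random() < failure_rate:
        raise ConnectionError("injected failure")
    return "ok"


def call_with_retry(failure_rate, rng, attempts=3):
    """Hypothetical service under test: retries the dependency a fixed number of times."""
    for _ in range(attempts):
        try:
            return flaky_dependency(failure_rate, rng)
        except ConnectionError:
            continue  # retry after an injected failure
    return None  # all attempts failed


def run_experiment(failure_rate, expected_success_rate, requests=1000, seed=42):
    """Inject failures, then compare the hypothesis with the observed success rate."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    successes = sum(
        call_with_retry(failure_rate, rng) is not None for _ in range(requests)
    )
    observed = successes / requests
    return {
        "expected_success_rate": expected_success_rate,
        "observed_success_rate": observed,
        "hypothesis_holds": observed >= expected_success_rate,
    }


if __name__ == "__main__":
    # Assumed hypothesis: with 30% injected failures, retries keep success at or above 95%.
    print(run_experiment(failure_rate=0.3, expected_success_rate=0.95))
```

In a real experiment the injected failure would target actual infrastructure and the observed value would come from production monitoring; the structure of stating a hypothesis, injecting a fault, and comparing the result stays the same.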
In this talk, we will review key concepts of chaos engineering and failure injection, along with their benefits. Through practical examples, you will learn the steps needed to create automated production testing scripts with chaos engineering. Which chaos engineering experiments should I perform first? How do I plan my first chaos engineering experiment? What can go wrong with my distributed application? These and other questions will be answered, so you leave with an actionable plan to prevent failures before they happen.