Better Reliability with SLOs
Uptime is a poor measure of reliability. When uptime is the core metric you are evaluated on, it can drive behaviors that optimize for that metric at the expense of everything else, such as slowing down innovation. Agile development’s fail-fast approach, coupled with distributed applications and dynamic infrastructure, requires us to have a better understanding of reliability.
Service level objectives (SLOs) are targets for how reliably you plan to run your service. They help you understand the true health of your systems and how your end users experience them, while leaving you room to innovate. A poorly defined SLO, one that measures the wrong thing, means you have a poor understanding of how your customers experience your service. In this talk you’ll learn how to define SLOs and choose the right service level indicators (SLIs) to ensure reliability. From there, you’ll learn how to think about your SLOs, what error budgets are and why you want to use them, and how to have meaningful conversations about realistic availability so that you don’t lose sight of the innovation that comes from experimentation.
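To make the error budget idea concrete, here is a minimal sketch of the arithmetic, assuming a request-based SLI (good requests divided by total requests) and a 30-day window. The function names, the 99.9% target, and the request counts are illustrative assumptions, not figures from the talk.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of unreliability the SLO tolerates over the window.

    slo_target is a fraction, e.g. 0.999 for "three nines" (assumed example).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def budget_remaining(slo_target: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget still unspent (negative when overspent)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events <= 0 or not 0 <= good_events <= total_events:
        raise ValueError("need 0 <= good_events <= total_events and total_events > 0")
    allowed_bad = (1.0 - slo_target) * total_events  # failures the SLO permits
    actual_bad = total_events - good_events          # failures observed so far
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    # Hypothetical service: 99.9% target, 1,000,000 requests, 500 of them failed.
    print(f"{error_budget_minutes(0.999):.1f} minutes of budget per 30 days")
    print(f"{budget_remaining(0.999, 999_500, 1_000_000):.0%} of the budget remains")
```

Under these assumptions, a team with most of its budget left has room to ship and experiment, while a team that has overspent has a concrete reason to prioritize reliability work.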