When Things Go Bump in the Night

Audience:
Topic:

Software fails. It's a fact of life, and when it does, we need to be able to act quickly to fix it. In this talk, we'll go over some processes you might want to put in place to build an effective on-call and incident response system.

Topics will include:

* Monitoring services and different ways to do it

* SLI/SLA/SLO

* On-call rotations

* Runbooks and documentation

* Incident command: the different roles played by different people in your organization

* Preventing failure by causing it yourself during normal working hours, and practicing for a real incident

* After-action reviews

Room:
Room 101
Time:
Saturday, March 10, 2018 - 11:30 to 12:30