Righting a Sinking Ship
Ever been stuck with a system that just can’t heal? A system that continuously falls over or fails spectacularly at seemingly random moments? Working with modern systems, especially containerized systems distributed across many clouds, can be difficult and frustrating for anyone on call when something goes wrong. I’ve certainly been there. Let’s dig into where you can gather data from a broken system, how to get data if you’re not lucky enough to have logs, how to figure out what’s happening from that data, and how best to act on it. We’ll also explore common trouble spots that might be hidden in that data for you to find. Finally, we’ll look specifically at common issues with containers and when they appear, so they’re easier to spot.