SysAdmin to SRE: Creating Capacity to Make Tomorrow Better Than Today
Wouldn't everyone in operations love more time to work on exciting projects? Build out new platforms, improve performance, contribute to open source projects, focus on security, level up their automation: all things that add value to their companies and advance their careers. But instead, the life of a traditional systems administrator is often buried in interruptions and repetitive work. Imagine the things you could do if you just had the time to get to them.
Then along comes a new way of working and a new role called Site Reliability Engineering (SRE). But SRE almost seems too good to be true! People are doing what systems administrators used to do, but getting to spend more than 50% of their time doing engineering work that adds enduring value to their company? How can less than half of these SREs' time be consumed by the interruptions, repetitive work, and drudgery that seem to take up most of a traditional systems administrator's time? And how do they do this with the same or lower headcount?
This talk will first take a close look at what SRE is and what SRE isn't. We will break down the principles behind the SRE movement and highlight where SRE departs from the current conventional wisdom of Operations and Systems Administration work. You'll learn about key concepts like Toil, SLOs, Error Budgets, and Shared Responsibility Models.
Next, we'll look at how to move to an SRE style of working. We'll see how traditional operations beliefs and practices can leave organizational scar tissue that is difficult to overcome, with examples of how silos, excessive toil, reliance on queues, and incorrectly applied governance models undermine the adoption of SRE principles and practices in the enterprise. We'll also cover the individual skills and mindset changes that you'll need to adopt an SRE way of working.
You'll leave this talk with an appreciation for how SRE can create the capacity you need to make tomorrow better than today.