Just as at home with electro-acoustic synthesizer electronics as with site reliability engineering, I find joy in operating inherently chaotic complex systems. I have a passion for exploring relationships between the artistic mind and the operation of socio-technical systems: my research on improvisation and intuition spans both the arts and resilience engineering. My experiences have led me to talk about reliability and resilience at LISA, SRECon, Music City Tech, Disney JETA, RE:Deploy, CMG ObservabilityCon, DevWeek, and several years at SCaLE.

Presentations

Metrics As Music: an Open Source Symphony

Some have dreamed of the day when we can plug our complex systems into stereo speakers and know there's trouble just by listening to the result. Monteverdi is a new Open Source platform that rethinks Observability and gets us closer to that dream.

This talk is a tour of the application's features, the pattern matching algorithm, a modular Plugin system that enables MIDI output, the TDD-based approach in Golang, and a look at its own metrics in OpenTelemetry. Along the way we dig into technical details like using GitHub Actions with GoReleaser to publish separate objects, and how Plugins can extend the app to employ AI. The app will be demoed live, making sound through a MIDI device and a DIY setup, with live system metrics powering the music.

See Presentation

On-Call Reprised and Rejuvenated

Being On-Call in software is not as clearly defined as it is in other industries. The start-up nature of things has led companies to implement emergency call in myriad ways, because there are advantages to picking what works for a small org with few or no IT/Ops staff. Blameless was such a company once upon a time, with SRE making up most of a single monolithic rotation for anything that went wrong. To keep up with scaling the operation, we spent many months examining how we could repair our rotation, implement continuous improvement, and make On-Call less painful.

See Presentation

Measuring Distributed Databases Across the Globe

There are important distinctions between 'using' a key/value store in practice and 'operating' a distributed database in the enterprise. While it's crucial for engineers to understand how the database itself operates, it's even more important for operators to be intimately familiar with the entire ecosystem in a global network. 

See Presentation

The Practice of Practice: Teamwork in Complexity

This talk unravels how we humans come together to operate complex software systems by taking a closer look at intuition through the eyes of a performer in a music ensemble. It introduces the concept of Fundamental Common Ground Breakdown and how it interrupts our efforts to collaborate and respond to events and incidents. A Chaos Engineering Game Day walkthrough shows that intuition is not an act of instinct, but a developed ability based on careful analysis and practice. It also shows that working together in tech can be inspiring, much like playing in a band.

See Presentation