Migrating to OpenTelemetry

The presentation will take place in Ballroom F on Saturday, March 7, 2026, from 12:30 to 13:30.

Migrating to OpenTelemetry sounds straightforward on paper: instrument your services, ship traces/metrics/logs, and enjoy consistent observability. In practice—especially in a heterogeneous environment with many teams and varying levels of ownership—it’s a long-running migration full of sharp edges. This session tells the story of moving from a patchwork of legacy instrumentation and vendor-specific agents to OpenTelemetry, and why we chose to make self-service adoption the core strategy rather than a centralized “platform team does everything” approach.

We’ll walk through the real challenges that show up after the first few successful demos: establishing semantic conventions and span naming that stay consistent across languages, propagating context reliably end to end, and balancing “golden path” defaults against the reality of edge-case services. We’ll cover collector architecture choices (local vs. gateway deployment, scaling and resiliency patterns, routing rules, and multi-tenant concerns), configuration drift and sprawl, and how to change signal pipelines without breaking incident response. Along the way, we’ll dig into the surprisingly tricky parts of cost management: cardinality explosions, sampling tradeoffs, attribute hygiene, and how to prevent well-meaning instrumentation from turning into a billing event.

The heart of the talk is how we tackled adoption friction with a home-grown, self-service solution built on Pulumi. Using infrastructure-as-code as the delivery mechanism, we built a paved road that teams could opt into with minimal changes: opinionated templates, safe defaults, automated wiring of collectors and exporters, and guardrails that enforce the things you only learn you need after being burned (version pinning, schema evolution, consistent resource attributes, and “known good” pipelines). We’ll show what worked, what didn’t, and how we iterated—plus the lessons learned about developer experience, change management, and how to measure real adoption beyond “we shipped a library.”

You’ll leave with concrete patterns you can apply to your own organization: how to design a migration plan that doesn’t stall, how to structure a self-service platform that scales with your teams, and how to keep the observability story coherent while your systems (and people) keep changing.