Python infrastructure, which includes PyPI, documentation hosting, mailing lists, and bug tracking, serves millions of developers worldwide every day. When something breaks, we need to know immediately. But observability at scale comes with real costs, both in tools and in engineering time.
This talk shares lessons learned from managing observability for the Python Software Foundation's infrastructure, where we balance the need for comprehensive monitoring against budget and operational constraints.
We use self-hosted Grafana and Loki alongside Datadog (provided through their open source program). While not everyone has access to donated tools, the principles of building inexpensive yet effective observability apply whether you're self-hosting open source solutions or working with commercial platforms.



