Comprehensive observability requires collecting and correlating metrics, logs, and traces, often called the three pillars. Doing this without vendor lock-in or overwhelming operational complexity is challenging. This advanced workshop tackles the problem with open-source tooling built around CNCF standards.

Workshop Architecture:

You'll work with a real microservices application (OpenTelemetry Demo) running on Amazon EKS, implementing production-ready observability using:

- OpenTelemetry: CNCF standard for collecting all telemetry types
- Prometheus: Time-series metrics storage (Amazon Managed Service for Prometheus)
- OpenSearch: Logs and distributed traces (Amazon OpenSearch Service)
- Grafana: Unified visualization (Amazon Managed Grafana)

While we use AWS managed services for convenience and to avoid infrastructure management, all concepts and configurations transfer to self-hosted Prometheus, Grafana, and OpenSearch deployments.

Hands-On Modules:

Module 1: Getting Started (20 min)

- Explore pre-deployed OpenTelemetry Demo microservices application
- Access Grafana workspace via SAML authentication
- Access OpenSearch through secure proxy
- Generate sample telemetry data

Module 2: Metrics Pipeline (25 min)

- Create Prometheus workspace for metrics storage
- Deploy OpenTelemetry collector with SigV4 authentication
- Configure Prometheus data source in Grafana with IAM roles
- Build performance dashboards: request rates, latency percentiles, error rates (RED metrics)
- Implement service-level indicators (SLIs)
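To make the RED metrics and SLI items above concrete, here is a minimal Python sketch of the arithmetic behind them. In the workshop these values come from PromQL queries over collected metrics; the `Request` type, the function names, and the choice to count only 5xx responses as errors are assumptions made for illustration.

```python
from dataclasses import dataclass
import math


@dataclass
class Request:
    duration_ms: float  # server-side latency of one request
    status_code: int    # HTTP status code returned to the caller


def percentile(values, pct):
    """Nearest-rank percentile (pct in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def red_metrics(requests, window_seconds):
    """Rate (req/s), error ratio, and p95 latency for one time window."""
    total = len(requests)
    # Assumption: only 5xx responses count as errors.
    errors = sum(1 for r in requests if r.status_code >= 500)
    return {
        "rate_per_s": total / window_seconds,
        "error_ratio": errors / total if total else 0.0,
        "p95_ms": percentile([r.duration_ms for r in requests], 95) if total else 0.0,
    }


def availability_sli(requests):
    """SLI: fraction of requests that did not fail with a 5xx."""
    if not requests:
        return 1.0  # Assumption: an empty window counts as fully available.
    good = sum(1 for r in requests if r.status_code < 500)
    return good / len(requests)
```

Calling `red_metrics(requests, window_seconds=60)` on one minute of request samples returns the three RED values for that window, and `availability_sli` gives the ratio you would compare against a service-level objective.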

Module 3: Logs Pipeline (30 min)

- Create Amazon OpenSearch Ingestion (OSIS) pipeline for centralized logging
- Configure OpenTelemetry collector to send structured logs with trace correlation
- Set up OpenSearch data source in Grafana
- Build log analysis dashboards with filtering by service, severity, and trace ID
- Correlate logs with metrics using dynamic Grafana variables
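The idea behind trace-correlated structured logs can be sketched in a few lines of Python. This is an illustration using only the standard library, not the OpenTelemetry SDK; the field names (`service`, `severity`, `trace_id`, `span_id`) and helper functions are hypothetical, while the ID lengths follow the W3C Trace Context sizes (16-byte trace ID, 8-byte span ID).

```python
import json
import secrets


def new_trace_context():
    """Random hex IDs sized like W3C Trace Context: 16-byte trace, 8-byte span."""
    return {"trace_id": secrets.token_hex(16), "span_id": secrets.token_hex(8)}


def log_record(service, severity, message, ctx):
    """Serialize one structured log line that carries the trace identifiers."""
    return json.dumps({
        "service": service,
        "severity": severity,
        "message": message,
        "trace_id": ctx["trace_id"],
        "span_id": ctx["span_id"],
    })


def filter_logs(lines, service=None, severity=None, trace_id=None):
    """Keep the records that match every filter that was supplied."""
    wanted = {"service": service, "severity": severity, "trace_id": trace_id}
    selected = []
    for line in lines:
        record = json.loads(line)
        if all(v is None or record.get(k) == v for k, v in wanted.items()):
            selected.append(record)
    return selected
```

Because every record carries the same `trace_id` as the request that produced it, filtering by that ID returns all log lines for one request across services, which is the same lookup the dashboard filters perform.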

Module 4: Traces Pipeline (25 min)

- Create OSIS pipeline for distributed tracing using trace analytics blueprint
- Configure OpenTelemetry collector for trace collection
- Set up OpenSearch traces data source in Grafana
- Analyze service maps showing dependencies and latency
- Build unified dashboard correlating metrics → logs → traces
- Investigate performance bottlenecks using trace timelines
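Reading a trace timeline for bottlenecks comes down to finding where time is actually spent. The Python sketch below computes each span's self time (its duration minus the time of its direct children) and picks the largest. The `Span` type and function names are hypothetical, and the calculation assumes child spans run one after another without overlapping.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None for the root span
    name: str
    start_ms: float
    end_ms: float


def self_times(spans):
    """Map span_id -> own duration minus the durations of direct children."""
    result = {s.span_id: s.end_ms - s.start_ms for s in spans}
    for s in spans:
        # Assumption: sibling spans do not overlap in time.
        if s.parent_id in result:
            result[s.parent_id] -= s.end_ms - s.start_ms
    return result


def slowest_span(spans):
    """Return the span with the largest self time."""
    times = self_times(spans)
    return max(spans, key=lambda s: times[s.span_id])
```

A span with a long total duration but a small self time is mostly waiting on its children, so the span returned by `slowest_span` is usually the better place to start investigating.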

Bonus Module: Alerting (20 min, if time permits)

- Create recording rules in Prometheus
- Configure Alertmanager routing
- Set up alerting rules based on metrics
- Visualize alerts in Grafana
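The behavior of a metric-based alerting rule with a hold duration can be sketched as follows. This is a simplified Python model of how a threshold rule moves between inactive, pending, and firing, not the Prometheus implementation; the function name, the sample format, and the strict greater-than comparison are assumptions.

```python
def alert_states(samples, threshold, for_seconds):
    """Return one state per sample for a 'value > threshold' rule.

    samples: list of (timestamp_seconds, value) pairs sorted by time.
    The alert is 'pending' once the condition holds and becomes 'firing'
    after it has held continuously for at least for_seconds.
    """
    states = []
    pending_since = None
    for ts, value in samples:
        if value > threshold:
            if pending_since is None:
                pending_since = ts  # condition just became true
            held = ts - pending_since
            states.append("firing" if held >= for_seconds else "pending")
        else:
            pending_since = None  # condition cleared, reset the timer
            states.append("inactive")
    return states
```

The hold duration keeps a short spike from paging anyone: the alert only fires when the condition stays true across consecutive evaluations.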

Why OpenTelemetry?

OpenTelemetry is the CNCF standard for observability instrumentation, supported by most major observability vendors and open-source tools. Using OpenTelemetry helps you avoid vendor lock-in: the same instrumentation works with Prometheus, Datadog, New Relic, Honeycomb, or any other OTLP-compatible backend.

Why Managed Services?

We use AWS managed services to eliminate toil (no Prometheus operator management, no Grafana upgrades, no OpenSearch cluster tuning). This lets us focus on observability patterns, not infrastructure operations. The Grafana dashboards and PromQL queries carry over to self-hosted deployments, and the OpenTelemetry collector configurations need changes only in the AWS-specific parts, such as SigV4 authentication and service endpoints.

What You'll Leave With:

- OpenTelemetry collector configurations for metrics, logs, and traces
- Production-ready Grafana dashboards with correlation capabilities
- Understanding of OTLP (OpenTelemetry Protocol)
- Service mesh observability patterns
- Reusable Kubernetes manifests
- Practical experience troubleshooting microservices using all three pillars

Prerequisites:

- Strong Kubernetes experience (kubectl, deployments, services)
- AWS CLI familiarity
- Understanding of metrics, logs, and traces concepts
- Laptop with browser access (lab environment provided)

Target Audience:

DevOps engineers, SREs, and platform engineers responsible for observability infrastructure. This is a level 400 workshop: expect hands-on command-line work and troubleshooting.