Hrittik is a Platform Advocate at Loft Labs and a CNCF Ambassador who has previously worked at various startups, helping them scale their content efforts. He loves diving deep into distributed systems and writing about them, and has spoken at conferences including Azure Cloud Summit, UbuCon Asia, and Kubernetes Community Days Lagos and Chennai, among others. His best days are when he finds ways to create impact in the communities he’s a part of, whether through code, content, or mentorship!

Presentations

Help! My LLM is a Resource Hog: How We Tamed Inference with Kubernetes and Open Source Muscle

A client came to us with a problem we’re seeing more and more often: their large language model (LLM) was deployed, but inference was painfully slow, GPU usage was unpredictable, and costs were spiraling out of control. Kubernetes alone wasn’t enough; they needed a production-ready, efficient, and scalable stack.

In this talk, we’ll walk through how we diagnosed and solved the issue using open-source CNCF tools, turning a chaotic deployment into a well-oiled inference machine.

You’ll learn how to:
1. Use KServe and Kubeflow to serve LLMs reliably.
2. Benchmark and auto-scale workloads using Volcano and KEDA while optimizing resource usage and latency.
3. Track model performance and drift with Prometheus, Grafana, and OpenTelemetry.
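As a taste of points 1 and 2, here is a minimal sketch of what this stack can look like: a KServe InferenceService serving a model via the Hugging Face runtime, paired with a KEDA ScaledObject that scales the predictor on a Prometheus request-rate metric. All names, the model ID, the Prometheus address, and the query are illustrative, and the exact fields depend on your KServe and KEDA versions (KEDA applies when the predictor runs in raw deployment mode rather than behind Knative autoscaling):

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llm-demo            # hypothetical service name
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface
      args:
        - --model_name=llm-demo
        - --model_id=facebook/opt-125m   # small model for illustration
      resources:
        limits:
          nvidia.com/gpu: "1"            # pin one GPU per replica
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: llm-demo-scaler
spec:
  scaleTargetRef:
    name: llm-demo-predictor             # hypothetical predictor Deployment
  minReplicaCount: 1
  maxReplicaCount: 5
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090   # illustrative address
        query: sum(rate(http_requests_total{app="llm-demo"}[2m]))
        threshold: "10"                  # target requests/sec per replica
```

The idea is that GPU replicas follow actual request load rather than sitting idle at a fixed count, which is where most of the cost savings in a setup like this come from.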

We’ll share benchmarks, architectures, and lessons from the field, all based on open-source tooling you can try today. Whether you’re running LLMs at scale or just exploring GenAI, this talk is packed with real-world solutions to help you do more with less.
