As organizations race to adopt generative AI, platform engineering teams are being asked to provide more than infrastructure. They must now support multi-node inference workloads, serve models efficiently on GPUs, and manage a growing sprawl of APIs and providers, all while keeping the developer experience simple and the system safe.

This session takes a practical look at what happens when AI becomes just another workload on the platform. It explores Kubernetes-native approaches such as KServe for distributed model serving, high-throughput serving engines like vLLM, and gateways such as LiteLLM that standardize access to many models and providers behind a single API. It also examines how golden paths, internal developer portals, and automation practices evolve under these new pressures.
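To make the gateway pattern concrete, here is a minimal sketch using the LiteLLM Python SDK. The model identifiers and environment variables are illustrative assumptions, not part of the talk; in practice the same call shape would typically go through a centrally managed LiteLLM proxy rather than provider keys on each developer's machine.

```python
# Minimal sketch: one call shape, many providers, via the LiteLLM SDK.
# Assumes `pip install litellm` and provider keys in the environment
# (e.g. OPENAI_API_KEY, ANTHROPIC_API_KEY); model names are illustrative.
from litellm import completion

messages = [{"role": "user", "content": "Summarize our deployment runbook."}]

# LiteLLM routes on the "provider/model" prefix, so swapping providers
# is a one-string change instead of a new client integration.
for model in ["openai/gpt-4o-mini",
              "anthropic/claude-3-haiku-20240307",
              "ollama/llama3"]:
    response = completion(model=model, messages=messages)
    print(f"{model}: {response.choices[0].message.content[:80]}")
```

This uniformity is what a platform team can wrap in a golden path: developers target one endpoint and one schema, while the gateway handles provider credentials, routing, and usage limits.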

The talk is designed for platform engineers and DevOps practitioners who want to understand what actually changes when AI enters the picture. Attendees will walk away with a set of concrete patterns for supporting AI at scale while holding onto the principles that made their platforms successful in the first place.