A K8s Operator for Enabling Hot Restarts of Stateful Applications


Kubernetes (K8s) has become the de facto cloud operating system for enterprises that run hybrid and multi-cloud architectures more efficiently. As a result, a growing share of the applications running on K8s are stateful.


However, frequent pod kills and evictions in K8s make it hard to operate stateful applications that are not fault-tolerant and were never designed for frequent cold restarts. The result is lower user productivity, lower resource utilization, and higher administrative costs.

In this talk, we will describe the technology behind a Kubernetes operator that addresses these operational challenges, along with its use cases. The operator lets stateful applications resume gracefully from the point where they were interrupted. We will demonstrate the underlying technology on both CPU-only and GPU-accelerated stateful applications. With it, operations such as node upgrades, running on Spot instances, and vertical and horizontal autoscaling become less disruptive to stateful applications. Kubernetes administrators can also use its GPU capabilities to increase utilization and gain more scheduling flexibility for AI/ML pipelines.

Lastly, we will give a brief preview of the new memory architecture from the CXL (Compute Express Link) industry consortium. We will also describe how it will give K8s even greater elasticity, scalability, and composability for memory.

Ballroom G
Sunday, March 17, 2024 - 13:45 to 14:45