Kubernetes and Hybrid Data Platforms at Scale


Building a resource-efficient, cost-optimized, multi-tenant distributed big data ecosystem has been one of the hard problems in running SQL queries and batch workloads together.

In this talk, we will share a detailed case study of running multi-tenant clusters on Kubernetes (K8s) that make the best use of hardware resources while preserving the capacity-management characteristics of legacy Hadoop-based clusters.


This session will be a deep dive into 

- the issues that we faced while running multi-tenant batch workloads, 

- the lessons we learned from migrating SQL and Spark jobs from Hadoop clusters, and

- the systems we built to support next-gen observability.


We will share our experience running SQL, batch (Spark), and machine learning workloads to highlight the separation of storage from compute, and the autoscaling with preemption that improves resource utilization when using the scheduler plugin-based Apache YuniKorn K8s scheduler.


The case study covers running legacy big data workloads, such as Apache Hive SQL queries and Apache Spark batch jobs, in an enterprise-ready, multi-tenant Kubernetes cluster at scale. We will also walk through the challenges of serving various customers with noisy workloads in a shared multi-tenant K8s environment, and an effective hierarchical quota management solution on K8s. This solution addressed large-scale capacity limitations through resource scheduling with preemption and elastic over-allocation, which helped run Spark jobs in a dense, resource-optimized manner.

Ballroom G
Friday, March 15, 2024 - 17:00 to 18:00