Accelerate Distributed SQL Workloads for Big Data in the Cloud

Presto is an open-source distributed SQL query engine used widely by Facebook, Uber, Twitter, Pinterest, and many other internet companies. Alluxio is an open-source data orchestration layer that brings data close to compute for big data and AI/ML workloads in the cloud. Iceberg is an open-source, high-performance table format for huge analytic tables. The Presto + Alluxio + Iceberg stack brings the reliability and simplicity of SQL queries to big data in the cloud.

In this talk, Beinan and Chunxu, both Presto committers, will introduce the core principles of Presto, Alluxio, and Iceberg. They will demonstrate the performance, scalability, and reliability of the Presto + Alluxio + Iceberg stack and show how these properties make it a good fit for a variety of use cases, including real-time dashboards, A/B testing, ad-hoc analytics, and warehouse ETL jobs. They will also discuss the challenges they faced and the lessons they learned when migrating Presto and Alluxio to the cloud.

What you’ll learn:

  • Presto’s internal architecture in depth

  • Presto configuration and tuning tips

  • How to deploy Presto + Alluxio to Kubernetes clusters in AWS

  • How Iceberg lays out table data

  • The Hive Metastore and the Iceberg native catalog

  • How Presto reads Parquet files in Amazon S3

  • How to join Iceberg tables in S3 with data from Hive tables

Room:
Ballroom B
Time:
Sunday, March 12, 2023 - 13:30 to 14:30