Building a self-service data pipeline with Apache Spark

Audience:

Topic:

At ZipRecruiter, we are building our next-generation, in-house streaming data platform. Its goal is to let our 10-person data services team support 20 distinct dev teams through a self-service system.

I’ll share the architecture we designed, the trade-offs we considered, and the choices we made.

Building a data pipeline for stats and analysis is a big job. There is a cornucopia of open-source tools to choose from, and many decisions to make regarding:

tools
orchestration
storage formats
streaming compute
SQL integration
data ingress and egress
job vetting
data integrity

Room:

Ballroom H

Time:

Saturday, March 10, 2018 - 15:00 to 16:00