Justin Miller lives in Los Angeles with his wife Megan and their dog Eddie. In his free time, he enjoys photography and learning about machine learning.

As a Principal Platform Engineer at ZEFR, Justin has introduced tools like Ray, developed data pipelines integrating NLP and CV embeddings with Qdrant and Snowflake, and cut costs by reducing resource utilization. He also modernized infrastructure, transitioning services to Kubernetes and streamlining deployments using GitHub Actions and ArgoCD.

With prior roles at GoSpotCheck, ProtectWise, and eHarmony, Justin has extensive experience building scalable systems with Scala, Java, and Python. His projects include Kafka stream processors, Spark and Snowflake data warehouses, and media retrieval/storage services. He has also mentored engineers at all levels to strengthen team capabilities.

Presentations

Distributed Embeddings At Scale: Processing 10+ million rows per day with Ray, GPUs, and Qdrant

In this talk, we’ll describe a production-grade NLP pipeline that processes millions of pieces of social media content across TikTok, YouTube, and Instagram using Ray and GPU acceleration. Learn how we use Ray's distributed computing model to orchestrate scalable embedding generation, sharded batch writes to Qdrant for vector search, and end-to-end pipeline tracking with Snowflake.
