In this talk, we’ll describe a production-grade NLP pipeline that processes millions of pieces of social media content across TikTok, YouTube, and Instagram using Ray and GPU acceleration. Learn how we use Ray's distributed computing model to orchestrate scalable embedding generation, sharded batch writes to Qdrant for vector search, and end-to-end pipeline tracking with Snowflake.
The architecture:
- Retrieves rows from Snowflake for each platform
- Cleans, chunks, and distributes embedding tasks across GPUs using Ray actors
- Writes results to GCS, Qdrant, and back to Snowflake
- Deletes stale shards and handles multi-platform ingestion
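The clean → chunk → embed → shard-write flow above can be sketched roughly as below. All helper names (`clean_text`, `chunk_text`, `embed_batch`, `shard_for`) and constants are illustrative, not from the actual pipeline; the GPU model call is stubbed out, and in the real system `embed_batch` would run inside Ray actors with the results written to GCS, Qdrant, and Snowflake.

```python
# Minimal stand-in sketch of the clean/chunk/embed/shard steps.
# Everything here is a hypothetical simplification of the pipeline
# described in the talk; the GPU embedding call is a placeholder.
import re
import zlib
from typing import Iterator

EMBED_DIM = 8      # illustrative; a real embedding model emits e.g. 768 dims
CHUNK_WORDS = 50   # illustrative chunk size in words
NUM_SHARDS = 4     # illustrative shard count for batched vector-store writes

def clean_text(raw: str) -> str:
    """Strip URLs and collapse whitespace before chunking."""
    no_urls = re.sub(r"https?://\S+", "", raw)
    return re.sub(r"\s+", " ", no_urls).strip()

def chunk_text(text: str, max_words: int = CHUNK_WORDS) -> Iterator[str]:
    """Split cleaned text into fixed-size word windows."""
    words = text.split()
    for i in range(0, len(words), max_words):
        yield " ".join(words[i:i + max_words])

def embed_batch(chunks: list[str]) -> list[list[float]]:
    """Placeholder for the GPU model call a Ray actor would execute."""
    return [[float(len(c))] * EMBED_DIM for c in chunks]

def shard_for(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Deterministic shard assignment so batch writes land on stable shards."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards
```

In the real deployment, each GPU worker would be a Ray actor created with `@ray.remote(num_gpus=1)`, and batches of chunks would be dispatched to those actors rather than embedded locally.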
If you’re working with large, multi-source datasets and need GPU-heavy inference at scale, this talk is for you. We’ll combine a code walkthrough with practical advice for production Ray deployments.



