Most organizational knowledge is still locked inside complex documents, where it is hard to extract and use. Traditional tools often fail on real-world document formats, particularly PDFs: tables lose their structure, figures get separated from their captions, and multi-column layouts collapse into unreadable text. These failures stand in the way of bringing AI to document-heavy workflows.
Docling is an open-source project that takes a different approach, using deep learning models to parse documents the way humans read them. It preserves document hierarchy, extracts structured data through a consistent API, and supports 15+ file formats out of the box. Docling is MIT-licensed and can run fully locally, so you can keep sensitive data on-premises while still getting low-latency processing and ingestion.
In this hands-on workshop, you'll build a complete document intelligence pipeline from the ground up. We'll work through three progressive modules: first, converting documents and exploring Docling's enrichment features, such as table detection and image classification; second, applying chunking strategies that preserve document semantics for retrieval; and third, building a multimodal RAG pipeline with visual grounding, resulting in an application that can cite the exact page and location where it found an answer.
No prior experience with Docling is required. Colab notebooks with hosted model endpoints will be provided, so you can follow along with just a browser. Attendees who prefer local execution should have Jupyter Notebook installed and be able to download models from Hugging Face. Bring your own documents to experiment with, or use the samples provided.



