2025-12-08, Abigail Adams
Unlocking the full potential of AI starts with your data, but real-world documents come in countless formats and levels of complexity. This session will give you hands-on experience with Docling, an open-source Python library designed to convert complex documents into AI-ready formats. Learn how Docling simplifies document processing, enabling you to efficiently harness all your data for downstream AI and analytics applications.
With the rapid rise of AI, developers need better ways to transform complex documents into structured data ready for model training and inference. Enter Docling, an open-source Python package that's quickly becoming the go-to tool for document parsing and export. In just a few months, Docling has earned over 25,000 GitHub stars and is already reshaping how developers approach document AI.
In this session, you'll get an in-depth introduction to Docling and how it can streamline your AI workflow, then walk through a hands-on workshop to create your first custom document ingestion pipeline with Docling. Key features include:
Broad format support: Easily convert PDFs, DOCX, PPTX, HTML, images, and Markdown into structured Markdown or JSON.
Deep document understanding: Accurately capture page layouts, reading order, and tables—essential for complex document analysis.
AI integration: Use the DoclingDocument format with frameworks like LlamaIndex, LangChain, and InstructLab to power RAG, QA, and LLM training.
OCR support: Extract data from scanned or image-based documents.
Developer-friendly CLI: Process documents quickly and consistently with a simple command-line interface.
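As a rough illustration of the conversion step behind these features, here is a minimal sketch. It assumes Docling is installed (`pip install docling`) and uses its `DocumentConverter` class with the `export_to_markdown` and `export_to_dict` methods of the resulting document. The helper name `convert_document` and its optional `converter` parameter are hypothetical, added only so the function can be tried with a stand-in converter.

```python
def convert_document(source, converter=None):
    """Convert one document and return (markdown_text, dict_form).

    `source` is a local path or URL. `converter` is a hypothetical hook for
    this sketch: pass any object with a `convert(source)` method, or leave it
    as None to use Docling's own DocumentConverter.
    """
    if converter is None:
        # Imported lazily so the helper can also run with a stand-in converter.
        from docling.document_converter import DocumentConverter

        converter = DocumentConverter()

    result = converter.convert(source)
    doc = result.document  # the parsed document object

    # Markdown is convenient for LLM prompts; the dict form keeps the structure.
    return doc.export_to_markdown(), doc.export_to_dict()
```

Calling `convert_document("report.pdf")` (a hypothetical file name) would return the Markdown text and a JSON-serializable dictionary for that file. From a terminal, running `docling report.pdf` is the rough command-line counterpart.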
This workshop requires experience with Python programming and LLMs. It will be presented as a Jupyter notebook that can be opened and run in Google Colab, so it will work on any participant's device.
Ming Zhao is an open-source developer and Developer Advocate at IBM, where he helps the company leverage open technologies while building impactful tools and growing vibrant open-source communities. He’s passionate about making open tech accessible to all and ensuring developers have the tools they need to succeed in the rapidly developing AI space. Ming now leads community efforts around Docling, IBM’s fastest-growing open-source project, recently welcomed into the LF AI & Data Foundation.