Documents Meet LLMs: Tales from the Trenches
2025-10-01, Gaston Berger

Processing documents with LLMs comes with unexpected challenges: handling long inputs, enforcing structured outputs, catching hallucinations, and recovering from partial failures.
In this talk, we’ll cover why large context windows are not a silver bullet, why chunking is deceptively hard, and how to design inputs and outputs that allow for intelligent retries. We'll also share practical prompting strategies, discuss OCR and parsing tools, compare different LLMs (and their cloud APIs), and highlight real-world insights from our experience developing production GenAI applications across multiple document processing scenarios.


Processing documents with Large Language Models (LLMs) sounds simple: load a document, get structured output. In practice, it's a battlefield, especially when building systems that must perform reliably in production.

In this talk, we’ll share lessons learned from building and operating real-world GenAI applications that process complex documents at scale. You’ll hear tales from the trenches about:
- Long inputs, short outputs: Even with large context windows, output token limits matter. We'll discuss how to chunk documents meaningfully with minimal loss of context.
- Structured output: Tricks and tools for enforcing schemas when downstream services depend on consistent, machine-readable outputs (see the sketch after this list).
- Tackling hallucinations and omissions: LLMs can invent details or omit parts of the input. We'll cover methods to validate completeness and correctness, including retry strategies and output merging.
- Prompt engineering "tricks": To guide the LLM consistently, we’ll share specific prompting techniques and examples of what worked (and what didn’t).
- Evaluation: We’ll discuss how to evaluate document-processing tasks when "ground truth" is subjective and multiple valid outputs exist.
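To make the structured-output and retry points concrete, here is a minimal sketch, not taken from the talk itself, of one common pattern: validate the model's JSON against a schema and feed validation errors back on retry. The `Invoice` model, its fields, and the `call_llm` placeholder are illustrative assumptions, not a specific vendor API.

```python
# Minimal sketch: schema-enforced LLM output with validation-driven retries.
# `call_llm` is a placeholder for whatever client you use (OpenAI, Anthropic, Bedrock, ...).
import json
from pydantic import BaseModel, ValidationError


class Invoice(BaseModel):
    # Hypothetical target schema for illustration only.
    invoice_number: str
    total_amount: float
    currency: str


def call_llm(prompt: str) -> str:
    """Placeholder: send `prompt` to your LLM API and return its raw text response."""
    raise NotImplementedError


def extract_invoice(document_text: str, max_retries: int = 3) -> Invoice:
    prompt = (
        "Extract the invoice number, total amount and currency from the document below. "
        f"Respond only with JSON matching this schema: {json.dumps(Invoice.model_json_schema())}\n\n"
        f"{document_text}"
    )
    last_error = None
    for _ in range(max_retries):
        raw = call_llm(prompt)
        try:
            # The schema check doubles as a cheap guard against malformed or invented fields.
            return Invoice.model_validate_json(raw)
        except ValidationError as err:
            last_error = err
            # Feed the validation error back so the next attempt can self-correct.
            prompt += f"\n\nYour previous answer failed validation: {err}. Return only valid JSON."
    raise RuntimeError(f"No valid output after {max_retries} attempts: {last_error}")
```

In production you would typically also strip markdown fences from the raw response and log each failed attempt, but the validate-then-retry loop is the core idea the talk expands on.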

Throughout, we’ll compare how different LLMs perform and contrast the libraries and tools that support document pipelines, from OCR extraction to output validation.

Who should attend:
Data scientists, ML engineers, and developers working with LLMs, NLP, or document processing. Some experience with LLM APIs and basic Python is helpful.

Talk type and tone: Practical, experience-driven.

Key takeaways:
You’ll leave with:
- An understanding of the hidden challenges in LLM document processing
- Techniques for writing better prompts for structured and task-specific outputs
- Strategies for detecting and handling hallucinations and partial failures
- Tools and libraries for parsing, evaluation and validation

Nour leads the Generative AI technical group at Modus Create. She has a PhD in Machine Learning and has worked on Machine Learning, Data Science and Data Engineering problems in various domains, both inside and outside academia.