From SQL to Python: Building Data Context for Agents and People PyData London 2026

2026-06-07 10:15–11:00, Hardwick Hub

Text-to-SQL makes great demos, but in real systems generating queries is rarely the hard part; understanding the data is. Modern data is increasingly S3-first and multimodal, and its meaning is defined by Python workflows, not table schemas.

To work reliably, both agents and people need data context across multiple layers: storage context (what exists and where), metadata context (what’s inside files), dataset context (how files are grouped and versioned), and code context (the transformations that define semantics).

In this talk, I’ll share a practical framework for building these context layers in Python-first systems, and show how DataChain makes multimodal workflows agent-ready in domains like Physical AI and biotech.

Text-to-SQL is often presented as the future interface for AI-driven analytics: connect an LLM to your warehouse, ask questions, get answers. The demo works. But production systems reveal a deeper issue: SQL can query structure, but it cannot provide the context required to understand what data actually means.

After years of building data infrastructure, I’ve learned that context is the real bottleneck - for both people and agents. This becomes unavoidable in S3-first, multimodal environments: video, audio, medical scans, sensor streams, and model outputs. In these projects, the source of truth is object storage, and meaning is defined by Python pipelines.

To reason correctly, you need data context across multiple layers:

Storage context - what exists, where it lives, and how it changes
Metadata context - what’s inside files, extracted signals, and hierarchical structure
Dataset context - how files are grouped, reused across datasets, and versioned
Code context - the Python transformations that define semantics and intent
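As a rough illustration of how the four layers might sit together in one record, here is a minimal sketch using only the standard library. Every class name, field, and the example bucket path are hypothetical and chosen for illustration; this is not DataChain's data model.

```python
from dataclasses import dataclass, field


@dataclass
class StorageContext:
    uri: str          # where the object lives
    size_bytes: int   # how large it is
    etag: str         # changes when the object changes


@dataclass
class MetadataContext:
    media_type: str   # what kind of file this is
    signals: dict     # extracted signals, e.g. modality or duration


@dataclass
class DatasetContext:
    name: str         # a dataset that includes this file
    version: str      # the dataset version that includes it


@dataclass
class CodeContext:
    transform: str    # name of the Python function applied to the file
    doc: str          # human-readable intent of that function


@dataclass
class DataContext:
    """Hypothetical container joining all four layers for one object."""
    storage: StorageContext
    metadata: MetadataContext
    datasets: list = field(default_factory=list)
    code: list = field(default_factory=list)

    def describe(self) -> str:
        """Render every layer as plain text for a person or an agent prompt."""
        lines = [
            f"storage: {self.storage.uri} "
            f"({self.storage.size_bytes} bytes, etag {self.storage.etag})",
            f"metadata: {self.metadata.media_type} {self.metadata.signals}",
        ]
        lines += [f"dataset: {d.name}@{d.version}" for d in self.datasets]
        lines += [f"code: {c.transform}: {c.doc}" for c in self.code]
        return "\n".join(lines)


# Made-up example values; none of these come from a real system.
ctx = DataContext(
    storage=StorageContext("s3://example-bucket/scans/0001.dcm", 524288, "abc123"),
    metadata=MetadataContext("application/dicom", {"modality": "MR"}),
    datasets=[DatasetContext("brain-scans", "1.2.0")],
    code=[CodeContext("normalize_intensity", "Rescale voxel intensities to [0, 1].")],
)
summary = ctx.describe()
```

The point of the sketch is that a query result alone would carry only the first line; the dataset and code lines are what tell a reader or an agent how the file is used and what was done to it.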

In this talk, I’ll present a practical framework for collecting and using these layers systematically. Using DataChain as a concrete example, I’ll show how typed schemas (e.g., Pydantic), vectorized metadata operations, and scalable Python execution make multimodal workflows understandable, reusable, and agent-ready - especially in Physical AI and biotech.
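To make the idea of typed schemas plus metadata-only operations concrete, the sketch below validates records against a small schema and then filters over columns of metadata without opening any file. It uses a standard-library dataclass as a stand-in for a Pydantic model, and plain Python lists as a stand-in for a vectorized engine; the `ClipMeta` schema, the helper functions, and the example paths are all hypothetical and are not DataChain's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClipMeta:
    """Hypothetical typed schema for one video clip's metadata."""
    uri: str
    duration_s: float
    label: str

    def __post_init__(self):
        # Reject records that violate the schema instead of passing them on.
        if not isinstance(self.uri, str) or not self.uri:
            raise ValueError("uri must be a non-empty string")
        if not isinstance(self.duration_s, (int, float)) or self.duration_s <= 0:
            raise ValueError("duration_s must be a positive number")
        if not isinstance(self.label, str):
            raise ValueError("label must be a string")


def to_columns(rows):
    """Pivot typed rows into columns so filters touch metadata only."""
    return {
        "uri": [r.uri for r in rows],
        "duration_s": [r.duration_s for r in rows],
        "label": [r.label for r in rows],
    }


def select_uris(columns, label, min_duration_s):
    """Return URIs whose metadata matches, without reading file contents."""
    return [
        uri
        for uri, duration, lab in zip(
            columns["uri"], columns["duration_s"], columns["label"]
        )
        if lab == label and duration >= min_duration_s
    ]


# Made-up example records for illustration.
rows = [
    ClipMeta("s3://example-bucket/a.mp4", 12.0, "forklift"),
    ClipMeta("s3://example-bucket/b.mp4", 3.5, "forklift"),
    ClipMeta("s3://example-bucket/c.mp4", 20.0, "pallet"),
]
selected = select_uris(to_columns(rows), label="forklift", min_duration_s=5.0)
```

Because the schema is explicit, both a person and an agent can read which fields exist and what they mean before writing a filter, which is the kind of context a bare object listing does not provide.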

Attendees will leave with a clear mental model for building data platforms where meaning lives in code, and agents can operate with real context rather than isolated queries.

Dmitry Petrov

Dmitry Petrov is the creator of the open-source tool DVC (Data Version Control). He holds a PhD in Computer Science, previously worked as a Data Scientist at Microsoft, and is now the founder of DataChain.ai, a Python-first data platform for Physical AI.
