2.0 -//Pentabarf//Schedule//EN

PUBLISH YFYXAC@@pretalx.com

-YFYXAC

Making Databases LLM-Ready: Building Production Semantic Layers with Semantido en

20260605T090000 20260605T103000 013000

Making Databases LLM-Ready: Building Production Semantic Layers with Semantido

Learning Objectives By the end of this tutorial, participants will be able to: * Design and implement semantic models that capture business logic and domain knowledge alongside database schema definitions * Build LLM integrations that leverage semantic metadata for accurate query generation and validation * Deploy scalable semantic APIs that abstract database complexity from LLM applications * Use PydanticAI for an agentic analytics harness * Implement observability patterns for monitoring and debugging * Evaluate semantic layer quality __Desired Tutorial Structure (90 minutes)__ __Part 1: Foundations (~20 minutes)__ - The Text2SQL Challenge: Why naive approaches fail - Semantic Layer Architecture: Core concepts, design patterns, and the role of metadata in LLM reliability - Semantido Quick Start: Installation, project setup, and connecting to the playground database - *Hands-on Exercise*: Participants will set up their development environment and connect Semantido to a provided database. __Part 2: Building Your First Semantic Layer (~25-35 minutes)__ - Declarative Model Definition: Extending SQLAlchemy models with business metadata, descriptions, and constraints - Relationship Semantics: Annotating foreign keys, joins, and cross-table business rules - Domain Knowledge Injection: Adding enums, validation logic, and computed fields with business meaning - *Hands-on Exercise*: Participants will build a semantic layer for a given database, adding rich metadata that describes the tables and columns both in application and business terms. __Part 3: LLM Integration Patterns (~20 minutes)__ - Context-aware Query Generation: Using semantic layers exposed via a FastAPI endpoint - *Hands-on Exercise*: Participants build a simple chatbot that answers natural language questions using the generated semantic layer. Participants will implement query validation and test it with ambiguous questions. 
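To make the idea behind Parts 2 and 3 concrete, here is a minimal sketch of a semantic model rendered as LLM context, followed by a naive validation step. It uses only the standard library. The names `ColumnMeta`, `TableMeta`, `render_context` and `unknown_tables` are hypothetical and are not Semantido's API, and the `orders` table is invented for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ColumnMeta:
    name: str
    sql_type: str
    description: str  # business meaning, not just the storage type


@dataclass
class TableMeta:
    name: str
    description: str
    columns: list[ColumnMeta] = field(default_factory=list)


def render_context(tables: list[TableMeta]) -> str:
    """Render the semantic model as plain text to prepend to an LLM prompt."""
    lines = []
    for table in tables:
        lines.append(f"Table {table.name}: {table.description}")
        for col in table.columns:
            lines.append(f"  - {col.name} ({col.sql_type}): {col.description}")
    return "\n".join(lines)


def unknown_tables(sql: str, tables: list[TableMeta]) -> set[str]:
    """Return table names used after FROM/JOIN that the model does not define.

    The regex is a deliberately naive stand-in for a real SQL parser.
    """
    known = {t.name.lower() for t in tables}
    referenced = re.findall(
        r"\b(?:from|join)\s+([A-Za-z_][A-Za-z0-9_]*)", sql, flags=re.IGNORECASE
    )
    return {name.lower() for name in referenced} - known


# Hypothetical example table with business descriptions attached.
orders = TableMeta(
    name="orders",
    description="One row per customer order; cancelled orders are kept.",
    columns=[
        ColumnMeta("id", "INTEGER", "Surrogate primary key"),
        ColumnMeta("total_gbp", "NUMERIC", "Order value in pounds, VAT included"),
    ],
)

context = render_context([orders])
generated_sql = "SELECT * FROM orders JOIN refunds ON refunds.order_id = orders.id"
bad = unknown_tables(generated_sql, [orders])
print(context)
print("Unknown tables:", bad)
```

Checking generated SQL against the same metadata that was shown to the model is one simple way to catch hallucinated tables before a query reaches the database.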
__Part 4: Production Considerations (~20 minutes)__ - Observability and Debugging (6 min): Logging semantic context, tracing query generation, and monitoring LLM-database interactions - Evaluation Framework (5 min): Testing semantic layer quality with automated benchmarks and business logic validation - Deployment Patterns (4 min): Docker, FastAPI integration, and scaling considerations - *Hands-on Exercise*: Participants will add observability instrumentation to their semantic layer and run an evaluation suite that tests query accuracy against known business questions. __Part 5: Q&A (15 minutes)__ * Open discussion and troubleshooting PUBLIC CONFIRMED Tutorial https://pretalx.com/pydata-london-2026/talk/YFYXAC/ Grand Hall 1 Dragos Crintea PUBLISH DBGAND@@pretalx.com

-DBGAND

GPU Algorithm Authoring with CUDA Tile en

20260605T105000 20260605T122000 013000

GPU Algorithm Authoring with CUDA Tile

CUDA Tile is NVIDIA's new programming model for writing GPU kernels in an array-centric style that is portable across NVIDIA GPU architectures. Instead of orchestrating thousands of threads directly, you express computation over small local arrays (tiles) and let the system manage the parallel execution details: synchronization, data movement, and coordination across the GPU. This interactive session introduces the core mental model behind tile programming and how it is realized in cuTile Python on top of the Tile IR compiler stack. You will write tile code, see how it maps onto real GPU execution, and learn how to evaluate and tune performance with NVIDIA's Nsight profilers. We'll explore examples from both DL and HPC, such as large language model inference and conjugate gradient solvers. This session is hands-on with no installation required, just a web browser. We'll use Brev, NVIDIA's developer cloud, to get access to GPUs, and all work will be done in a JupyterLab environment. By the end of this session, you will: - Build an accurate mental model of tiles, thread groups, and how tile code executes on GPUs. - Write and debug tile-based GPU kernels in Python for real workloads. - Use profiling traces to identify bottlenecks and guide optimizations inside a notebook workflow. - Decide when tile programming is the right tool versus SIMT, and how to mix the two when needed. Links: - Accelerated Computing Hub: https://github.com/NVIDIA/accelerated-computing-hub - cuTile Python: https://github.com/NVIDIA/cutile-python - Tile IR: https://github.com/NVIDIA/cuda-tile - TileGym examples: https://github.com/NVIDIA/TileGym PUBLIC CONFIRMED Tutorial https://pretalx.com/pydata-london-2026/talk/DBGAND/ Grand Hall 1 Katrina Riehl PUBLISH TAWYHU@@pretalx.com

-TAWYHU

Keynote: Samuel Colvin: Pydantic Monty & Logfire: Wild LLMs, from tool calling to computer use en

20260605T132000 20260605T140500 004500

Keynote: Samuel Colvin: Pydantic Monty & Logfire: Wild LLMs, from tool calling to computer use

PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/TAWYHU/ Grand Hall 1 Samuel Colvin PUBLISH M8TE3Q@@pretalx.com

-M8TE3Q

Flexible Statistical Modeling with Bayesian Additive Regression Trees en

20260605T141000 20260605T154000 013000

Flexible Statistical Modeling with Bayesian Additive Regression Trees

Machine learning models are often evaluated on predictive accuracy alone, but accuracy without uncertainty can be misleading. Classical tree ensemble methods like random forests and gradient boosting provide point predictions, and while techniques like conformal inference or bootstrap aggregation can add uncertainty estimates, these are often poorly calibrated or computationally expensive. Bayesian Additive Regression Trees (BART) offer a different approach: uncertainty quantification is built into the model, not ignored or bolted on afterward. BART models the response as a sum of small trees, with regularization priors that keep each tree weak. Posterior inference over the tree structures yields a full distribution over predictions—every fitted value comes with a credible interval that reflects genuine uncertainty about the underlying function. This tutorial introduces BART through three applications, each demonstrating how uncertainty changes the way we interpret results: **Regression:** We begin with continuous outcomes, fitting BART models and visualizing posterior predictive distributions. Rather than a single fitted curve, participants will see HDI bands that widen where data is sparse and narrow where evidence is strong. We'll explore variable importance—which comes with its own uncertainty—and partial dependence plots that reveal non-linear effects. **Classification:** For binary outcomes, BART produces predicted probabilities with uncertainty, not just class labels. We'll examine how this uncertainty propagates through decision-making and compare calibration against standard classifiers. ### Target audience Data scientists and analysts looking to add useful statistical methods to their toolkit. 
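As an illustration of the HDI bands described in the regression application, the sketch below computes a highest density interval directly from samples. It assumes a unimodal posterior and uses synthetic Gaussian draws as stand-ins for real BART output. `hdi` here is a hypothetical helper written for this example; in practice a library routine such as ArviZ's `hdi` would normally be used.

```python
import math
import random


def hdi(samples, mass=0.94):
    """Return the narrowest interval containing `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    window = max(1, math.ceil(mass * n))  # number of samples inside the interval
    best_lo, best_hi = ordered[0], ordered[window - 1]
    # Slide a fixed-size window over the sorted draws and keep the tightest one.
    for start in range(1, n - window + 1):
        lo, hi = ordered[start], ordered[start + window - 1]
        if hi - lo < best_hi - best_lo:
            best_lo, best_hi = lo, hi
    return best_lo, best_hi


rng = random.Random(0)
# Synthetic stand-ins for posterior draws of a prediction at two inputs:
# one in a data-rich region (tight) and one in a sparse region (diffuse).
dense_region = [rng.gauss(2.0, 0.2) for _ in range(2000)]
sparse_region = [rng.gauss(2.0, 1.0) for _ in range(2000)]

dense_lo, dense_hi = hdi(dense_region)
sparse_lo, sparse_hi = hdi(sparse_region)
print(f"dense region HDI width:  {dense_hi - dense_lo:.2f}")
print(f"sparse region HDI width: {sparse_hi - sparse_lo:.2f}")
```

The wider interval for the diffuse draws is the numerical counterpart of a band that widens where data is sparse.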
## Takeaways Participants will leave able to fit BART models for continuous, binary, and time-to-event outcomes; interpret predictions with full posterior uncertainty; use variable importance and partial dependence plots appropriately; and decide when BART's uncertainty quantification justifies its computational cost over simpler alternatives. ## Materials GitHub repository with marimo notebooks, real-world datasets from sports, psychology, and other domains, environment files, and a one-page BART reference guide. Participants should clone the repository and verify their setup before the session. PUBLIC CONFIRMED Tutorial https://pretalx.com/pydata-london-2026/talk/M8TE3Q/ Grand Hall 1 Chris Fonnesbeck PUBLISH SUJZCA@@pretalx.com

-SUJZCA

Do you know how well your model is doing? Evaluate your LLMs en

20260605T160000 20260605T173000 013000

Do you know how well your model is doing? Evaluate your LLMs

Prerequisites: - Have experience coding in Python (with Python installed on your local machine) - Basic understanding of machine learning and LLMs - Experience with Hugging Face Transformers is preferred but not necessary - A Hugging Face Hub account (sign up for free) - A modern computer that can fine-tune small LLMs locally Description: We will begin with an essential revision of the Hugging Face Transformers library, covering basic LLM inference and fine-tuning. The core of the workshop will introduce and provide deep practice with Lighteval, an efficient and powerful LLM evaluation framework. Participants will learn how to leverage Lighteval to compare various LLMs available on the Hugging Face Hub using a range of pre-built tasks and metrics. Finally, we will delve into advanced evaluation techniques, focusing on creating custom tasks and metrics tailored to unique, real-world application requirements. Participants will learn how to prepare custom datasets on the Hugging Face Hub and integrate them into Lighteval for precise, domain-specific evaluation. By the end of this workshop, you will possess the practical skills to rigorously evaluate, benchmark, and fine-tune your LLMs with confidence. PUBLIC CONFIRMED Tutorial https://pretalx.com/pydata-london-2026/talk/SUJZCA/ Grand Hall 1 Cheuk Ting Ho PUBLISH 3RBSQM@@pretalx.com

-3RBSQM

After Conference Social- Fleets- Sponsored by PDFTA & Coefficient en

20260605T180000 20260605T210000 030000

After Conference Social- Fleets- Sponsored by PDFTA & Coefficient

PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/3RBSQM/ Grand Hall 1 PUBLISH 9PPVRK@@pretalx.com

-9PPVRK

Learn to Unlock Document Intelligence with Open-Source AI en

20260605T090000 20260605T103000 013000

Learn to Unlock Document Intelligence with Open-Source AI

Most organizational knowledge is still locked inside complex documents, making it difficult to extract and use the information effectively. Traditional tools often fail when working with real-world document formats, particularly PDFs. Tables lose their structure, figures get separated from captions, and multi-column layouts become unreadable text. These failures make it difficult to bring AI to document-heavy workflows. This workshop will give you hands-on experience with Docling, an open-source project that takes a different approach, using deep learning models to parse documents the way humans read them. It preserves hierarchy, extracts structured data through a consistent API, and supports 15+ file formats out of the box. All of Docling is MIT-licensed and can run fully locally, allowing you to keep sensitive data on-premise while delivering low-latency processing and ingestion. You'll build a complete document intelligence pipeline from the ground up. We'll work through three progressive modules: first, converting documents and exploring Docling's enrichment features like table detection and image classification; second, chunking strategies that preserve document semantics for retrieval; and finally, building on these components, a multimodal RAG pipeline with visual grounding, resulting in an application that can cite the exact page and location where it found an answer. No prior experience with Docling is required. Colab notebooks with hosted model endpoints will be provided, so you can follow along with just a browser. Attendees who prefer local execution should have Jupyter Notebook installed and the ability to download models from Hugging Face. Bring your own documents to experiment with, or use the samples provided. 
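The second module's idea, chunking that preserves document semantics, can be illustrated without Docling. The sketch below is a plain-Python stand-in, not Docling's chunker: it splits Markdown-like text on headings and attaches the heading path to each chunk, so a retriever can report where a passage came from. `chunk_by_heading` and the sample text are hypothetical.

```python
def chunk_by_heading(markdown_text):
    """Split Markdown into chunks, each tagged with its heading path."""
    chunks = []
    path = []  # current stack of headings, e.g. ["Report", "Results"]
    body = []  # lines collected under the current heading

    def flush():
        # Emit the text gathered so far under the current heading path.
        text = "\n".join(body).strip()
        if text:
            chunks.append({"path": " > ".join(path), "text": text})
        body.clear()

    for line in markdown_text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("#"):
            flush()
            level = len(stripped) - len(stripped.lstrip("#"))
            title = stripped[level:].strip()
            del path[level - 1:]  # drop headings at the same or deeper level
            path.append(title)
        else:
            body.append(line)
    flush()
    return chunks


sample = """# Annual Report
Intro paragraph.
## Results
Revenue grew.
## Outlook
Guidance unchanged.
"""

chunks = chunk_by_heading(sample)
for chunk in chunks:
    print(f"[{chunk['path']}] {chunk['text']}")
```

Keeping the heading path alongside each chunk is a small-scale version of the grounding idea: the answer can point back to the section it was drawn from. This sketch treats any line starting with `#` as a heading, so it would need refinement for documents containing code.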
Link to workshop, project resources, and more: https://red.ht/pydataLON PUBLIC CONFIRMED Tutorial https://pretalx.com/pydata-london-2026/talk/9PPVRK/ Doddington Forum Mingxuan Zhao Abby Tse Carol Chen PUBLISH APYSNR@@pretalx.com

-APYSNR

Observing Agentic AI in Production: MCP Server Tracing with OpenTelemetry and Animal Crossing en

20260605T105000 20260605T122000 013000

Observing Agentic AI in Production: MCP Server Tracing with OpenTelemetry and Animal Crossing

### Why this matters OpenTelemetry is rapidly becoming the standard telemetry backbone for AI agents, just as it is already for microservices. It is one of the most active CNCF projects after Kubernetes, with native support from 30+ observability vendors. Its GenAI Special Interest Group declared 2025 the "year of AI agents" and has since published purpose-built semantic conventions for LLM calls, agent orchestration, and MCP tool calls. The industry has followed: Amazon launched Bedrock AgentCore Observability built entirely on OTel and GenAI semantic conventions; Grafana Labs demonstrated production tracing of the OpenAI Agents SDK and AWS Bedrock AgentCore. However, most teams building agents today have none of this. The reason is a “developer experience gap”: many agent builders come from data science and ML research backgrounds, not distributed systems, and have never configured a tracing pipeline. Traditional monitoring tools don't capture the signals that matter for agents: token usage, cost per invocation, tool selection, multi-agent handoffs. Since agentic architecture is interaction-centric (98% of wall-clock time is spent in LLM API calls and tool executions, not your code), distributed tracing, not traditional metrics, is the primary observability signal. Without it, failures are invisible: one fintech company's agent ran in a loop for 11 hours accumulating $47,000 in costs before anyone noticed. ### What we will do We will take a FastMCP server that exposes tools for a fun real-time data engineering scenario, instrument it with OpenTelemetry and visualise the resulting traces. * Check out a FastMCP server (understand the MCP request/response lifecycle). * OpenTelemetry for agentic AI (traces, metrics, logs and why traces are the primary signal for agents). * Instrument the MCP server (OpenTelemetry instrumentation, see how errors are automatically recorded with stack traces). 
* From traces to dashboards (build a dashboard that answers which tools are slowest, showing error rates and token costs). * Production patterns and case studies (patterns for sensitive data handling, sampling strategies for high-throughput agent workflows). * Connecting auth and observability (auth attributes appearing in traces when OAuth is enabled, giving per-user visibility). ### Target audience Data engineers, data scientists, ML/AI engineers and SRE/platform engineers who are building or operating AI agents and need production visibility into agentic workflows. This is relevant to anyone deploying LLM-powered tools, multi-agent orchestration or MCP servers. Or you’re just a fan of Animal Crossing and social simulation gaming. ### Prerequisites * Basic to Intermediate Python (comfortable with decorators, async/await basics and uv). * No prior knowledge of MCP, OpenTelemetry or FastMCP is required. ### Tutorial requirements * MacOS/Linux laptop or Windows with PowerShell. * Docker, Colima or OrbStack (to run Docker Compose for the local observability stack). * uv for package management. * A code editor (VS Code, Cursor, Kiro or similar). * LLM access, either via a vendor (Anthropic, OpenAI, etc) or local Ollama. We will be serving a local 1B model, so you’ll need enough RAM and disk space ~4 GB. * Visit the GitHub repo <https://tinyurl.com/anteaters26> and follow the `SETUP.md` to install all the tools prior to arrival. ### Key takeaways 1. Understand why distributed tracing (rather than traditional metrics) is the primary observability signal for agentic AI systems. 2. Be able to build an MCP server with custom tools using FastMCP and instrument it with OpenTelemetry. 3. Know the OpenTelemetry GenAI and MCP semantic conventions and how they standardise telemetry across agent frameworks. 4. Be able to visualise, query and dashboard agent traces using Jaeger. 5. 
Understand the production observability landscape: auto-instrumentation libraries, sensitive data handling and compliance considerations. PUBLIC CONFIRMED Tutorial https://pretalx.com/pydata-london-2026/talk/APYSNR/ Doddington Forum Tun Shwe Fei Phoon PUBLISH ZAR8AG@@pretalx.com

-ZAR8AG

Building a Browser Agent from Scratch: Teach an LLM to Navigate the Web en

20260605T141000 20260605T154000 013000

Building a Browser Agent from Scratch: Teach an LLM to Navigate the Web

The web is the world’s largest API, but it was designed for humans, not machines. Traditional browser automation tools like Selenium and Playwright require developers to write brittle scripts with hardcoded selectors that break whenever a website changes its layout. Browser agents flip this model: instead of telling the browser exactly what to click, you describe what you want to accomplish, and an LLM figures out how to do it; reading the page like a human would, reasoning about what to do next, and adapting when things don’t go as expected. This approach has seen explosive growth. The open-source browser-use library surpassed 60,000 GitHub stars within months of release, and its creators raised $17M in seed funding. Skyvern, Browserbase, and others have built commercial platforms around the same idea. Under the hood, these tools all share a remarkably similar architecture: a perception layer that converts web pages into LLM-readable context, a reasoning layer where the LLM decides what action to take, and an execution layer that carries out the action via browser automation. This tutorial strips away the abstraction layers and builds each component from scratch. The “from scratch” approach is deliberate: by understanding how the DOM is parsed, how screenshots are fed to vision models, and how the agent loop manages state, attendees gain transferable knowledge that applies to any browser agent tool or framework. When something breaks in production (and it will), this understanding is what separates debugging from guessing. PUBLIC CONFIRMED Tutorial https://pretalx.com/pydata-london-2026/talk/ZAR8AG/ Doddington Forum Richard Oreolorun Olu-Ipinlaye PUBLISH HC3SLQ@@pretalx.com

-HC3SLQ

From Synthetic Examples to Production Signals: Multimodal Training Data Pipelines with Privacy-Safe Feedback en

20260605T160000 20260605T173000 013000

From Synthetic Examples to Production Signals: Multimodal Training Data Pipelines with Privacy-Safe Feedback

This tutorial is for AI builders who want more discipline around training-data creation. The central premise is simple: the data models consume deserves the same engineering rigor as the models themselves. Across three progressive Jupyter notebooks, participants will: - Learn the Data Designer workflow through a text QA example, using explicit controls for the mix of examples, seed datasets, templated LLM generation, structured outputs, and LLM-as-a-judge quality checks. - Apply the same pipeline shape to multimodal document data, using rich synthetic business document images as source records, VLM-classified visual focus areas, VLM-generated question-answer pairs, and a judge step to filter for correctness and visual grounding. - Anonymize production-style usage data from a fine-tuned model, comparing privacy strategies that reduce sensitive-data risk while preserving useful training signal. Participants leave with a working repo, runnable notebooks, and a reusable mental model for building training-data pipelines across text and images. **Takeaways** - A reproducible pattern for multimodal training-data generation. - Practical use of source datasets, example-mix controls, dependency-aware columns, structured LLM outputs, and judge-based validation. - A privacy workflow for turning production usage logs into safer source data for future training iterations. - Hands-on experience with NeMo Data Designer and NeMo Anonymizer. - A clear view of how synthetic generation, quality validation, and anonymized production feedback support a training-data lifecycle. **Why Attend This Session?** Most synthetic-data tutorials stop after generation. This session follows the full lifecycle: define the source material and the kinds of examples you want, generate text and multimodal training data, validate quality, anonymize production feedback, and prepare the anonymized data to be transformed into the next set of training examples. 
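The trade-off between privacy strategies mentioned above can be shown on a toy scale. The sketch below is not NeMo Anonymizer: it is a standard-library illustration with hypothetical helpers `redact` and `pseudonymize`, an invented salt, and made-up log lines. Redaction removes the identifier entirely, while pseudonymization replaces it with a stable token, so records from the same user can still be linked. Salted hashing alone is not a strong anonymization guarantee; it is used here only to make the contrast visible.

```python
import hashlib
import re

# Simplified email pattern; real PII detection needs far broader coverage.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(text):
    """Replace every email address with a fixed placeholder."""
    return EMAIL.sub("[EMAIL]", text)


def pseudonymize(text, salt="workshop-demo"):
    """Replace every email address with a stable, salted pseudonym."""

    def repl(match):
        normalized = match.group(0).lower()
        digest = hashlib.sha256((salt + normalized).encode()).hexdigest()[:8]
        return f"user_{digest}@example.invalid"

    return EMAIL.sub(repl, text)


# Made-up usage logs for illustration.
logs = [
    "alice@corp.com asked about invoices",
    "Follow-up from Alice@corp.com",
    "bob@corp.com asked about refunds",
]

redacted = [redact(entry) for entry in logs]
pseudonymized = [pseudonymize(entry) for entry in logs]
for before, after in zip(logs, pseudonymized):
    print(f"{before!r} -> {after!r}")
```

After redaction the first two entries can no longer be attributed to the same user; after pseudonymization they share a token, which keeps per-user signal available for later training data at the cost of weaker privacy.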
**Prerequisites** This is a hands-on notebook workshop. To follow along, please bring: - A laptop where you can run Python and Jupyter notebooks. - Basic comfort with Python, pandas-style dataframes, and editing notebook cells. - Ability to clone a GitHub repository and run simple terminal commands. Setup instructions will include installing uv if you do not already have it. - One hosted model API key configured in your environment or .env file. You can create a free `NVIDIA_API_KEY` at `build.nvidia.com`, or use `OPENROUTER_API_KEY` / `OPENAI_API_KEY`; if you use OpenRouter or OpenAI, any cost incurred during the session should be very minimal. - Internet access for calling hosted LLM APIs during the exercises. The workshop repository URL (https://github.com/nabinchha/pydata-london-2026-data-designer-anonymizer) will be made public before the session. You do not need prior experience with NeMo Data Designer, or NeMo Anonymizer. PUBLIC CONFIRMED Tutorial https://pretalx.com/pydata-london-2026/talk/HC3SLQ/ Doddington Forum Nabin Mulepati Lipika Ramaswamy PUBLISH SKBDNF@@pretalx.com

-SKBDNF

Beyond ML Model Calibration: Hands-On Multicalibration with MCGrad en

20260605T090000 20260605T103000 013000

Beyond ML Model Calibration: Hands-On Multicalibration with MCGrad

A globally well-calibrated model can still be systematically overconfident for one subgroup and underconfident for another; these errors cancel out in aggregate, passing standard checks while silently degrading decisions for specific populations. Multicalibration fixes this by ensuring predictions are calibrated across all subgroups simultaneously, while improving other notions of model performance. This tutorial introduces multicalibration from scratch using **MCGrad**, an open-source library (`pip install mcgrad`) that has been deployed on hundreds of production ML models at a major tech company; its methodology was recently accepted at KDD 2026. Attendees train a classifier on a public dataset, discover hidden subgroup miscalibration, then fix it with MCGrad in a few lines of code, all inside a ready-to-run Colab notebook. We also cover hyperparameter tuning, safety mechanisms, and when not to apply multicalibration. OUTLINE: - **Welcome & Setup** (5 min) Goals, format, open Colab notebook, pip install mcgrad. - **The Calibration Gap** (15 min) What is calibration, and why should ML practitioners care about it? Train a logistic regression on the dataset. Apply isotonic regression -- global calibration looks perfect. Reveal: the model is still badly miscalibrated for specific subgroups. - **From Calibration to Multicalibration** (15 min) Define multicalibration and the MCE metric. Why practitioners need it: you rarely know which subgroups matter in advance. Deployment lessons from a major tech company (hundreds of production models). - **MCGrad in Action -- Hands-On** (30 min) Walk through the MCGrad API (`fit`/`predict`). Fit MCGrad on the dataset, inspect the learning curve, compare base model vs. isotonic regression vs. MCGrad. Visualise segment-level error reduction. Mini-exercise: change segment features, observe impact on MCE. 
- **Advanced Features & Production Tips** (15 min) Hyperparameter tuning, safety mechanisms (no-op failsafe), regression multicalibration, model serialization, when not to use multicalibration. - **Wrap-Up & Q&A** (10 min) Recap the three-step workflow (measure MCE, fit MCGrad, verify). Pointers to docs and tutorials. Open Q&A. Attendees leave with a working notebook, a new metric *multicalibration error* (MCE) for auditing their own models, and a pip-installable tool to act on the results. PUBLIC CONFIRMED Tutorial https://pretalx.com/pydata-london-2026/talk/SKBDNF/ Hardwick Hub Niek Tax PUBLISH WDQZLR@@pretalx.com

-WDQZLR

Hands-On with Tabular Foundation Models: From Zero to Strong Baselines en

20260605T105000 20260605T122000 013000

Hands-On with Tabular Foundation Models: From Zero to Strong Baselines

Tabular foundation models are generating excitement, but most practitioners haven't used them yet. This **90-minute hands-on tutorial** bridges that gap. Participants will work through **four progressive notebooks** on real-world datasets of varying difficulty. By the end, they won't just know *about* tabular FMs; they'll have **run them, broken them, and compared them** against familiar baselines. ### Who is this for? Data scientists and ML engineers who: - Use sklearn / XGBoost / LightGBM regularly - Are curious about tabular FMs but haven't tried them - Want to build informed opinions grounded in hands-on experience ### What we'll use - **Models:** Any TFM (TabICL, TabPFN, or Neuralk's proprietary model with free credits), XGBoost, Random Forest - **Datasets:** 3 curated real-world datasets chosen to expose different behaviors: - A small medical dataset (~500 rows, 12 features), where TFMs tend to shine - A medium e-commerce dataset (~5K rows, 40+ features with mixed types), a realistic "grey zone" - A large, noisy dataset (~50K rows), where trees typically dominate - **Stack:** Python 3.9+, sklearn, tabicl, xgboost, matplotlib, pandas
### Detailed outline (90 min)
| Time (min) | Phase | What participants do | Expected output |
| --- | --- | --- | --- |
| 0–15 | **Conceptual grounding** | Short lecture: what tabular FMs are, how they differ from fitted models, what to expect. No code yet. | Shared mental model before touching code |
| 15–30 | **Notebook 1: First predictions** | Install a TFM, load the small medical dataset, generate predictions. Compare API with sklearn's `.fit()/.predict()` pattern. | Working predictions; comfort with the API |
| 30–45 | **Notebook 2: Rigorous benchmarking** | Run XGBoost and Random Forest on all 3 datasets with proper cross-validation. Compare with TFMs using the same splits. Discuss evaluation pitfalls (leakage, metric choice). | A comparison table with confidence intervals across 3 datasets |
| 45–60 | **Notebook 3: When things break** | Deliberately stress-test the TFMs: add noisy features, increase dataset size, introduce high-cardinality categoricals. Observe where performance degrades relative to trees. | Intuition for failure modes, backed by their own experiments |
| 60–75 | **Notebook 4: Diagnostics & interpretation** | Apply SHAP to both TFMs and XGBoost on the same dataset. Compare explanations. Discuss: are these explanations trustworthy? What can we still learn? Calibration plots and confidence analysis. | Practical diagnostic skills; awareness of interpretability caveats |
| 75–85 | **Wrap-up: Decision framework** | Collaborative exercise: given 3 new dataset descriptions, participants vote on which model they'd choose and why. We discuss as a group. | Internalized decision criteria |
| 85–90 | **Q&A and next steps** | Open discussion. Pointers to further resources, papers, and community. | |
### Requirements - Laptop with Python 3.9+ - Familiarity with sklearn (fit/predict/cross_val_score) - No deep learning experience needed - All materials (notebooks + datasets + environment setup) will be distributed via a **public GitHub repository** at least 2 weeks before the event > **Note on materials:** The repository is currently being prepared and will contain all notebooks, datasets, and a `requirements.txt` for easy setup. A link will be shared with organizers as soon as it is live.  
### What attendees will be able to do after this tutorial - **Run** tabular foundation models on their own datasets using a familiar sklearn-compatible API - **Benchmark** TFMs against tree-based baselines with proper cross-validation and meaningful metrics - **Diagnose** model behavior: identify when a TFM is failing, why, and what to do about it - **Interpret** TFM outputs using SHAP while understanding the limitations of post-hoc explanations on learned priors - **Decide** whether to adopt a tabular FM for a new project based on concrete, experience-backed criteria ### Key takeaways - A working local environment with tabular FM tooling ready to use - Four completed notebooks they can reuse as templates on their own data - Confidence to try (or deliberately skip) tabular FMs on their next project PUBLIC CONFIRMED Tutorial https://pretalx.com/pydata-london-2026/talk/WDQZLR/ Hardwick Hub Nicolas Makaroff PUBLISH MRKKWJ@@pretalx.com

-MRKKWJ

Test-Driven Data Analysis en

20260605T141000 20260605T154000 013000

Test-Driven Data Analysis

Test-Driven Data Analysis is a methodology for reducing errors in data and data analysis, and also an open-source Python package (tdda) for supporting key aspects of the methodology. This tutorial will provide hands-on experience using the library to - generate constraints characterising data in data frames automatically; - validate data using previously generated constraints; - test structured data resulting from analyses in data frames; - test unstructured data from analyses, typically in text files and graphical form, as well as highlighting other libraries that can be used for similar purposes. It will also discuss a taxonomy of errors arising during analysis and highlight approaches to reducing those errors, including through the use of 22 TDDA-focused checklists. The major error categories that will be considered are - errors of interpretation (of formulation and of communication), - errors of implementation, - errors of process, - errors of applicability, and - errors of judgement. **ATTENDEES** No prior experience is required, but it would be helpful to have the tdda library installed and to have some familiarity with DataFrames in polars or pandas. If you want to develop hands-on experience during the tutorial, follow the instructions to install tdda at [tdda.readthedocs.io](https://tdda.readthedocs.io/en/latest/installation.html). If this works, you should be able to use the `tdda` command. If you change to a directory you are happy to put data in, the command `tdda examples all` will download all the data that will be used in the tutorial into subdirectories. There is wifi available at the conference, but if you do this ahead of the tutorial, you will fight fewer people for bandwidth and will have more chance to check it works before you need the library. PUBLIC CONFIRMED Tutorial https://pretalx.com/pydata-london-2026/talk/MRKKWJ/ Hardwick Hub Nick Radcliffe PUBLISH JL7YAJ@@pretalx.com

-JL7YAJ

Model criticism through posterior predictive checks en

20260605T160000 20260605T173000 013000

Model criticism through posterior predictive checks

The main audience for this tutorial is practitioners, in academia or industry, who work with probabilistic models, but it also includes multiple elements of interest to anyone working with any kind of statistical model or fitting data through simulations. The material for the tutorial will be published on [GitHub](https://github.com/OriolAbril/pydata2026-ppc) beforehand so attendees can download the data and prepare their environments. The tutorial will assume attendees are familiar with Python, Jupyter notebooks and basic statistical concepts. Knowledge of Bayesian inference and posterior predictive sampling is helpful but not required. The tutorial will have an initial introductory section of ~40 minutes. The introduction will cover posterior predictive checks conceptually as well as usage examples using ArviZ. This will be followed by multiple hands-on exercises on provided example datasets to practice model criticism through posterior predictive checks. The main topics covered will be: * Understanding the need for distributional comparisons * Understanding how to adapt model criticism to the type of data * Diagnosing models of heterogeneous data at both the population and group level * Translating model criticism visualizations to model issues * Multiple uncertainty visualization designs * How to use ArviZ for predefined and custom posterior predictive checks PUBLIC CONFIRMED Tutorial https://pretalx.com/pydata-london-2026/talk/JL7YAJ/ Hardwick Hub Oriol Abril Pla PUBLISH PUL99Q@@pretalx.com

-PUL99Q

Keynote- Rachel Lee Nabors- The Community Is the Boat en

20260606T091000 20260606T095500 004500

Keynote- Rachel Lee Nabors- The Community Is the Boat

PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/PUL99Q/ Grand Hall 1 Rachel Lee Nabors PUBLISH V3D3LS@@pretalx.com

-V3D3LS

The Rules Nobody Writes Down: Decoding and Shifting Team Culture From Any Seat en

20260606T102000 20260606T110500 004500

The Rules Nobody Writes Down: Decoding and Shifting Team Culture From Any Seat

Most talks on culture are aimed at managers, the people with formal authority to change things. But data scientists, engineers, and ML practitioners navigate team dynamics every day, often without that authority. This talk is for you: how to read the unwritten rules, understand what drives them, and shift them from any position. The central idea is simple. Team culture isn't what is on the company wiki. It is a paradigm or a system of habits that shapes behaviour more powerfully than any stated policy. Beneath those habits sits the team's collective self-image: the shared belief about "who we are" and "what's possible." That self-image sets the upper limit of performance. A team that sees itself as "always firefighting" will keep firefighting, even when the fires are out. The good news: you can influence this from any seat. Not through announcements, but through consistent action. I will cover how to read the paradigm, the signals that reveal real culture. How failure is handled. Who speaks first. What gets celebrated versus quietly ignored. The stories that get repeated. These are data points that tell you what the team actually believes. I will also share specific questions you can use in interviews to decode culture before you join. Questions about past failures, who thrives versus struggles, how disagreement is managed. The answers matter less than how people respond: the hesitation, the energy, the discomfort. There is a trap worth knowing about. The longer you stay, the less you see. What felt strange in week one feels normal by month three. Your first weeks are a window of clarity. I will cover how to use it before it closes. The core of the talk is about influence. Three ways are available to anyone: modelling the behaviour you want to see, naming what others leave unspoken, and holding a different picture of what's possible. This isn't positive thinking. It is praxis, which is integrating belief with behaviour through consistent action. 
Finally, I will use AI adoption as an example. AI tools are shifting work toward individual tasks while new unwritten rules form around their use. Who is using AI openly? Who is hiding it? What is the unspoken agreement about quality and trust? This is a chance to watch a paradigm form in real-time and shape it before it solidifies. This is a practical talk, not a tutorial. The target audience is data scientists, data engineers, and ML practitioners at any level, especially if you have recently joined a team, are navigating a tricky dynamic, or want more impact without moving into management. You will leave with a framework for reading team culture, understanding what drives it, and influencing it through consistent behaviour starting the day you get back to work. PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/V3D3LS/ Grand Hall 1 Margaritha Groenendijk PUBLISH WGJMXV@@pretalx.com

-WGJMXV

Columnar Thinking - Designing for high-performance execution with Arrow and Polars en

20260606T110500 20260606T115000 004500

Columnar Thinking - Designing for high-performance execution with Arrow and Polars

Everyday production-scale data and systems engineering still reflects a row-oriented mental model. Loops, iterations, mutations are seen as easy to read and are understandable. While these work for small datasets and toy models during explorations in notebooks, they fail to perform when workloads scale - be it for rolling analytics, high-throughput pipelines or multi-million row aggregations. This mismatch between row-wise thinking and modern CPU architecture becomes a structural bottleneck that becomes very costly to fix. We’ll explore the shift from row-oriented design to columnar thinking, designing and developing high-performance workloads right from the onset. Using Arrow’s columnar memory format and Polars’ execution engine, armed with concrete examples from real-life quantitative calculations, we will examine how contiguous buffers, SIMD-compatible layouts, and lazy query planning are a natural combination for performant analytical workloads. You’ll leave with: 1. A clear understanding of how columnar memory impacts execution, in contrast to row-oriented or traditional vectorised approaches. 2. Practical patterns for structuring column-first transformations. 3. Insights into how Arrow reduces data movement overhead in distributed systems. 4. Guidance on when lazy execution and query optimisation matters. 5. Ideal design principles for building scalable calculation pipelines with Polars and Arrow tools. PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/WGJMXV/ Grand Hall 1 Kamlesh Shah PUBLISH NVXBEM@@pretalx.com

-NVXBEM

JupyterLite: run all your code in a web browser using WebAssembly en

20260606T115000 20260606T123500 004500

JupyterLite: run all your code in a web browser using WebAssembly

JupyterLite is a JupyterLab distribution that runs entirely in the web browser, backed by in-browser language kernels. Standard JupyterLab uses kernels that run in separate processes and communicate with the client by message passing, whereas JupyterLite uses kernels that run entirely in the browser, based on JavaScript and WebAssembly, such as pyodide and xeus-python. This means that JupyterLite deployments can be scaled to millions of users without the need for individual containers for each user session; only static files need to be served, which can be done with a simple web server like GitHub Pages. This talk will present a comprehensive summary of all things JupyterLite, and demonstrate key features. Highlights include the wide variety of language kernels supported, a terminal for those who wish to run `git` or `vim` at a command line in the browser, and access to AI agents in a safe sandboxed browser environment. It will explain the technology behind JupyterLite and how your favourite packages are built to run in the browser. JupyterLite sites are easy to deploy and there will be a live demonstration of a deployment to illustrate this. Talk outline: - Overview - Comparison of JupyterLab and JupyterLite - Live demonstration of basic functionality - How it works - Kernels, including why there are two different Python kernels (pyodide and xeus-python) and how to choose between them - Emscripten-forge package building - Key features such as shared in-browser filesystem - More detailed demos such as installing packages on the fly - What it is good and bad at - Terminal for `vim`, `git`, etc - JupyterLite AI - Use in project documentation using jupyterlite-sphinx - Deployment, including live demo - Making it easier to deploy and share using notebook.link - Where JupyterLite is going PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/NVXBEM/ Grand Hall 1 Ian Thomas PUBLISH QU33NS@@pretalx.com

-QU33NS

Keynote- Jeremiah Lowin- Build Reasonable Software en

20260606T133500 20260606T142000 004500

Keynote- Jeremiah Lowin- Build Reasonable Software

PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/QU33NS/ Grand Hall 1 Jeremiah Lowin PUBLISH AYDUBL@@pretalx.com

-AYDUBL

Reading the Mind of an LLM en

20260606T144500 20260606T153000 004500

Reading the Mind of an LLM

What if we could step inside an LLM and watch it think in real time? This talk distills the latest research from Anthropic, DeepMind, and OpenAI to present the current state of the art in **LLM interpretability**. We’ll start with the modern interpretation of **embeddings** as sparse, monosemantic features living in high-dimensional space. From there, we’ll explore emerging techniques such as **circuit tracing** and **attribution graphs**, and see how researchers reconstruct the computational pathways behind behaviors like multilingual reasoning, refusals, and hallucinations. We’ll also look at new evidence suggesting that models may have limited forms of introspection—clarifying what they can, and crucially cannot, reliably report about their internal processes. Finally, we’ll connect these “microscopic” insights to **real engineering practice**: how feature-level understanding can improve debugging, safety, and robustness in deployed AI systems, and where current methods still fall short. PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/AYDUBL/ Grand Hall 1 Luca Baggi PUBLISH ABYV3J@@pretalx.com

-ABYV3J

SELECT instance FROM cloud WHERE workload = ? ORDER BY cost_efficiency en

20260606T153000 20260606T161500 004500

SELECT instance FROM cloud WHERE workload = ? ORDER BY cost_efficiency

Selecting a cloud instance for DS/ML/AI workloads is typically done using heuristics, vendor guidance, or trial-and-error. While cloud providers publish pricing tables and hardware specifications, this information is fragmented, inconsistently structured, and challenging to compare across vendors – especially once real workload performance is considered. This talk introduces Spare Cores Navigator, a vendor-independent, open-source, Python-based ecosystem that treats cloud instance selection as a data problem. The project maintains a continuously updated benchmark dataset covering thousands of server types across multiple cloud providers, with standardized hardware metadata, performance measurements, and cost-efficiency metrics across over 500 workloads. We describe how the dataset is built by automatically discovering and provisioning cloud instances at scale using public GitHub Actions to run hardware inspection tools and a diverse benchmark suite. This includes general CPU performance, memory bandwidth, compression algorithms, cryptographic workloads, web serving, and data store performance, as well as DS/ML-specific benchmarks such as gradient-boosted model training and LLM inference on CPUs and GPUs. The main focus of the talk is demonstrating practical use cases for server type selection by querying the dataset under different workload characteristics, compliance and budget constraints, and optimization goals – such as minimizing cost-efficiency trade-offs or reducing environmental impact. PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/ABYV3J/ Grand Hall 1 Gergely Daroczi PUBLISH ZFR8VH@@pretalx.com

-ZFR8VH

Building a Scientific Taxonomy at Scale with Graph Clustering, Embeddings, and LLMs en

20260606T161500 20260606T170000 004500

Building a Scientific Taxonomy at Scale with Graph Clustering, Embeddings, and LLMs

### The problem If you've ever tried to make sense of author-provided keywords across millions of papers, you know the pain. *"Machine learning"*, *"ML"*, *"machine-learning"*: same thing, three entries. Other terms look identical but mean completely different things depending on the field. Manual cleanup? Doesn't scale. Regex and string matching? Misses the semantics entirely. ### What we built We took OpenAlex's 4-level hierarchy (**Domain → Field → Subfield → Topic**) and added a fifth in-house **Concept** layer: 115K+ fine-grained concepts, each with a clear position in the tree. The core idea: embed all candidate concepts with **SPECTER2**, build a mutual kNN similarity graph per field, and cluster it with **Leiden (CPM resolution)** at 100K+ node scale. We tuned hyperparameters via grid search, scored against hand-curated concept pairs - things like *"Cryptocurrency"* and *"Crypto Currency"* must land together, while *"Decision Trees"* and *"Random Forest"* must stay apart. LLMs come in at **five specific points** where embeddings alone aren't enough: filtering concept granularity, classifying into fields, renaming clusters, generating explanations, and validating topic assignments. Everything else is deterministic: no LLM in the loop means reproducible and cheap. ### Paper tagging Once the taxonomy exists, we use it to tag papers. With SPECTER2 embeddings, we retrieve an initial pool of ~150 candidate concepts per paper (eight different text-splitting strategies over title, abstract, and keywords). Deterministic filters prune by field/subfield distribution and merge near-synonyms with Jaccard + union-find. Then an LLM reranker picks the final **5–8 concepts** with domain verification and keyword mapping, ranked. ### What comes next With millions of papers tagged consistently, the obvious next step is **trend detection**: tracking how concept frequency and co-occurrence shift over time to spot emerging research areas. We'll sketch out the approach. 
### Tech stack **SPECTER2** (embeddings) · **igraph + leidenalg** (Leiden/CPM clustering) · **hnswlib** (ANN for kNN graphs) · **Qdrant** (vector search for hierarchical attachment) · **Azure OpenAI** (structured LLM inference) · human + automated validation framework ### You'll walk away knowing - When LLMs actually help in large-scale NLP pipelines and when they're overkill - How to scale graph clustering to 100K+ nodes in Python - How to evaluate clustering with custom pair-based constraints - Practical trade-offs between embeddings, graph methods, and LLMs PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/ZFR8VH/ Grand Hall 1 Daniele Raimondi Feichi Lu PUBLISH PGFXAR@@pretalx.com
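The "merge near-synonyms with Jaccard + union-find" step mentioned in the taxonomy abstract above can be sketched in a few lines of standard-library Python. This is an illustrative sketch only: the abstract does not say which features the Jaccard similarity is computed over, so character trigrams of a normalised string are an assumption here, as are the helper names and the 0.8 threshold.

```python
def char_trigrams(text):
    """Character trigrams of the text with case, spaces and punctuation removed."""
    normalized = "".join(ch for ch in text.lower() if ch.isalnum())
    return {normalized[i:i + 3] for i in range(len(normalized) - 2)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class UnionFind:
    """Disjoint-set structure with path halving."""

    def __init__(self, items):
        self.parent = {item: item for item in items}

    def find(self, item):
        while self.parent[item] != item:
            self.parent[item] = self.parent[self.parent[item]]
            item = self.parent[item]
        return item

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def merge_near_synonyms(concepts, threshold=0.8):
    """Group concepts whose trigram sets are at least `threshold` similar."""
    features = {concept: char_trigrams(concept) for concept in concepts}
    uf = UnionFind(concepts)
    for i, first in enumerate(concepts):
        for second in concepts[i + 1:]:
            if jaccard(features[first], features[second]) >= threshold:
                uf.union(first, second)
    groups = {}
    for concept in concepts:
        groups.setdefault(uf.find(concept), []).append(concept)
    return sorted(sorted(group) for group in groups.values())


groups = merge_near_synonyms(
    ["Cryptocurrency", "Crypto Currency", "Decision Trees", "Random Forest"]
)
```

Union-find makes the merge transitive: if A matches B and B matches C, all three end up in one group even when A and C fall below the threshold. The all-pairs loop is quadratic, so it only suits a small candidate pool such as the per-paper pool described in the abstract.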

-PGFXAR

Conference Social en

20260606T170000 20260606T180000 010000

Conference Social

PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/PGFXAR/ Grand Hall 1 PUBLISH RQ3FJQ@@pretalx.com

-RQ3FJQ

Building Production Multi-Agent RAG Systems on Serverless AWS en

20260606T102000 20260606T110500 004500

Building Production Multi-Agent RAG Systems on Serverless AWS

Modern AI applications increasingly require multiple specialised agents working together, but orchestrating them reliably at scale is challenging. This talk walks through the architecture and lessons learned from building a production multi-agent financial analysis platform. Outline: - Minutes 0-5: Why single-agent RAG hits limits — the case for multi-agent orchestration - Minutes 5-15: Architecture deep-dive — SQS-based agent coordination, Lambda handlers, and how RAG retrieval integrates across agents - Minutes 15-22: Cross-region Bedrock routing and why latency geography matters - Minutes 22-30: Cost lessons — achieving 90% vector storage savings with S3 Vector Search vs managed alternatives - Minutes 30-37: Observability and failure handling — Langfuse tracing, DLQs, and debugging distributed agent calls - Minutes 37-40: Key takeaways and Q&A Target audience: ML engineers, data scientists, and developers building production AI systems. Familiarity with RAG concepts and basic AWS services assumed; no multi-agent experience required. PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/RQ3FJQ/ Grand Hall 2 Samuel Jaja PUBLISH 3JJHZF@@pretalx.com

-3JJHZF

Production-Ready AI Agents: From LLMs to Small Language Models en

20260606T110500 20260606T115000 004500

Production-Ready AI Agents: From LLMs to Small Language Models

In this talk we will cover the complete Agent Development Lifecycle from Prototype to a scalable and robust Production agent with cost effective Small Language Models. The talk will present the following topics, gathered from real engagements with product teams: 1. **The Production Agent Problem (3 min)** The prototype-to-production gap, why closed, frontier LLMs don't scale, and the agent development lifecycle. 2. **Small Models, Big Impact (2 min)** The case for small open language models, the current model landscape and pursuing an iterative migration pattern. 3. **Test-Driven Agent Development (5 min)** Starting with clear use cases and adapting testing practices for non-deterministic systems. Covering evaluation patterns and practical examples of testing agent behavior for different types of agents. 4. **Techniques for migrating to Small Language Models (7 min)** Introducing task decomposition patterns, use of multi-model approaches and agent architectures better suited to Small Language Model utilisation. 5. **CI/CD for Agents** (7 min) Treating models and prompts as config rather than code. Building deployment pipelines that handle model and prompt versioning, integration and end-to-end testing for agents with MCP and A2A considerations, and agent packaging for production rollout. 6. **Observability and Monitoring** (4 min) Instrumenting agents with structured logging, tracking key metrics beyond traditional monitoring, and building dashboards and alerts that surface quality issues. Monitoring non-functional metrics such as cost, latency and concurrency. 7. **Continuous Improvement Loops** (4 min) Creating feedback pipelines from production data, triaging failures and automating analysis. Strategies for iterative improvement, and methods for measuring progress through A/B testing. 
As part of this talk, we will reference some Jupyter Notebooks and reusable code snippets with the PyData stack to enable attendees to begin their own Agentic journeys to production with Small Language Models. PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/3JJHZF/ Grand Hall 2 Prattyush Mangal PUBLISH 8Y9GRD@@pretalx.com

-8Y9GRD

Evaluating multi-turn conversations: A practical guide to AI Agent evals en

20260606T115000 20260606T123500 004500

Evaluating multi-turn conversations: A practical guide to AI Agent evals

As AI agents move from demos to production, evaluating their performance becomes one of the most important challenges for teams shipping them. Unlike single-turn LLM calls, conversations are messy. You can't evaluate a response in isolation, each turn depends on prior context and a perfectly correct answer in one conversation might be wrong in another. In this talk we'll discuss a systematic approach to evaluating complex multi-turn conversations. We'll talk about: - Defining what makes a "good" conversation - The unique challenges of multi-turn evaluation - Metrics for assessing conversation quality - Constructing evaluation datasets for conversational AI agents - Automated pipelines for continuous agent evaluation in production We'll show practical implementations using Python, with real-world examples from production agent systems across different domains. Attendees will leave with: - A structured framework for defining and measuring conversation quality in their domain - Practical techniques for evaluating multi-turn interactions at scale The session will provide actionable insights for AI engineers, data scientists, and product managers looking to evaluate AI agents rigorously and build stakeholder trust. PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/8Y9GRD/ Grand Hall 2 Lena Shakurova PUBLISH J99JNR@@pretalx.com

-J99JNR

Fast-Forward(ing) Models: Accelerating High-Dimensional Inference with AI Emulators en

20260606T144500 20260606T153000 004500

Fast-Forward(ing) Models: Accelerating High-Dimensional Inference with AI Emulators

This talk aims to show how we can accelerate the solving of complex, imperfect, high-dimensional physical models using machine learning. We will be discussing: - The motivation for accelerating models (3 mins) - An introduction to emulation (5 mins) - The most common emulation architectures (4 mins) - Effective sampling and parameter selection for training data (4 mins) - Model dimensionality reduction techniques and optimisation (4 mins) - Emulator uncertainty quantification and inference techniques (4 mins) - Designing an emulator workflow (2 mins) - Data augmentation for existing datasets (5 mins) - Use case examples (9 mins) Attendees will gain a deep understanding of how to architect surrogate models and the libraries typically used to create them, enabling users to "fast-forward" their own computationally intensive numerical models. Prerequisites: Basic familiarity with Python and regression concepts. No physics background required. PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/J99JNR/ Grand Hall 2 Austen Wallis PUBLISH YYTLFF@@pretalx.com

-YYTLFF

Bridging Pandas and Polars: The Hidden Costs of Dataframe Interoperability en

20260606T153000 20260606T161500 004500

Bridging Pandas and Polars: The Hidden Costs of Dataframe Interoperability

As organisations adopt Polars alongside Pandas, a critical question emerges: how do you move data between the two without silent data loss, performance regressions, or broken round-trips? The answer is more complex than calling `polars.from_pandas`. Pandas stores data in NumPy arrays by default, though as of 3.0 it uses Arrow for strings. Polars is built entirely on Apache Arrow's columnar format. For each area where these formats diverge, this talk will explain the problem and show how ArcticDB, a dataframe database that must serialize, store, and reconstruct both formats, solves it in practice: - **Memory layout**: How NumPy and Arrow represent the same logical data differently, and how a dataframe database can bridge the two - **Strings**: NumPy object arrays vs. Arrow's offset-based binary buffers -- why Arrow is dramatically more efficient and the cost of conversion - **Missing values**: NaN/NaT/None sentinels vs. Arrow's validity bitmask -- why a Pandas NaN behaves differently from a Polars null and what breaks during conversion - **Schema differences**: Different supported data types and different allowed column names -- e.g. Pandas allows mixed-type columns that Arrow cannot represent - **Pandas-specific metadata** that has no Arrow equivalent: Index and RangeIndex semantics, and MultiIndex which uses an entirely different memory layout with its own performance implications Together, these issues make conversion between Pandas and Polars far from trivial. This is especially challenging for a dataframe database like ArcticDB, where petabytes of Pandas DataFrames are stored and users increasingly want to read them back as Arrow. The talk will include benchmarks comparing native format reads against conversion-based approaches, and practical takeaways for anyone migrating a codebase, building a library that supports both formats, or choosing a dataframe database. 
PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/YYTLFF/ Grand Hall 2 Ivo Dilov PUBLISH HBPSDS@@pretalx.com
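The missing-value point in the Pandas and Polars abstract above (sentinel values versus a validity mask) can be shown conceptually without either library. The sketch below uses plain Python lists as stand-ins for a NumPy-style value array and an Arrow-style values-plus-validity pair. The function names are hypothetical, and real converters have options that change this behaviour.

```python
import math


def to_masked(values):
    """Sentinel style (NaN or None means missing) -> (data, validity mask)."""
    data, validity = [], []
    for value in values:
        missing = value is None or (isinstance(value, float) and math.isnan(value))
        data.append(0.0 if missing else value)  # placeholder under a cleared bit
        validity.append(not missing)
    return data, validity


def to_sentinel(data, validity):
    """(data, validity mask) -> sentinel style, using NaN for every null."""
    return [value if valid else float("nan") for value, valid in zip(data, validity)]


original = [1.0, float("nan"), None, 4.0]
data, validity = to_masked(original)
round_tripped = to_sentinel(data, validity)

# NaN is a float value that compares unequal to itself; a null is "no value".
nan_equals_itself = float("nan") == float("nan")
```

After the round trip, the `None` in position 2 has become a NaN and is indistinguishable from the NaN in position 1. A converter that maps both to null therefore loses the difference between "missing" and "not a number".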

-HBPSDS

Using coding agents with open models en

20260606T161500 20260606T170000 004500

Using coding agents with open models

PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/HBPSDS/ Grand Hall 2 Sujee Maniyam PUBLISH BPJEKV@@pretalx.com

-BPJEKV

Kafka Streaming, the Pythonic Way en

20260606T102000 20260606T110500 004500

Kafka Streaming, the Pythonic Way

The talk opens with a concrete example of stream processing. We have data flowing in, and a clear task to perform on it. No theory, no definitions, just a practical scenario the audience can immediately relate to. From there, we step back and look at how Kafka works. Topics, consumers, partitions, message formats. Just enough to understand the architecture behind the example, and to appreciate why Kafka has become the standard backbone for streaming systems. Then comes the friction. When you consume from Kafka, you get one message at a time. Each message is serialized as JSON or Protobuf. If you're a Python developer used to working with DataFrames, this feels like going back to writing for loops over rows. We'll look at what the naive approach looks like in code, and why it quickly becomes painful as processing logic gets more complex. With the problem clearly felt, we introduce the solution: treating Kafka not as a stream of individual messages but as a source of micro-batches, and deserializing those batches directly into Arrow-backed DataFrames using confluent-kafka and Apache Arrow. The processing code that follows looks identical to what you'd write against a Parquet file. We'll see both versions side by side to make this concrete. We close with lessons learned from applying this pattern in production over ten years. What breaks, what surprises you, and what trade-offs you should be aware of before adopting this approach in your own systems. The talk assumes familiarity with Python and basic data processing with DataFrames. No prior knowledge of Kafka or streaming is required. PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/BPJEKV/ Doddington Forum Arthur Andres PUBLISH T7GMEL@@pretalx.com
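The micro-batch idea in the Kafka abstract above, which consumes several messages at once and turns them into columns rather than looping over rows, can be sketched with the standard library alone. This sketch does not use confluent-kafka or Apache Arrow: a list of JSON-encoded byte strings stands in for consumed messages and a dict of lists stands in for an Arrow-backed DataFrame. `micro_batches`, `to_columns`, and the `symbol` and `price` fields are hypothetical names for illustration.

```python
import json
from itertools import islice


def micro_batches(messages, batch_size):
    """Yield lists of up to `batch_size` messages from any iterable."""
    iterator = iter(messages)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def to_columns(batch):
    """Deserialize a batch of JSON messages into a column-oriented dict."""
    records = [json.loads(raw) for raw in batch]
    fields = sorted({key for record in records for key in record})
    # Missing keys become None so every column has the same length.
    return {field: [record.get(field) for record in records] for field in fields}


raw_messages = [
    json.dumps({"symbol": symbol, "price": price}).encode()
    for symbol, price in [("A", 1.0), ("B", 2.0), ("A", 3.0)]
]
batches = [to_columns(batch) for batch in micro_batches(raw_messages, batch_size=2)]
```

Once the data is in columns, the processing step can be written as whole-column operations, which is what lets the same code run against a batch from a stream or a file.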

-T7GMEL

Beyond Spark MLlib: Deduplicating Common Crawl at Scale en

20260606T110500 20260606T115000 004500

Beyond Spark MLlib: Deduplicating Common Crawl at Scale

<Why This Matters for LLM Practitioners> Training data quality is the bottleneck for LLM performance. Research shows duplicate content causes memorization, reduced generalization, and wasted compute (Lee et al., 2022). Yet existing tools fail at web scale: Spark MLlib's MinHashLSH suffers shuffle explosion causing OOM errors, while Google's deduplicate-text-datasets requires 600GB+ RAM on a single machine. <What You'll Learn> This talk introduces a partition-aware MinHash LSH system built with PySpark and NumPy that scales horizontally on commodity clusters. The key innovation: using LSH band hashes to drive Spark's partitioning, co-locating similar documents before comparison and eliminating cross-partition shuffles entirely. <Target Audience> Data engineers and ML practitioners working with large text corpora for NLP/LLM applications. Familiarity with PySpark basics and general understanding of similarity matching is helpful but not required. <Talk Outline> Minutes 0-5: The deduplication challenge - why LLM training data needs deduplication, why O(N²) comparisons are infeasible, why you can't split into independent batches Minutes 5-10: Why existing tools fail - MLlib shuffle explosion, Google's memory requirements Minutes 10-18: Our solution - partition-aware MinHash LSH architecture, code walkthrough showing pandas_udf vectorization and band-based partitioning Minutes 18-25: Worked example - following two documents through the pipeline: hashing → band assignment → partition co-location → local candidate generation → connected components Minutes 25-30: Benchmarks and practical lessons - 253M documents, 2.1B candidate pairs, under 5 hours, under $100. 
boilerplate filtering with MAX_BAND_SIZE Minutes 30-40: Q&A <Key Takeaways> How to use LSH band hashes to drive Spark partitioning for local similarity computation Vectorized MinHash generation with NumPy and pandas_udf to avoid Python UDF overhead Strategies for handling boilerplate-induced false positives at scale A production-ready architecture that will be open-sourced <Background Knowledge> Basic PySpark familiarity (DataFrames, partitions). No prior knowledge of MinHash or LSH required — these concepts will be explained. PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/T7GMEL/ Doddington Forum Ken Obata PUBLISH 8JJUKQ@@pretalx.com
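The pipeline stages listed in the deduplication abstract above (hashing, band assignment, co-location, local candidate generation) can be sketched on a single machine with the standard library. This is not the speaker's PySpark implementation: an in-memory dict of buckets stands in for Spark partitions, and the shingle size, number of hash functions, number of bands, and all function names are assumptions for illustration.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Set of k-word shingles from lowercased, whitespace-split text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def minhash_signature(items, num_hashes=32):
    """One minimum per salted hash function over the item set."""
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(item.encode(), digest_size=8, salt=salt).digest(), "big"
            )
            for item in items
        ))
    return signature


def band_keys(signature, num_bands=8):
    """Split the signature into bands; each (band index, rows) tuple is a bucket key."""
    rows = len(signature) // num_bands
    return [(band, tuple(signature[band * rows:(band + 1) * rows]))
            for band in range(num_bands)]


def candidate_pairs(documents):
    """Documents sharing any band key land in one bucket and are compared locally."""
    buckets = defaultdict(set)
    for doc_id, text in documents.items():
        for key in band_keys(minhash_signature(shingles(text))):
            buckets[key].add(doc_id)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


documents = {
    "a": "The quick brown fox jumps over the lazy dog",
    "b": "the quick  brown fox jumps over the lazy dog",
    "c": "Partitioning by band hash keeps similar documents together",
}
pairs = candidate_pairs(documents)
```

Only documents that collide in at least one band are ever compared, which is what avoids the O(N²) comparison. In a distributed setting the band key would play the role of the partitioning key, as the abstract describes.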

-8JJUKQ

Governance-as-Code for the Lakehouse: Zero Trust with Iceberg REST Catalog and Policy Engines en

20260606T115000 20260606T123500 004500

Governance-as-Code for the Lakehouse: Zero Trust with Iceberg REST Catalog and Policy Engines

Lakehouse architectures unify data lakes and warehouses, but governance models often lag behind the architectural innovation. Access control is frequently engine-specific, policies are fragmented, and trust is implicit. This talk argues that the missing layer in many lakehouse implementations is governance-as-code enforced at the catalog boundary. **We explore:** - How the Iceberg REST Catalog introduces a centralized enforcement point decoupled from compute engines - Why Zero Trust principles apply to data platforms (no implicit trust between engines, users, or services) - How policy-as-code systems such as OPA and Cedar enable versioned, testable, auditable access control - Patterns for implementing fine-grained authorization (row/column-level policies, environment isolation, service-to-service trust) - How governance becomes reproducible and portable across Spark, Flink, Trino, and other engines The session focuses on architectural patterns rather than vendor-specific tooling and highlights practical trade-offs when implementing policy enforcement in production lakehouses. **Key Takeaways** 1. Understand why traditional RBAC is insufficient for modern lakehouses 2. Learn how REST-based catalog architectures enable centralized governance 3. See how Zero Trust can be applied to data access workflows 4. Discover how to implement policy-as-code using OPA or Cedar 5. Gain a reference architecture for governance-first lakehouse design PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/8JJUKQ/ Doddington Forum Viktor Kessler PUBLISH QMGS7U@@pretalx.com

-QMGS7U

MCP, or not MCP en

20260606T144500 20260606T153000 004500

MCP, or not MCP

Outline: * Intro: how can I get data from this API into my Claude Code session? * What is MCP? When should you use it, when should you use other tools * Work through an example * Sharing and deploying MCP servers, alternatives and best practices * Optimizing your tools for best results PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/QMGS7U/ Doddington Forum Neal Richardson PUBLISH T7BQTG@@pretalx.com

-T7BQTG

Build your castle, dig your moat: AI sovereignty, provenance and compliance en

20260606T153000 20260606T161500 004500

Build your castle, dig your moat: AI sovereignty, provenance and compliance

In this talk you’ll learn… • What AI sovereignty actually means for your stack and your business • How to evaluate self-hosted, local LLMs • Overview of supply chain security controls for data and code artifacts – provenance, signatures and compliance measures, opacity and trust signals PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/T7BQTG/ Doddington Forum Daina Bouquin PUBLISH 3MAU9W@@pretalx.com

-3MAU9W

Documenting your open source projects for machines en

20260606T161500 20260606T170000 004500

Documenting your open source projects for machines

Introduction (5 mins) How LLM tools consume documentation pages (5 mins) Quick steps you can take to improve things (5 mins) Build markdown pages with sphinx-llm or mkdocs-llmstxt Helping LLMs write idiomatic code for your library (5 mins) Constraining how your code is used (5 mins) Conclusions (5 mins) PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/3MAU9W/ Doddington Forum Jacob Tomlinson PUBLISH EQZ7VK@@pretalx.com

-EQZ7VK

From Noisy Sensors to Events: Event Detection in Sensor data with Kalman Filters and Hidden Markov Models en

20260606T102000 20260606T110500 004500

From Noisy Sensors to Events: Event Detection in Sensor data with Kalman Filters and Hidden Markov Models

Objective Many operations depend on accurate data from continuous sensor streams. Knowing when a system transitions between states, when a process cycle completes, and how much change occurred per cycle drives scheduling, monitoring, and operational reporting. This talk presents a complete data science pipeline — built entirely in Python — that automates event detection and value estimation from noisy sensor streams. The goal is to give attendees both a worked real-world case study and a transferable toolkit for tackling noisy, event-driven sensor data in any domain. The Problem Sensors record measurements continuously, but the raw signal is far from clean. Vibrations, speed changes, and environmental shifts all create noise that masks the true underlying state of the system (for example: wake, light sleep, deep sleep, REM sleep). A naive threshold-based approach — the initial "traditional method" — is brittle: it misfires on transient spikes, misses gradual transitions, and cannot estimate values reliably. This section sets up the problem visually with annotated sensor traces and shows concretely where simple methods break down. Why Kalman Filter + Hidden Markov Model? The key insight is that the system operates as a latent state machine: at any moment it is in one of a small number of discrete states (idle, transitioning, active, completing), and what we observe is a noisy function of that state. This framing motivates a two-stage approach: Kalman Filter — smooths the raw signal, handles sensor noise, and provides a principled estimate of the true instantaneous value with an associated uncertainty. Hidden Markov Model — takes the smoothed signal and infers the sequence of hidden states, including the timing of transitions and the most probable value estimate at peak. The talk explains the intuition behind both models without heavy mathematics, and then shows how to implement them in Python with filterpy (Kalman) and hmmlearn (HMM). 
PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/EQZ7VK/ Hardwick Hub Ono Gantsog PUBLISH CKV8PH@@pretalx.com

-CKV8PH

Mapping the local heat transition: from large-scale geospatial data to real-world impact en

20260606T110500 20260606T115000 004500

Mapping the local heat transition: from large-scale geospatial data to real-world impact

Decarbonising the UK’s home heating is one of the greatest challenges of the Net Zero transition, yet it currently relies on individual household decisions supported by government incentives. To help accelerate local delivery, we are building a tool that maps the most suitable low-carbon heating for clusters of properties at a neighbourhood level. We will walk through the end-to-end journey of building a data product, from handling open data (such as Ordnance Survey products and EPC) to designing a user interface that empowers non-technical decision-makers. What we will cover: - Our data science pipeline: processing large-scale geospatial data, deploying classification models and clustering algorithms, and evaluating pipelines where ground truth data does not yet exist - Our Python tech stack - A walkthrough of the user interface - The process of translating the needs of local authorities into a functional and intuitive product Who should attend? No prior technical knowledge is required. Whether you are a data science newcomer or a seasoned professional with a decade of experience, this talk is designed to be accessible to all. We welcome: - Data scientists, engineers, academics, and machine learning engineers curious about how data science operates within a mission-driven, not-for-profit context. - Project and product managers looking for a roadmap to steer complex data products from concept to delivery. Key takeaways: By the end of this session, you will gain a deeper understanding of: - Data science in practice: data science techniques and libraries used - Applying data science for impact: how to bridge the gap between complex modelling and the practical needs of external stakeholders - Multidisciplinary collaboration: lessons learned from a team of data scientists, full-stack developers, designers, and domain experts working toward a common goal. 
PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/CKV8PH/ Hardwick Hub Sofia Pinto Simran Dave PUBLISH A38MW7@@pretalx.com

-A38MW7

Hazards on the Causal Path: Bayesian Time-Varying Survival Analysis with PyMC en

20260606T115000 20260606T123500 004500

Hazards on the Causal Path: Bayesian Time-Varying Survival Analysis with PyMC

Survival analysis is often used to answer when an event occurs, but in many real-world settings we also care about how and through which mechanisms interventions exert their effects over time. Dynamic Path Analysis (DPA), introduced by Aalen and colleagues, addresses this by decomposing time-varying effects on the hazard into direct and mediated causal pathways, allowing these relationships to evolve dynamically. In this talk, I present a Bayesian, generative reinterpretation of Dynamic Path Analysis implemented in PyMC. The model discretises time into intervals and represents cumulative hazard effects using smooth spline-based priors, enabling stable estimation of time-varying direct and indirect effects with full posterior uncertainty. I show how this approach recovers the qualitative behaviour of canonical dpasurv examples while extending them to a fully probabilistic framework. The emphasis is on the causal decomposition of hazards, clarifying why DPA is well suited to reasoning about evolving mediation structures and intervention planning. The talk highlights how generative Bayesian models make these ideas more flexible, interpretable, and extensible within the Python ecosystem. We end with practical recipes for using g-computation to derive non-parametric estimates of direct, indirect and survival-curve-differences from the fitted DPA model. Target audience: data scientists and researchers with some familiarity with survival analysis or Bayesian modelling. Takeaway: attendees will understand when and why to use dynamic causal hazard models, and how to implement them in practice using PyMC. PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/A38MW7/ Hardwick Hub Nathaniel Forde PUBLISH H7PFXK@@pretalx.com

-H7PFXK

Did Your Rollout Actually Work? Measuring Phased Launches with Staggered DiD in Python en

20260606T144500 20260606T153000 004500

Did Your Rollout Actually Work? Measuring Phased Launches with Staggered DiD in Python

### Who this is for Data scientists, analysts, and applied ML/measurement practitioners who evaluate interventions using observational or quasi-experimental data (e.g., feature flags, phased launches, regional changes). Familiarity with pandas and basic regression is helpful; no prior Bayesian experience required. ### What attendees will learn (takeaways) - How staggered adoption differs from "textbook" two-period Difference-in-Differences, and why the difference matters in production measurement. - How the imputation-based estimator (Borusyak, Jaravel & Spiess, 2024) works: fit on untreated observations, predict counterfactuals, aggregate by event time. - How to turn model output into stakeholder-friendly language: probability of positive effect, expected uplift, decision thresholds — no Bayesian background needed. - The parameter recovery pattern: validate your method on simulated data with known truth before trusting it on real data. - Practical diagnostics and red flags: parallel trends, anticipation effects, spillovers, and when *not* to use this method. 
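The three imputation steps listed above (fit on untreated observations, predict counterfactuals, aggregate by event time) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not CausalPy's API and not the speaker's code: it has no Bayesian component, the function name `impute_event_study` is hypothetical, the panel is synthetic and noise-free, and the unit and time fixed effects are fitted by simple alternating means (backfitting) instead of a regression library.

```python
from collections import defaultdict


def impute_event_study(rows, n_iter=500):
    """rows: iterable of (unit, time, adoption_time_or_None, outcome).

    Returns {event_time: mean estimated effect} for treated observations.
    Assumes every unit has at least one untreated observation.
    """
    rows = list(rows)
    untreated = [(u, t, y) for u, t, g, y in rows if g is None or t < g]

    # Step 1: fit y = alpha[unit] + beta[time] on untreated observations only,
    # alternating between unit means and time means until the fit settles.
    alpha, beta = defaultdict(float), defaultdict(float)
    for _ in range(n_iter):
        sums, counts = defaultdict(float), defaultdict(int)
        for u, t, y in untreated:
            sums[u] += y - beta[t]
            counts[u] += 1
        for u in sums:
            alpha[u] = sums[u] / counts[u]
        sums, counts = defaultdict(float), defaultdict(int)
        for u, t, y in untreated:
            sums[t] += y - alpha[u]
            counts[t] += 1
        for t in sums:
            beta[t] = sums[t] / counts[t]

    # Step 2: impute the untreated counterfactual for each treated observation.
    # Step 3: average (observed - imputed) by event time k = time - adoption.
    effects = defaultdict(list)
    for u, t, g, y in rows:
        if g is not None and t >= g:
            effects[t - g].append(y - (alpha[u] + beta[t]))
    return {k: sum(v) / len(v) for k, v in sorted(effects.items())}


# Synthetic staggered rollout: two waves plus two never-treated units.
adoption = {0: 3, 1: 3, 2: 5, 3: 5, 4: None, 5: None}
rows = []
for unit, g in adoption.items():
    for t in range(8):
        y = 10.0 + 2.0 * unit + 0.3 * t        # untreated potential outcome
        if g is not None and t >= g:
            y += 1.0 + 0.5 * (t - g)           # known effect, grows with event time
        rows.append((unit, t, g, y))

estimates = impute_event_study(rows)
for k, effect in estimates.items():
    print(f"event time {k}: estimated effect {effect:.3f}")
```

Because the simulated effect is known (1.0 + 0.5 per period since adoption), the printed estimates can be compared directly with the truth, which is the parameter recovery pattern mentioned in the takeaways. With real, noisy data the estimates would carry uncertainty that this sketch does not quantify.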
### Outline and time plan (30 min talk + 10 min Q&A) - 0–4 min: The real-world problem — phased rollouts and why naive pre/post comparisons fail - 4–10 min: DiD refresher, then what breaks under staggered adoption (timing heterogeneity, negative weighting in TWFE) - 10–17 min: The staggered DiD solution (event-time framing, imputation intuition, key assumptions) - 17–25 min: Worked example in Python with CausalPy - A loyalty program rolled out to 60 stores in 3 waves over 30 weeks - Visualise adoption timing and check pre-trends - Fit the model and produce event-study plots - Parameter recovery: compare estimated effects to known ground truth - 25–28 min: Diagnostics — pre-treatment placebo checks, counterfactual inspection, "when not to use this" decision checklist - 28–30 min: Summary — three takeaways and the six-step workflow - 30–40 min: Q&A ### Background knowledge needed - Comfortable with tidy data, grouping/aggregating, and reading a regression coefficient. - Basic causal inference vocabulary (treatment/control, confounding) is helpful but not required. ### What I will provide A public GitHub repository containing: - a reproducible Quarto notebook (the slides themselves, with all code), - a synthetic dataset simulating a realistic store loyalty program rollout, - and environment setup instructions (conda environment file). PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/H7PFXK/ Hardwick Hub Benjamin Vincent PUBLISH JWNWFQ@@pretalx.com

-JWNWFQ

Do Multilingual Embeddings Really Share a Semantic Space? Practical Lessons Across Scripts and Languages en

20260606T153000 20260606T161500 004500

Do Multilingual Embeddings Really Share a Semantic Space? Practical Lessons Across Scripts and Languages

Multilingual embedding models are widely used in retrieval, search, recommendation, and RAG pipelines under the assumption that semantically similar text across languages occupies a shared embedding space. This talk examines how true that assumption is in practice. Using pre-trained multilingual embedding models, I explore examples where multilingual alignment works extremely well, and others where it breaks down unexpectedly. Across multiple languages, we will look at how tokenisation, training data imbalance, and semantic ambiguity shape embedding geometry and retrieval behaviour. Rather than focusing on benchmark performance, the talk emphasises intuition and failure analysis: - Why do some languages align much more reliably than others? - Why do averages often hide important multilingual failures? - What happens when semantic ambiguity enters the embedding space? Through UMAP projections, nearest-neighbour analyses, tokenisation patterns, and translation similarity distributions, we will build a practical mental model for understanding multilingual embeddings beyond the assumption of “one shared semantic space.” The talk concludes with concrete diagnostics practitioners can use, along with common failure modes to watch for in applications. PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/JWNWFQ/ Hardwick Hub Kavit Tolia PUBLISH JFJFQX@@pretalx.com

-JFJFQX

Designing Semantic Memory for Multi-Agent Systems with Python en

20260606T161500 20260606T170000 004500

Designing Semantic Memory for Multi-Agent Systems with Python

This session focuses specifically on semantic memory architecture as the critical systems layer in production-grade multi-agent AI applications. From my role on the Azure Cosmos DB engineering team, I’ve worked with teams building large-scale agentic systems that must support multi-tenancy, personalization, long-lived conversational state, and operational observability. A consistent lesson is that orchestration frameworks coordinate agents, but memory design determines whether the system behaves coherently over time. The talk will cover: - A practical taxonomy of agent memory: short-term state, episodic logs, declarative knowledge, and procedural memory - Modeling conversations as append-only event streams versus mutable session documents - Designing retrieval-aware memory stores that combine structured filtering with semantic signals - Memory lifecycle management: summarization spans, supersession flags, retention windows, and TTL-based compaction - Checkpointed agent workflows for traceability and debugging - Multi-tenant memory partitioning strategies - Cost tradeoffs between growing context windows and durable storage A live Python-based multi-agent travel planner (built with LangGraph and backed by Azure Cosmos DB) will demonstrate these patterns in practice, including MCP-based memory tools that separate reasoning from storage concerns. The goal is to provide PyData attendees with a concrete systems framework for thinking about semantic memory, not as an afterthought to prompting, but as a first-class data architecture problem at the intersection of distributed systems and applied AI. PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/JFJFQX/ Hardwick Hub Theo van Kraay PUBLISH BUEZSA@@pretalx.com

-BUEZSA

PyMC Code Sprint en

20260606T102500 20260606T122500 020000

PyMC Code Sprint

PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/BUEZSA/ Board Room- Unconference Track Chris Fonnesbeck Oriol Abril Pla PUBLISH RZPMEY@@pretalx.com

-RZPMEY

Diversity Scholar Luncheon en

20260606T123500 20260606T133500 010000

Diversity Scholar Luncheon

PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/RZPMEY/ Board Room- Unconference Track NumFOCUS PUBLISH 8SWJS9@@pretalx.com

-8SWJS9

Unconference- Feminist AI en

20260606T144500 20260606T153000 004500

Unconference- Feminist AI

PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/8SWJS9/ Board Room- Unconference Track Cheuk Ting Ho PUBLISH YPFBTB@@pretalx.com

-YPFBTB

Lightning Talks en

20260607T090000 20260607T094500 004500

Lightning Talks

PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/YPFBTB/ Grand Hall 1 NumFOCUS PUBLISH NMFNQJ@@pretalx.com

-NMFNQJ

Your ML Pipeline Meets the EU AI Act en

20260607T101500 20260607T110000 004500

Your ML Pipeline Meets the EU AI Act

Resources with slides and interactive checklist: anx.io/SI1ja The EU AI Act introduces new obligations that will directly affect how machine learning systems are designed, evaluated, and operated. While the regulation is often discussed from a legal perspective, many of its practical consequences fall squarely into the domain of data scientists and ML engineers. This talk provides an engineering-focused walkthrough of where the EU AI Act intersects with the modern ML lifecycle. We map key regulatory expectations to familiar technical stages such as data collection, model training, evaluation, deployment, and monitoring. Rather than diving into legal detail, the session focuses on concrete implementation patterns and common failure modes observed in real-world ML workflows. Attendees will learn how to perform lightweight risk classification, identify typical compliance gaps in existing pipelines, and apply design patterns that improve traceability, documentation, and monitoring without significantly slowing down development. The talk concludes with a practical readiness checklist that teams can immediately apply to their own systems. Target audience: data scientists, ML engineers, and MLOps practitioners working with production ML systems. Expected background: familiarity with the basic ML lifecycle and model deployment concepts. No prior knowledge of the EU AI Act is required. Key takeaways: - Understand where the EU AI Act impacts ML pipelines - Learn practical patterns for AI Act readiness - Avoid common compliance pitfalls in production ML - Leave with a concrete checklist for next steps PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/NMFNQJ/ Grand Hall 1 Gabriel Lipnik PUBLISH MMS9WY@@pretalx.com

-MMS9WY

The Silent Crash: Why Your RAG Evaluation Metrics Are Lying to You en

20260607T110000 20260607T114500 004500

The Silent Crash: Why Your RAG Evaluation Metrics Are Lying to You

Picture this: You’ve just finished your RAG pipeline. The test dashboard is all green: Context Recall is 85%, Answer Relevance is 92%. You deploy with confidence. Ten minutes later, a user asks a simple question, and the bot confidently gives the wrong answer. Why did the metrics pass? Because **similarity is not correctness**. To a vector database, "The treatment is safe" and "The treatment is not safe" look nearly identical: they share almost all of their words and the same sentence structure. But logically, they are opposites. Standard metrics like Cosine Similarity or BLEU often completely miss these critical negations. In this talk, we are going to stop relying on "vibe checks" and start treating Evaluation as a software testing problem. We’ll look at why traditional NLP metrics are useless for RAG and move toward the new standard: **LLM-as-a-Judge**. We will discuss the messy reality of using GPT-4 to grade Llama-3, how to catch "Self-Preference Bias" (where models just like their own writing style), and how to do all of this without bankrupting your API budget. **Outline** - **Real-world examples** where high metrics hid major failures, and why "Finding the doc" (Retrieval) is different from "Answering the question" (Generation). - Why Your Metrics Are Broken: Why **Cosine Similarity is good for search but bad for truth**, and why BLEU scores punish correct answers just for using different synonyms. - Using models (like G-Eval) to grade logic and tone, and solving the "Judge Paradox" by swapping options to remove Position Bias. - Building a "Hard" Test Set: How to stop testing on easy questions and generate adversarial "Trick Questions" that specifically target your retrieval gaps. - Key Takeaways: A practical strategy for using metrics, plus a look at tools like Ragas and DeepEval. PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/MMS9WY/ Grand Hall 1 Hitendri Bomble Arghyadeep Sarkar PUBLISH CJGBGV@@pretalx.com

-CJGBGV

Vibe NLP for Applied NLP en

20260607T114500 20260607T123000 004500

Vibe NLP for Applied NLP

At the core of it is an often overlooked idea: using LLMs to *build systems* instead of *as systems*. AI-powered coding assistants have transformed the way we build software – and they can be even more impactful for AI development itself and bridge the experience gap that's often holding teams back and causing projects to fail. In the talk, I will show you a new way of using generative models for AI development, and some practical examples of how to make "Vibe NLP" work for real-world problems. PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/CJGBGV/ Grand Hall 1 Ines Montani PUBLISH XQXNVK@@pretalx.com

-XQXNVK

Keynote- Martin O'Reilly- LLMs and AI agents demystified en

20260607T133000 20260607T141500 004500

Keynote- Martin O'Reilly- LLMs and AI agents demystified

PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/XQXNVK/ Grand Hall 1 Martin O'Reilly PUBLISH GBGB9X@@pretalx.com

-GBGB9X

AI-Assisted Creative for Automated Marketing using Python en

20260607T144500 20260607T153000 004500

AI-Assisted Creative for Automated Marketing using Python

Large content catalogues create a classic long-tail problem: while a small number of titles receive heavy promotion, a large proportion of overall consumption comes from many programmes with relatively small individual audiences. Producing bespoke marketing assets for this long tail is usually impractical, as traditional workflows rely on manual design and editing. This talk presents a real-world Python-based system that automates marketing asset production at scale by combining audience data, asset metadata, machine learning models and automated rendering through Adobe After Effects. The pipeline generates thousands of platform-specific video and image assets, including multi-title creatives populated dynamically using recommendation outputs. We’ve even gone a step further by tapping into catalogue ads in paid social marketing, and we’re able to deploy directly to audience-facing channels without any human intervention using the Dropbox API from Python. A key focus of the talk is how we made automation safe for audience-facing outputs without compromising editorial standards. We will cover the design of automated QA layers that use the OpenAI API from Python, rule-based validation, and alerting mechanisms built on the Slack API that trigger human intervention when necessary. Plotly Dash apps allow review and controlled interventions such as blacklisting problematic shows. While the domain is media, the architectural challenges carry over to other data-driven workflows: orchestration, quality assurance, risk management and human-in-the-loop design. The session is aimed at data scientists, ML engineers and data engineers interested in automation and production pipelines. The talk aims to be accessible to all and focuses on the application and its output, interspersed with relevant Python code snippets. 
Rough timings: 0–5 min: The long-tail problem in large content catalogues; 5–10 min: Examples of marketing creative; 10–15 min: Demo of running Adobe After Effects through Python; 15–25 min: System overview: data sources, models, and orchestration; 25–30 min: Making automation safe: QA layers, rules, tooling, and alerting; 30–35 min: Multi-title assets and recommendation-driven content selection; 35–40 min: Key lessons, design principles, and audience Q&A. PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/GBGB9X/ Grand Hall 1 Matt Crooks PUBLISH HAYANG@@pretalx.com

-HAYANG

LLM-Based Recommendation Systems: From Embeddings to Real Personalization en

20260607T153000 20260607T161500 004500

LLM-Based Recommendation Systems: From Embeddings to Real Personalization

Recommendation systems are a core component of many data-driven products, yet most practitioners are still navigating how and when to incorporate Large Language Models into these systems effectively. This talk presents a practical, end-to-end view of LLM-based recommendation systems. We start by revisiting classical recommendation architectures and then move into modern approaches built around embeddings, vector similarity search, and retrieval-augmented generation (RAG). Topics covered include: Using LLM embeddings for user and item representation Hybrid retrieval pipelines combining vector search and traditional ranking models Prompt-driven personalization and context-aware recommendations Offline and online evaluation strategies for LLM-based recommenders Trade-offs around latency, cost, and system complexity The focus is on real-world applicability rather than theoretical novelty. Examples and design patterns are drawn from production-like systems and practical experimentation. This session is aimed at data scientists, ML engineers, and practitioners who want to move beyond hype and build recommendation systems that deliver meaningful personalization using LLMs. PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/HAYANG/ Grand Hall 1 Özge Çinko PUBLISH KDWRYR@@pretalx.com

-KDWRYR

The Future of Notebooks in a Claude Code World en

20260607T161500 20260607T170000 004500

The Future of Notebooks in a Claude Code World

1. **The notebook's hidden contract** — Jupyter intermingles three valuable things: interactive execution, a long-running process that caches state in memory, and a narrative log of exploration steps. The coupling has real benefits — edit-in-place re-execution captures a clean story, not a noisy shell log, and the persistent kernel means you never have to reload expensive state. But the coupling is also why notebooks are fragile, unreproducible, and can't go to production. AI agents are fine with long-running stateful processes like databases (explicit state, declarative interface, introspectable). A notebook kernel is the opposite — implicit mutable state, imperative, execution-order-dependent — and the agent's model of it degrades as state accumulates. Self-contained steps with explicit inputs and cached outputs have much better failure modes. 2. **The display surface gap** — How data professionals actually use Claude Code today: saving PNGs, dumping ASCII tables, switching back and forth to Jupyter. The terminal is a fantastic interface for intent but a terrible interface for output. 3. **Prior art and adjacent solutions** — MCP Apps (renders UI inside Claude Desktop's chat window), chart-canvas (browser dashboard for Claude Desktop), Data Formulator (Microsoft's standalone viz tool). What each gets right, and why none of them solve the CLI agent case. 4. **The deconstructed notebook (live demo)** — Separate the three concerns. Terminal for intent. The PyData Arrow stack driven by Ibis/xorq for compute, with instructions written by Claude. Browser for display. Live walkthrough of the working system: the audience sees the browser update in real time as Claude iterates, with tables, charts, and diffs appearing in structured blocks, not a scrolling chat log. The default view shows the current result at each step — preserving the notebook's narrative quality — with iteration history available but not in your face. 5. 
**Iteration and diffing (live demo)** — Exploratory data analysis through model evaluation, driven by conversation. The audience watches the full loop live: prompt, compute, result, diff, refine. Interactive tables with sort/filter, Vega-Lite charts, side-by-side diffs showing exactly what changed between iterations, and expression lineage tracing the full dependency graph from raw data to final result. 6. **What the compute substrate needs to get right** — Why "just render HTML" isn't enough. The notebook kernel's real job is caching — keeping expensive intermediate results in memory so you can build on them. To decouple the notebook, you need a substrate that handles caching automatically: expression results stored as Parquet on local disk, streamed to the next step, no long-running process needed. Plus: content-addressing (so every iteration is retrievable), typed schemas (so composition errors are caught early), and separation of transform logic from visualization. Brief introduction of xorq's expression model as one approach to these requirements. 7. **Design principles for post-notebook tooling** — Expressions are append-only and immutable — every iteration is preserved. But the workflow and the final view are structured like a notebook: blocks that are iterated on, each showing its current result, with history accessible underneath. These blocks can be arranged into a traditional notebook-style narrative or a dashboard. The human controls the intent and reviews the display. Diffing is a first-class operation. Every intermediate result is addressable. PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/KDWRYR/ Grand Hall 1 Paddy Mullen PUBLISH BGNZLQ@@pretalx.com

-BGNZLQ

Tesco AI & Data Science: From Recipes to Reality en

20260607T101500 20260607T110000 004500

Tesco AI & Data Science: From Recipes to Reality

PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/BGNZLQ/ Doddington Forum Julie Huang Kareem Hussein PUBLISH KQDKTE@@pretalx.com

-KQDKTE

Querying the queries: SQL Metaprogramming in Python en

20260607T110000 20260607T114500 004500

Querying the queries: SQL Metaprogramming in Python

SQL sits at the heart of most analytics and data engineering work, yet the way we maintain SQL rarely scales with the complexity of our pipelines. As codebases grow, SQL tends to accumulate structural debt: duplicated logic, subtle inconsistencies, deeply nested subqueries, and transformations that are difficult to apply reliably. Teams often end up relying on manual pattern‑matching, ad‑hoc scripts, or one‑off rewrites, approaches that are fragile and nearly impossible to generalise. This talk presents a more systematic solution: treat queries as manipulable data through metaprogramming in Python. Instead of working with SQL as raw text, we use Python to parse queries into Abstract Syntax Trees (ASTs), unlocking the ability to inspect, analyze, and modify SQL with precision at scale. After introducing the intuition behind SQL ASTs, we walk through what they look like in practice using Python libraries such as sqloxide. With queries represented as nested dictionaries, we can traverse them, detect patterns, and apply targeted modifications without breaking syntactic structure. The session demonstrates several real examples that highlight the power of this approach: evaluating subquery depth for complexity diagnostics, adding defensive transformations such as wrapping denominators in NULLIF(), generating consistent aliases for aggregation expressions, and extracting table references to infer dependency graphs across staging or temporary‑table‑heavy pipelines. Rather than offering a single tool or framework, this talk focuses on the underlying metaprogramming techniques that empower engineers to build their own SQL analysis and refactoring utilities. Attendees will leave with a clear mental model of how SQL parsing works, how ASTs can be manipulated in Python, and how these patterns can be applied to enforce standards, build linters, or automate large‑scale refactors. 
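To make the "queries as nested dictionaries" idea concrete, here is a small sketch of two of the analyses mentioned above: subquery depth and table-reference extraction. The AST below is hand-written and heavily simplified; its keys (`Query`, `Derived`, `Table`, `name`) are assumptions chosen for readability and are not the exact structure that sqloxide or any other parser emits. The helper names are hypothetical as well.

```python
def subquery_depth(node):
    """Maximum nesting depth of 'Query' nodes in a nested dict/list AST."""
    if isinstance(node, dict):
        deepest_child = max((subquery_depth(v) for v in node.values()), default=0)
        # A dict that carries a 'Query' key opens one more level of nesting.
        return deepest_child + (1 if "Query" in node else 0)
    if isinstance(node, list):
        return max((subquery_depth(v) for v in node), default=0)
    return 0  # scalars (strings, numbers, None) contribute no depth


def table_names(node):
    """Set of table names referenced anywhere in the AST."""
    if isinstance(node, dict):
        found = set()
        for key, value in node.items():
            if key == "Table" and isinstance(value, dict) and "name" in value:
                found.add(value["name"])
            found |= table_names(value)
        return found
    if isinstance(node, list):
        return set().union(*(table_names(v) for v in node))
    return set()


# Simplified, hand-written AST for something like:
#   SELECT user_id
#   FROM (SELECT user_id FROM raw_events) AS recent, dim_users
toy_ast = {
    "Query": {
        "select": ["user_id"],
        "from": [
            {
                "Derived": {
                    "alias": "recent",
                    "subquery": {
                        "Query": {
                            "select": ["user_id"],
                            "from": [{"Table": {"name": "raw_events"}}],
                        }
                    },
                }
            },
            {"Table": {"name": "dim_users"}},
        ],
    }
}

depth = subquery_depth(toy_ast)
tables = sorted(table_names(toy_ast))
print("subquery depth:", depth)
print("tables:", tables)
```

The same recursive walk is the basis for rewrites: instead of returning a count or a set, a traversal can return a modified copy of each node, which is how a transformation such as wrapping a denominator in NULLIF() would be applied without touching the SQL text directly. Against a real parser, the key names and node shapes would need to match that parser's output.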
Background required: - Intermediate familiarity with Python (nested dictionaries, basic tree algorithms). - Intermediate familiarity with SQL (CTEs, subqueries, aggregates) - No prior knowledge of compiler theory or ASTs is assumed Outline: - 0–3 min — Motivation: Why SQL Refactoring Is Hard -- Structural debt in real SQL codebases: duplication, inconsistencies, nested logic -- Why regex and manual review fail at scale 3–8 min — Key Idea: Treat SQL as Data -- What is an Abstract Syntax Tree (AST)? -- Using Python libraries (e.g., sqloxide) to parse SQL into manipulable structures 8–15 min — Demo: Exploring Real SQL ASTs in Python -- Show nested dictionaries representing SQL structure -- Simple tree traversal patterns 15–25 min — Practical Refactoring Examples -- Computing subquery depth (complexity linting) -- Auto‑aliasing aggregate expressions -- Wrapping denominators with NULLIF() -- Extracting table references for dependency graphs 25–32 min — Building Custom SQL Tooling -- How these patterns generalize -- Enforcing standards, writing linters, automating bulk rewrites -- When AST‑based tooling is worth it 32–40 min — Lessons Learned & Limits + Q&A -- Homoiconicity (Python vs Lisp for AST manipulation) PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/KQDKTE/ Doddington Forum Michel Semaan PUBLISH AMGUEK@@pretalx.com

-AMGUEK

Making tech boring to keep data exciting en

20260607T114500 20260607T123000 004500

Making tech boring to keep data exciting

Data engineering succeeds when it disappears into the background. Not because it’s unimportant, but because it becomes reliable enough that other teams can build on it without thinking about it. In many organisations, the opposite happens: pipelines are fragile, changes are risky, and operational work consumes the roadmap. This talk tells the story of moving from that state to one where the pipeline becomes a platform: - Predictable runs and recovery: designing for frequent ingest, safe execution windows, and fast time-to-recover when things fail. - Incremental modernisation: migrating orchestration and execution in a way that avoids running parallel “shadow pipelines” and reduces blast radius. - System transparency: turning a black box into something teams can interrogate — what ran, what it produced, what failed, what changed, and why. - Data quality as a product feature: creating actionable quality signals (not just logs), so improvements to text quality and search relevance can ship quickly and be measured. - Federation and alignment via a shared layer: using a data lake/warehouse layer to consolidate outputs, align metrics across teams, and remove ad hoc transforms at the edges. - Unblocking downstream users: improving interfaces and handoffs so application, policy, and data science teams can self-serve, iterate, and trust the numbers. The emphasis is on the big picture: how to set goals that matter (scale, resilience, extendability), how to define “done” in operational terms, and how to deliver tangible improvements sprint by sprint while still laying foundations for the future. The takeaway is a repeatable approach for making data infrastructure boring — so the work built on top of it can be exciting. PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/AMGUEK/ Doddington Forum Fred O'Loughlin Kerry Parker Mark Cottam PUBLISH HPJR9B@@pretalx.com

-HPJR9B

The Polars vs SQL differences nobody is talking about en

20260607T144500 20260607T153000 004500

The Polars vs SQL differences nobody is talking about

Polars is a dataframe library that started gaining significant traction in the data science community around 2022/2023. It is now generally regarded as a safer and more performant alternative to its extremely popular counterpart pandas. As such, it has attracted several performance comparisons with SQL-like engines such as DuckDB, PySpark, Daft, and more. What's typically missing from these comparisons is an explanation of the semantic differences. For example: - Why does Polars let me do `pl.col('price') - pl.col('price').mean()`, but SQL doesn't? - Why does Polars let me filter using window functions, and how can I get SQL to? - Are there operations that are more dangerous in Polars than in SQL? - How do they differ when working with time zones? - Why did SQL reorder my rows when Polars didn't? Outline of the talk: - Motivation: why care about Polars or about SQL? - Relational model background, row order - Polars model, how it differs from the relational model, and what this means for you - Abstracting the Polars and SQL differences away in Narwhals, and advice for non-Narwhals users - Q&A This is a technical but accessible talk aimed at data practitioners. Data engineers, data scientists, data analysts, and anyone else working with data will leave the talk with stronger theoretical foundations regarding the Polars and SQL data models. Most importantly, they will learn what this means for them, and what they can do about it. PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/HPJR9B/ Doddington Forum Marco Gorelli PUBLISH QQWDVQ@@pretalx.com

-QQWDVQ

From Chat-with-PDF to Quiz-Master: Live-Grading RAG with LLM-as-Judge in Python en

20260607T153000 20260607T161500 004500

From Chat-with-PDF to Quiz-Master: Live-Grading RAG with LLM-as-Judge in Python

RAG systems typically answer questions but rarely evaluate whether the answer, or the user, actually demonstrates understanding. That requires structured datasets, grading logic, and application state, not just retrieval. In this talk, we build a live-graded “knowledge arena”: a Python application that converts a dense technical document into an interactive quiz with two modes: - Easy mode - automatically generated multiple-choice questions with plausible distractors - Expert mode - free-text answers scored in real time using semantic LLM metrics The implementation illustrates several reusable production patterns: - **Document ingestion (Docling)**: Extracting layout, tables, and figures so evaluation covers the full source rather than plain text only. - **Synthetic dataset generation (DeepEval)**: Creating “golden” QA pairs and automated distractors for benchmarking and training. - **LLM-as-judge grading**: Scoring free-text answers with semantic metrics instead of brittle string matching. - **Stateful Python UI (Marimo)**: Managing interaction and evaluation loops without custom JavaScript. Although the interface is playful, the architecture generalises to production RAG and agentic knowledge systems for benchmarking, training, and human-in-the-loop evaluation. This talk presents a reusable LLM-as-judge architecture for evaluating understanding in RAG systems using synthetic QA generation and real-time semantic grading in Python. All demo components are pre-built and run locally with cached models and datasets. **Audience / Prerequisites** - Intermediate Python users familiar with basic LLM and RAG concepts (embeddings, retrieval). - No prior experience with Docling, DeepEval, or Marimo required. 
**Key Takeaways** - A reusable LLM-as-judge evaluation pattern for RAG - How to generate QA benchmarks from documents automatically - Techniques for handling tables and figures in ingestion - Where live grading fits into production workflows Full code example available here: https://github.com/Cadarn/PyData-AI-Generated-Quiz PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/QQWDVQ/ Doddington Forum Adam Hill PUBLISH CHQLFU@@pretalx.com

-CHQLFU

When Your Dataset Has Blind Spots: Practical LLM-Based Data Augmentation en

20260607T161500 20260607T170000 004500

When Your Dataset Has Blind Spots: Practical LLM-Based Data Augmentation

## Objective Many machine learning teams struggle not because of model limitations, but because their datasets fail to cover rare classes, niche domains, or emerging user behavior. Traditional data augmentation techniques offer limited help for text, often producing surface-level variations without meaningful semantic diversity. This talk presents a practical framework for using large language models to augment NLP datasets. ## Outline - The Data Bottleneck: Why models trained on "standard" food language fail to generalize to "Molecular Gastronomy" or niche culinary terms. - Three Complementary Techniques: 1. Synthetic Generation: Creating fully labeled examples for missing classes. 2. LoRA Adapters: Fine-tuning LLMs to control style and label consistency (e.g., matching a "Professional Critic" tone). 3. LLM Annotation: Labeling large volumes of messy, real-world text from social media or external scrapes. - Validation Strategies: Addressing error amplification and bias through human agreement checks, self-consistency, and "LLM-as-a-judge" approaches. - Measuring Impact: Evaluating downstream model performance via rare-class recall, calibration, and error distribution. ## Central Thesis and Takeaways The session provides a decision framework for choosing between generation, fine-tuning, and annotation based on data availability and the need for style or tone. Attendees will walk away with strategies to ensure synthetic data quality before retraining their models. ## Background Knowledge Expected Basic knowledge of Python and familiarity with machine learning workflows (training, labeling, and evaluation) are recommended. PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/CHQLFU/ Doddington Forum Ophelie Bleu PUBLISH BAFKCL@@pretalx.com

-BAFKCL

From SQL to Python: Building Data Context for Agents and People en

20260607T101500 20260607T110000 004500

From SQL to Python: Building Data Context for Agents and People

Text-to-SQL is often presented as the future interface for AI-driven analytics: connect an LLM to your warehouse, ask questions, get answers. The demo works. But production systems reveal a deeper issue: SQL can query structure, but it cannot provide the context required to understand what data actually means. After years of building data infrastructure, I’ve learned that context is the real bottleneck - for both people and agents. This becomes unavoidable in S3-first, multimodal environments: video, audio, medical scans, sensor streams, and model outputs. In these projects, the source of truth is object storage, and meaning is defined by Python pipelines. To reason correctly, you need data context across multiple layers: - **Storage context** - what exists, where it lives, and how it changes - **Metadata context** - what’s inside files, extracted signals, and hierarchical structure - **Dataset context** - how files are grouped, reused across datasets, and versioned - **Code context** - the Python transformations that define semantics and intent In this talk, I’ll present a practical framework for collecting and using these layers systematically. Using DataChain as a concrete example, I’ll show how typed schemas (e.g., Pydantic), vectorized metadata operations, and scalable Python execution make multimodal workflows understandable, reusable, and agent-ready - especially in Physical AI and biotech. Attendees will leave with a clear mental model for building data platforms where meaning lives in code, and agents can operate with real context rather than isolated queries. PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/BAFKCL/ Hardwick Hub Dmitry Petrov PUBLISH R7UEBE@@pretalx.com

-R7UEBE

The Clean Energy Graveyard: Using Python & Gemini to Map the UK's Cancelled Renewable's en

20260607T110000 20260607T114500 004500

The Clean Energy Graveyard: Using Python & Gemini to Map the UK's Cancelled Renewable's

In this talk, I'll go through some data on Britain's hidden energy crisis, including an evaluation of the current quality of the Renewable Energy Planning Database (REPD), a government project that has been openly tracking every renewable energy project from beginning to success or cancellation. While the data is public, it's often difficult to navigate and tells an incomplete story. When a wind farm is cancelled after 4 years of planning, this represents hundreds of pages of planning documents, countless hours of work with community objections, and interventions hidden in planning documents. The Clean Energy Graveyard is a free, open-source web visualisation tool that seeks to transform the REPD dataset into an interactive clean energy graveyard, using a mixture of GeoPandas for spatial analysis, pandas for dataset cleaning, and the Gemini API to begin intelligently surfacing critical news stories and council data. The key takeaways from this talk will be: - How to design and build AI models for the public good, to help transform byzantine systems into open data flows. - Practical patterns for using APIs to enrich datasets at scale. - Techniques for handling messy government data with inconsistent schemas. - How to create data visualisations that tell human stories, not just difficult-to-understand statistics. - How to collaborate and work on open-source public good projects. Prior Knowledge Expected: Basic familiarity with Python & coding techniques. No prior knowledge of energy policy, LLM APIs, geospatial analysis or detailed webs of council websites required. The talk is aimed at intermediate Python users & data scientists who want to explore how to use their skills for public good. Target Audience: - Data scientists interested in working with government/public sector data - Developers exploring practical applications of LLMs beyond typical analysis and summarisation - Anyone interested in climate/civic tech.
Resources: Live Demo: nimby.bemben.co.uk Previous Blog Post about Specific Example: https://ends.substack.com/p/faw-side-community-wind-farm Previous Presentation Slides: github.com/dambem/nimbydex_slides Why this talk: This PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/R7UEBE/ Hardwick Hub Damian Bemben PUBLISH 9HLYEW@@pretalx.com

-9HLYEW

What We Expect from XAI - A scientist’s experience between models and users en

20260607T114500 20260607T123000 004500

What We Expect from XAI - A scientist’s experience between models and users

Explainable AI (XAI) emerged as a major research topic with the rise of deep learning and is now being adopted in domains where predictive models support high-impact decisions such as healthcare, finance, environmental monitoring, and public policy. As machine learning systems move into operational use, explanations are increasingly relied upon not only to understand models but also to justify and guide real decisions. Conceptually, an explanation provides information that allows a human observer to understand a system’s behaviour. In machine learning, the term refers to a broad family of approaches, ranging from interpretable models to post-hoc analysis methods. These techniques are often presented as a way to make complex models understandable and usable by human stakeholders. A concrete example comes from a project in which I applied machine learning to earth observation data for urban resilience, where explanations were expected to help local authorities plan maintenance and intervention actions to mitigate the impacts of natural hazards in cities. My role as the data scientist placed me between the domain specialists curating the data and the end users relying on the model’s outputs. In practice, this meant translating between different questions: domain specialists wanted to know whether the model’s behaviour made sense given their knowledge of the phenomenon, while end users wanted to know how the outputs could guide concrete actions and planning decisions. This experience motivates a closer look at what contemporary explainability methods are intended to provide to different users. Many widely used approaches—particularly post-hoc feature attribution techniques—are often interpreted as revealing the reasoning of a model. In practice, however, they typically provide local approximations or sensitivity analyses rather than faithful descriptions of the decision process. 
For example, feature attribution methods such as SHAP may be read as identifying causal factors, and saliency maps may appear meaningful even when weakly connected to the model’s actual reasoning. I unpack explainability in contemporary machine learning practice by asking what explanation methods actually guarantee—and what they do not. Drawing on my experience working between domain experts and end users, I reflect on how XAI functions in operational settings and on the expectations attached to explanations when they are used to support decisions. Intended audience The talk is aimed at machine learning practitioners, researchers, data scientists, and applied scientists who work with predictive models, as well as anyone who is interested in interpretation of model outputs in practice (including domain experts and decision-makers). No prior expertise in XAI is required. Type and tone of the talk The presentation is conceptual and experience-driven rather than mathematical. It will use concrete examples and intuitive explanations rather than formal derivations. The tone is reflective and discussion-oriented, focusing on practical interpretation rather than algorithmic detail. PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/9HLYEW/ Hardwick Hub Alessandra Costantino PUBLISH 3YH3WF@@pretalx.com

-3YH3WF

The Human-in-the-Loop is Tired en

20260607T144500 20260607T153000 004500

The Human-in-the-Loop is Tired

Outline: - The feeling: honest anecdotes from inside an AI tooling company navigating its own disruption (~8 min) - Three named patterns: reward function disruption, intensity trap, isolation drift (~7 min) - What's working: pre-mortems, judgment distillation, mode-switching discipline, team counter-practices (~10 min) - The reframe: responsive design as precedent, what still matters, scarce resources are valuable (~5 min) PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/3YH3WF/ Hardwick Hub Laura Summers PUBLISH TL88MJ@@pretalx.com

-TL88MJ

What Can LLMs Do with Messy Residential Electrification Data? en

20260607T153000 20260607T161500 004500

What Can LLMs Do with Messy Residential Electrification Data?

ResStock is an incredible tool for residential energy research, but quite tricky for anyone who isn’t deep in the weeds. It produces huge, domain-heavy datasets: thousands of simulated homes, dozens of variables, and hourly time series for a full year. Great if you’re writing a paper, overwhelming if you want to understand how electrification upgrades change bills or demand. This talk asks a practical question: What can large language models actually do with ResStock-style data, using a Python workflow? Can LLMs help normal people make sense of the benefits of electrification upgrades without pretending the model is “doing the science” for us? We ground everything in two real ResStock runs: (1) solar thermal water heater upgrades in Texas, and (2) HVAC upgrades across the Southeastern U.S. Both are large and messy, so we can’t just upload the parquet files. Instead, we: - Use Python (pandas/DuckDB) to sample and aggregate the data into representative slices that fit within context limits. - Build a clear schema description (“data card”) so the LLM understands variables, units, and constraints. - Ask the LLM to help where it shines: generating and refining pandas/DuckDB queries from natural-language questions, and explaining upgrade impacts in plain English. Andrew (UT Austin) brings the ResStock data, research questions, and domain constraints; Cedric (Red Hat) brings the open source + LLM integration side. Attendees will leave with a realistic pattern for using LLMs as helpers, not replacements, when working with large, messy scientific or policy datasets in Python. PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/TL88MJ/ Hardwick Hub Cedric Clyburn Andrew Igdal PUBLISH RTWLPY@@pretalx.com

-RTWLPY

No Ropes on a Boat: Coherent Forecasting en

20260607T161500 20260607T170000 004500

No Ropes on a Boat: Coherent Forecasting

PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/RTWLPY/ Hardwick Hub Thomas Ogden PUBLISH AV3A3W@@pretalx.com

-AV3A3W

Python Leadership and Engineering Excellence BoF en

20260607T101500 20260607T111500 010000

Python Leadership and Engineering Excellence BoF

PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/AV3A3W/ Board Room- Unconference Track Sam Joseph PUBLISH LLVVRQ@@pretalx.com

-LLVVRQ

How to write a PyData proposal en

20260607T114500 20260607T123000 004500

How to write a PyData proposal

PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/LLVVRQ/ Board Room- Unconference Track James Fielder PUBLISH BKW3KD@@pretalx.com

-BKW3KD

PyData Meetup Organizer Luncheon en

20260607T123000 20260607T133000 010000

PyData Meetup Organizer Luncheon

PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/BKW3KD/ Board Room- Unconference Track PUBLISH HNKFP8@@pretalx.com

-HNKFP8

Surviving (and Thriving) as a Data Professional in the Age of AI Agents en

20260607T144500 20260607T153000 004500

Surviving (and Thriving) as a Data Professional in the Age of AI Agents

PUBLIC CONFIRMED Talk https://pretalx.com/pydata-london-2026/talk/HNKFP8/ Board Room- Unconference Track Maksym Bilychenko