PyData Boston 2025

08:00
60min
Registration and Breakfast
Horace Mann
09:00
210min
CUDA Python Kernel Authoring
Katrina Riehl

We'll explore best practices for writing CUDA kernels using Python, empowering developers to harness the full potential of GPU acceleration. Gain a clear understanding of the structure and functionality of CUDA kernels, learning how to effectively implement them within Python applications.

Horace Mann
09:00
90min
From Notebook to Pipeline: Hands-On Data Engineering with Python
Gilberto Hernandez

In this hands-on tutorial, you'll go from a blank notebook to a fully orchestrated data pipeline built entirely in Python, all in under 90 minutes. You'll learn how to design and deploy end-to-end data pipelines using familiar notebook environments, using Python for your data loading, data transformations, and insights delivery.

We'll dive into the Ingestion-Transformation-Delivery (ITD) framework for building data pipelines: ingest raw data from cloud object storage, transform the data using Python DataFrames, and deliver insights via a Streamlit application.

Basic familiarity with Python (and/or SQL) is helpful, but not required. By the end of the session, you'll understand practical data engineering patterns and leave with reusable code templates to help you build, orchestrate, and deploy data pipelines from notebook environments.

Thomas Paul
09:00
90min
Hands-On with LLM-Powered Recommenders: Hybrid Architectures for Next-Gen Personalization
Astha Puri, Sheetal Borar

Recommender systems power everything from e-commerce to media streaming, but most pipelines still rely on collaborative filtering or neural models that focus narrowly on user–item interactions. Large language models (LLMs), by contrast, excel at reasoning across unstructured text, contextual information, and explanations.
This tutorial bridges the two worlds. Participants will build a hybrid recommender system that uses structured embeddings for retrieval and integrates an LLM layer for personalization and natural-language explanations. We’ll also discuss practical engineering constraints: scaling, latency, caching, distillation/quantization, and fairness.
By the end, attendees will leave with a working hybrid recommender they can extend for their own data, along with a playbook for when and how to bring LLMs into recommender workflows responsibly.
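As a taste of the hybrid pattern this tutorial describes, here is a minimal pure-Python sketch: embedding-based retrieval first, then an LLM layer for explanations. The item table, user vector, and the `explain_with_llm` stub are illustrative inventions, not material from the tutorial; in practice the second stage would call a real LLM.

```python
import math

# Toy item embeddings; in practice these come from a trained encoder.
ITEM_EMBEDDINGS = {
    "running shoes": [0.9, 0.1, 0.0],
    "trail boots": [0.8, 0.3, 0.1],
    "espresso maker": [0.0, 0.2, 0.9],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def retrieve(user_vector, k=2):
    """Stage 1: structured-embedding retrieval (fast and cheap)."""
    scored = sorted(ITEM_EMBEDDINGS.items(),
                    key=lambda kv: cosine(user_vector, kv[1]),
                    reverse=True)
    return [name for name, _ in scored[:k]]

def explain_with_llm(user_vector, items):
    """Stage 2: LLM layer for a natural-language explanation (stubbed)."""
    return f"Recommended {', '.join(items)} based on your recent activity."

user = [0.85, 0.2, 0.05]  # pretend this encodes the user's history
print(explain_with_llm(user, retrieve(user)))
```

The split matters for the engineering constraints the abstract mentions: retrieval stays cheap and cacheable, while the expensive LLM call only runs on the short candidate list.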

Abigail Adams
10:30
30min
Break
Abigail Adams
10:30
30min
Break
Thomas Paul
11:00
90min
Build Your MCP server
Chuxin Liu, Yiwen Liu

This tutorial tackles a fundamental challenge in modern AI development: creating a standardized, reusable way for AI agents to interact with the outside world. We will explore the Model Context Protocol (MCP), designed to connect AI agents with external systems that provide tools, data, and workflows.
This session provides a first-principles understanding of the protocol. By building an MCP server from scratch, attendees will learn the core mechanics of the protocol's data layer: lifecycle management, capability negotiation, and the implementation of server-side "primitives." The goal is to empower attendees to build their own MCP-compliant services, enabling their data and tools to be used by a growing ecosystem of AI applications.
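To make those data-layer mechanics concrete, here is a toy, dependency-free sketch of the JSON-RPC dispatch an MCP-style server performs: an `initialize` handshake for capability negotiation, plus `tools/list` and `tools/call` primitives. The tool, server name, and simplified message shapes are illustrative assumptions, not the official SDK.

```python
import json

# Hypothetical server state: one tool exposed as a server-side primitive.
TOOLS = [{"name": "add", "description": "Add two integers",
          "inputSchema": {"type": "object",
                          "properties": {"a": {"type": "integer"},
                                         "b": {"type": "integer"}}}}]

def handle(request: dict) -> dict:
    """Dispatch a single JSON-RPC request, MCP-style."""
    method, rid = request["method"], request.get("id")
    if method == "initialize":
        # Capability negotiation: tell the client what we support.
        result = {"protocolVersion": request["params"]["protocolVersion"],
                  "capabilities": {"tools": {}},
                  "serverInfo": {"name": "demo-server", "version": "0.1"}}
    elif method == "tools/list":
        result = {"tools": TOOLS}
    elif method == "tools/call":
        args = request["params"]["arguments"]
        result = {"content": [{"type": "text", "text": str(args["a"] + args["b"])}]}
    else:
        return {"jsonrpc": "2.0", "id": rid,
                "error": {"code": -32601, "message": "method not found"}}
    return {"jsonrpc": "2.0", "id": rid, "result": result}

resp = handle({"jsonrpc": "2.0", "id": 1, "method": "initialize",
               "params": {"protocolVersion": "2025-03-26", "capabilities": {}}})
print(json.dumps(resp))
```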

Thomas Paul
11:00
90min
Create your Health Research Agent
Leonardo Ferreira

PubMed is a free search interface for biomedical literature, including citations and abstracts from many life science journals. It is maintained by the National Library of Medicine at the NIH. Yet most users only interact with it through simple keyword searches. In this hands-on tutorial, we will introduce PubMed as a data source for intelligent biomedical research assistants — and build a Health Research AI Agent using modern agentic AI frameworks such as LangChain, LangGraph, and Model Context Protocol (MCP), with minimal hardware requirements and no API keys. To ensure compatibility, the agent will run in a Docker container that hosts all necessary components.

Participants will learn how to connect language models to structured biomedical knowledge, design context-aware queries, and containerize the entire system using Docker for maximum portability. By the end, attendees will have a working prototype that can read and reason over PubMed abstracts, summarize findings according to a semantic similarity engine, and assist with literature exploration — all running locally on modest hardware.
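As a flavor of the kind of tool such an agent might wrap, the sketch below builds an NCBI E-utilities `esearch` URL for PubMed without sending any request. The helper name and query are illustrative; the tutorial's actual agent stack (LangChain/LangGraph/MCP) is not shown here.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def pubmed_search_url(query: str, retmax: int = 20) -> str:
    """Build an NCBI E-utilities esearch URL for PubMed (no request sent)."""
    params = urlencode({"db": "pubmed", "term": query,
                        "retmax": retmax, "retmode": "json"})
    return f"{EUTILS}/esearch.fcgi?{params}"

# An agent tool would fetch this URL, then pass the returned PMIDs to
# efetch to obtain abstracts the language model can reason over.
print(pubmed_search_url("semaglutide AND cardiovascular outcomes"))
```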

Expected Audience: Enthusiasts, researchers, and data scientists interested in AI agents, biomedical text mining, or practical LLM integration.
Prior Knowledge: Python and Docker familiarity; no biomedical background required.
Minimum Hardware Requirements: 8GB RAM (16GB recommended), 30GB disk space, Docker pre-installed. macOS, Windows, Linux.
Key Takeaway: How to build a lightweight, reproducible research agent that combines open biomedical data with modern agentic AI frameworks.

Abigail Adams
12:30
60min
Lunch
Horace Mann
12:30
60min
Lunch
Abigail Adams
12:30
60min
Lunch
Thomas Paul
13:30
90min
Building LLM Agents Made Simple
Eric Ma

Learn to build practical LLM agents using LlamaBot and Marimo notebooks. This hands-on tutorial teaches the most important lesson in agent development: start with workflows, not technology.

We'll build a complete back-office automation system through three agents: a receipt processor that extracts data from PDFs, an invoice writer that generates documents, and a coordinator that orchestrates both. This demonstrates the fundamental pattern for agent systems—map your boring workflows first, build focused agents for specific tasks, then compose them so agents can use other agents as tools.
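The compose-agents-as-tools pattern described above can be caricatured in a few lines of plain Python. The three function names mirror the tutorial's examples, but the implementations are stand-ins with no LLM calls:

```python
def receipt_processor(pdf_text: str) -> dict:
    """Focused agent 1: extract a vendor and total from (pretend) receipt text."""
    vendor, total = pdf_text.split(":")
    return {"vendor": vendor.strip(), "total": float(total)}

def invoice_writer(record: dict) -> str:
    """Focused agent 2: generate a (pretend) invoice document."""
    return f"INVOICE for {record['vendor']}: ${record['total']:.2f}"

def coordinator(pdf_text: str) -> str:
    """Orchestrator: composes the two agents, using each as a tool."""
    return invoice_writer(receipt_processor(pdf_text))

print(coordinator("Acme Supplies: 42.50"))
```

In a real system each function body would wrap an LLM with its own prompt and decision loop; the composition structure stays the same.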

By the end, you'll understand how to identify workflows worth automating, build agents with decision-making loops, compose agents into larger systems, and integrate them into your own work. You'll leave with working code and confidence to automate repetitive tasks.

Prerequisites: Intermediate Python, familiarity with APIs, basic LLM understanding. Participants should have Ollama and models installed beforehand (setup instructions provided).

Materials: GitHub repository with Marimo notebooks. Setup uses Pixi for dependency management.

Thomas Paul
13:30
90min
Learn to Unlock Document Intelligence with Open-Source AI
Mingxuan Zhao

Unlocking the full potential of AI starts with your data, but real-world documents come in countless formats and levels of complexity. This session will give you hands-on experience with Docling, an open-source Python library designed to convert complex documents into AI-ready formats. Learn how Docling simplifies document processing, enabling you to efficiently harness all your data for downstream AI and analytics applications.

Abigail Adams
13:30
90min
Understanding and using color for storytelling in data visualizations
Benjamin Lear, Morgan Vincent

The default color space for computers includes over 16 million colors—an embarrassment of riches that is also a potential quagmire to anyone considering how to best choose colors for visualizations. In this workshop, we will provide a practical framework for working with color. We will start by developing an understanding of color models and color theory, building from these to provide simple but powerful heuristics for color selection that will enable creators of data visualization to enhance the clarity, power, and storytelling of their visualizations. We will conclude with the introduction of tools for working with and selecting color, followed by hands-on activities using these tools. No prior knowledge is needed or assumed, and the only tools you will need are a computer with a web browser and an internet connection.

Horace Mann
15:00
30min
Break
Horace Mann
15:00
30min
Break
Abigail Adams
15:00
30min
Break
Thomas Paul
15:30
90min
"Save your API Keys for someone else" -- Using the HuggingFace and Ollama ecosystems to run good-enough LLMs on your laptop
Ian Stokes-Rees

In this 90-minute tutorial we'll get anyone with basic Python and command-line skills up and running with their own 100% laptop-based set of LLMs, and explain some successful patterns for leveraging LLMs in a data analysis environment. We'll also highlight pitfalls waiting to catch you out, and reassure you that your pre-GenAI analytics skills are still relevant today, and likely will be for the foreseeable future, by demonstrating the limits of LLMs for data analysis tasks.

Abigail Adams
15:30
90min
Generative Programming with Mellea: from Agentic Soup to Robust Software
Nathan Fulton, Jake Lorocco

Agentic frameworks make it easy to build and deploy compelling demos. But building robust systems that use LLMs is difficult because of inherent environmental non-determinism. Each user is different, each request is different; the very flexibility that makes LLMs feel magical in-the-small also makes agents difficult to wrangle in-the-large.

Developers who have built large agentic-like systems know the pain. Exceptional cases multiply, prompt libraries grow, instructions are co-mingled with user input. After a few iterations, an elegant agent evolves into a big ball of mud.

This hands-on tutorial introduces participants to Mellea, an open-source Python library for writing structured generative programs. Mellea puts the developer back in control by providing the building blocks needed to circumscribe, control, and mediate essential non-determinism.

Horace Mann
15:30
90min
Going multi-modal: How to leverage the latest multi-modal LLMs and deep learning models in real-world applications
Isaac Godfried

Multimodal deep learning models continue improving rapidly, but creating real-world applications that effectively leverage multiple data types remains challenging. This hands-on tutorial covers model selection, embedding storage, fine-tuning, and production deployment through two practical examples: a historical manuscript search system and flood forecasting with satellite imagery and time series data.

Thomas Paul
08:00
60min
Registration & Breakfast
Horace Mann
09:00
15min
Opening Notes
Horace Mann
09:15
40min
Keynote by Isabel Zimmerman

Isabel is a Senior Software Engineer at Posit, PBC.

Horace Mann
10:00
40min
The Lifecycle of a Jupyter Environment: From Exploration to Production-Grade Pipelines
Dawn Wages

Most data science projects start with a simple notebook—a spark of curiosity, some exploration, and a handful of promising results. But what happens when that experiment needs to grow up and go into production?

This talk follows the story of a single machine learning exploration that matures into a full-fledged ETL pipeline. We’ll walk through the practical steps and real-world challenges that come up when moving from a Jupyter notebook to something robust enough for daily use.

We’ll cover how to:

  • Set clear objectives and document the process from the beginning
  • Break messy notebook logic into modular, reusable components
  • Choose the right tools (Papermill, nbconvert, shell scripts) based on your workflow—not just the hype
  • Track environments and dependencies to make sure your project runs tomorrow the way it did today
  • Handle data integrity, schema changes, and even evolving labels as your datasets shift over time

And as a bonus: bring your results to life with interactive visualizations using tools like PyScript, Voila, and Panel + HoloViz

Horace Mann
10:40
35min
Break
Horace Mann
11:15
40min
Using Traditional AI and LLMs to Automate Complex and Critical Documents in Healthcare
Aman Bhandari, Lily Xu

Informed Consent Forms (ICFs) are critical documents in clinical trials. They are the first, and often most crucial, touchpoint between a patient and a clinical trial study. Yet the process of developing them is laborious, high-stakes, and heavily regulated. Each form must be tailored to jurisdictional requirements and local ethics boards, reviewed by cross-functional teams, and written in plain language that patients can understand. Producing them at scale across countries and disease areas demands manual effort and creates major operational bottlenecks. We used a combination of traditional AI and large language models to auto-draft ICFs across clinical trial types, countries, and disease areas at scale. The build, test, iteration, and deployment offer both technical and non-technical lessons learned for generative AI applications for complex documents at scale and for meaningful impact.

Horace Mann
12:00
40min
Where Have All the Metrics Gone?
Dr. Rebecca Bilbro

How exactly does one validate the factuality of answers from a Retrieval-Augmented Generation (RAG) system? Or measure the impact of the new system prompt for your customer service agent? What do you do when stakeholders keep asking for "accuracy" metrics that you simply don't have? In this talk, we’ll learn how to define (and measure) what “good” looks like when traditional model metrics don’t apply.

Horace Mann
12:40
65min
Lunch
Horace Mann
13:45
40min
Keynote by Lisa Amini: What's Next in AI for Data and Data Management?

Advances in large language models (LLMs) have propelled a recent flurry of AI tools for data management and operations. For example, AI-powered code assistants leverage LLMs to generate code for dataflow pipelines. RAG pipelines enable LLMs to ground responses with relevant information from external data sources. Data agents leverage LLMs to turn natural language questions into data-driven answers and actions. While challenges remain, these advances are opening exciting new opportunities for data scientists and engineers. In this talk, we will examine recent advances, along with some still incubating in research labs, with the goal of understanding where this is all heading, and present our perspective on what’s next for AI in data management and data operations.

Horace Mann
14:30
40min
The SAT math gap: gender difference or selection bias?
Allen Downey

Why do male test takers consistently score about 30 points higher than female test takers on the mathematics section of the SAT? Does this reflect an actual difference in math ability, or is it an artifact of selection bias, which would arise if young men with low math ability are less likely to take the test than young women with the same ability?

This talk presents a Bayesian model that estimates how much of the observed difference can be explained by selection effects. We’ll walk through a complete Bayesian workflow, including prior elicitation with PreliZ, model building in PyMC, and validation with ArviZ, showing how Bayesian methods disentangle latent traits from observed outcomes and separate the signal from the noise.
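The selection-effect intuition can be demonstrated without any Bayesian machinery. The pure-Python simulation below (not the speaker's PyMC model; all numbers are invented) gives two groups identical latent ability and lets one group's low scorers skip the test:

```python
import random

random.seed(0)

def mean_score_of_takers(skip_if_low: float, n: int = 100_000) -> float:
    """Mean observed score when low-ability people may skip the test.

    Everyone draws latent ability from the same Normal(500, 100);
    people below the mean opt out with probability `skip_if_low`.
    """
    taken = []
    for _ in range(n):
        ability = random.gauss(500, 100)
        if ability < 500 and random.random() < skip_if_low:
            continue  # selection: this low-ability person never takes the test
        taken.append(ability)
    return sum(taken) / len(taken)

# Identical latent ability, yet selection alone opens an observed gap.
gap = mean_score_of_takers(0.3) - mean_score_of_takers(0.0)
print(f"observed gap from selection alone: {gap:.1f} points")
```

The Bayesian workflow in the talk runs this logic in reverse: given the observed gap, infer how much selection is needed to explain it.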

No prior knowledge of Bayesian statistics is required; attendees should be familiar with Python and common probability distributions.

Horace Mann
15:10
35min
Break
Horace Mann
15:45
40min
The Boringly Simple Loop Powering GenAI Apps
Sebastian Wallkötter

Do you feel lost in the jungle of GenAI frameworks and buzzwords? Here's a way out. Take any GenAI app, peel away the fluff, and look at its core. You'll find the same pattern: a boringly simple nested while loop. I will show you how this loop produces chat assistants, AI agents, and multi-agent systems. Then we'll cover how RAG, tool-calling, and memory are like lego bricks we add as needed. This gives you a first-principles based map. Use it to build GenAI apps from scratch; no frameworks needed.
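A hedged, dependency-free sketch of that nested loop, with a fake LLM and one fake tool standing in for the real thing (all names here are invented for illustration):

```python
def fake_llm(messages):
    """Stand-in for a model call: either asks for a tool or gives an answer."""
    last = messages[-1]
    if last["role"] == "tool":
        return {"answer": f"The time is {last['content']}"}
    if "time" in last["content"]:
        return {"tool": "clock", "args": {}}
    return {"answer": f"Echo: {last['content']}"}

TOOLS = {"clock": lambda: "12:00"}

def chat(user_inputs):
    """The boringly simple core: outer loop over turns, inner loop over tool calls."""
    history, outputs = [], []
    for text in user_inputs:              # outer loop: one pass per user turn
        history.append({"role": "user", "content": text})
        while True:                       # inner loop: call model until it answers
            step = fake_llm(history)
            if "answer" in step:
                history.append({"role": "assistant", "content": step["answer"]})
                outputs.append(step["answer"])
                break
            result = TOOLS[step["tool"]](**step["args"])  # act, then loop again
            history.append({"role": "tool", "content": result})
    return outputs

print(chat(["hello", "what time is it"]))
```

Swap `fake_llm` for a real model and `TOOLS` for real functions, and the same two loops give you a chat assistant with tool calling; RAG and memory are just more entries in `history`.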

Horace Mann
16:30
60min
Lightning Talks
Horace Mann
17:30
180min
Conference Social Event at Naco Taco

Join us for the PyData Boston Social!

Horace Mann
08:00
60min
Breakfast & Registration
Horace Mann
09:00
40min
Embracing Noise: How Data Corruption Can Make Models Smarter
Aayush Gauba

Machine learning often assumes clean, high-quality data. Yet the real world is noisy, incomplete, and messy, and models trained only on sanitized datasets become brittle. This talk explores the counterintuitive idea that deliberately corrupting data during training can make models more robust. By adding structured noise, masking inputs, or flipping labels, we can prevent overfitting, improve generalization, and build systems that survive real world conditions. Attendees will leave with a clear understanding of why “bad data” can sometimes lead to better models.
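A minimal illustration of the three corruption techniques the abstract names (structured noise, input masking, label flipping), with made-up probabilities; a real augmentation pipeline would apply something like this per batch inside a training loop:

```python
import random

random.seed(42)

def corrupt(features, label, mask_prob=0.2, noise_std=0.1, flip_prob=0.05):
    """One training-time corruption pass over a single (features, label) pair."""
    noisy = []
    for x in features:
        if random.random() < mask_prob:
            noisy.append(0.0)                               # input masking
        else:
            noisy.append(x + random.gauss(0.0, noise_std))  # additive noise
    if random.random() < flip_prob:
        label = 1 - label                                   # label flipping (binary)
    return noisy, label

print(corrupt([0.5, 1.2, -0.3], 1))
```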

Thomas Paul
09:00
40min
Scaling Specialist Knowledge with AI: From Virtual Specialist to Revenue Acceleration Agent
Ishita Sequeira, Fasal Shah

Specialists are vital in enterprise sales, but their expertise is stretched thin. At Red Hat, our solution architects and product experts — highly skilled resources essential for winning complex deals — were often engaged on recurring questions from account teams. While this demonstrated their importance, it also highlighted an opportunity: how could we scale their knowledge more broadly, without relying on one-to-one interactions?

To address this, we developed an AI-powered agent designed to provide on-demand, sales-ready knowledge and accelerate deal progression. The first iteration focused on surfacing accurate responses from curated internal knowledge sources, including product documentation, knowledge base articles, Red Hat’s Content Center data, and shared repositories such as Google Drive. This reduced inbound questions and freed specialists to focus on high-value opportunities.

But knowledge alone wasn’t enough. Sellers also needed contextual intelligence and deal progression support. In the next phase, we extended the agent with a tool-calling framework (Model Context Protocol), enabling it to pull in live account insights from external revenue intelligence systems (e.g. People.ai). We further integrated quoting tools, booking sheets, and normalized account mapping tables — allowing the agent not only to answer “what” and “why,” but also to support sellers with pricing and quoting actions directly within their workflows.

The result is a multi-tool AI agent that accelerates revenue while improving consistency and trust. In this talk, we’ll share the architecture, design decisions, and evaluation metrics behind this evolution. Attendees will learn practical patterns for moving beyond static RAG bots into workflow-integrated agents that scale scarce expertise in any domain.

Deborah Sampson
09:00
40min
Who is Python for? EVERYONE (and why that matters)
Deb Nicholson

Python is controlled by the community, and its vast library of packages remains free for anyone to use and open for anyone to add to -- and that's no accident. Open communities that share and learn together are how we will build the kind of future we want to live in. If you've ever wondered who is in charge of Python, how it exists as a perennially free resource, and why anyone would do that, this talk is for you!

Horace Mann
09:00
40min
Wrappers and Extenders: Companion Packages for Python Projects
Jules Walzer-Goldfeld

Many Python users want features that don’t fit within the boundaries of their favorite libraries. Instead of forking or waiting on a pull request, you can build your own wrapper or extender package. This talk introduces the principles of designing companion packages that enhance existing libraries without changing their core code, using gt-extras as a case study. You’ll learn how to structure, document, and distribute your own add-ons to extend the tools you rely on.

Abigail Adams
09:45
40min
Rethinking Feature Importance: Evaluating SHAP and TreeSHAP for Tree-Based Machine Learning Models
Yunxin Gao

Tree-based machine learning models such as XGBoost, LightGBM, and CatBoost are widely used, but understanding their predictions remains challenging. SHAP (SHapley Additive exPlanations) provides feature attributions based on Shapley values, yet its assumptions — feature independence, additivity, and consistency — are often violated in practice, potentially producing misleading explanations.
This talk critically examines SHAP’s limitations in tree-based models and introduces TreeSHAP, its specialized implementation for decision trees. Rather than presenting it as perfect, we evaluate its effectiveness, highlighting where it succeeds and where explanations remain limited. Attendees will gain a practical, critical understanding of SHAP and TreeSHAP, and strategies for interpreting tree-based models responsibly.
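For intuition, the sketch below computes exact Shapley values for a toy two-feature model with an interaction term, and checks the additivity (local accuracy) property the talk examines. The model, the "absent feature = 0" baseline, and all numbers are illustrative assumptions, not TreeSHAP itself:

```python
from itertools import combinations
from math import factorial

def shapley(value, players):
    """Exact Shapley values for a set-function `value` over `players`."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for r in range(len(others) + 1):
            for S in combinations(others, r):
                # Standard Shapley weight |S|! (n-|S|-1)! / n!
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                total += weight * (value(set(S) | {p}) - value(set(S)))
        phi[p] = total
    return phi

# Toy model with an interaction term (which violates additivity assumptions).
x = {"a": 2.0, "b": 3.0}
def f(a, b):
    return 1.0 + a + 2 * b + a * b

def v(S):
    """Coalition value: absent features are fixed at a baseline of 0."""
    return f(a=x["a"] if "a" in S else 0.0, b=x["b"] if "b" in S else 0.0)

phi = shapley(v, ["a", "b"])
# Local accuracy: attributions sum to f(x) minus the baseline prediction.
print(phi, sum(phi.values()), f(2.0, 3.0) - f(0.0, 0.0))
```

TreeSHAP computes the same quantities in polynomial time by exploiting tree structure; the exponential enumeration above is only viable for tiny feature counts.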

Target audience: Data scientists, ML engineers, and analysts familiar with tree-based models.
Background: Basic understanding of feature importance and model interpretability.

Abigail Adams
09:45
40min
The Column's the limit: interactive exploration of larger than memory data sets in a notebook with Polars and Buckaroo
Paddy Mullen

Notebooks struggle when data vastly exceeds RAM: pagination hacks, fragile sampling, and surprise OOMs. Buckaroo is a modern data table for notebooks built to quickly make sense of dataframes by providing search, summary stats, and scrolling with every view. This talk reviews how Buckaroo uses out‑of‑core design patterns, viewport streaming, lazy Polars pipelines, batched background stats, and a series cache to make interactive exploration fast and reliable on commodity laptops. We’ll walk through the lifecycle of opening a large Parquet/CSV file: detecting formats, avoiding full materialization, fetching only requested row/column ranges, and throttling UI updates for smoothness. We’ll show how column‑level hashing (via a lightweight Rust extension) enables stable cache keys so warm loads render the first viewport and stats in under a second. CSV specifics and a practical CSV→Parquet streaming path round out the approach. The ideas are tool‑agnostic and reproducible with the open‑source PyData stack; Buckaroo serves as a concrete reference implementation. You’ll leave with guidelines and snippets to bring these patterns to your own workflows.
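One of the patterns mentioned, content-hash cache keys for fast warm loads, can be sketched in pure Python (Buckaroo's version lives in a Rust extension; the function names below are invented for illustration):

```python
import hashlib

def column_cache_key(name, values):
    """Content hash of a column -> stable cache key for summary stats.

    Re-opening the same file yields the same key, so a warm load can
    reuse cached stats instead of rescanning the data.
    """
    h = hashlib.sha256()
    h.update(name.encode())
    for v in values:
        h.update(repr(v).encode())
    return h.hexdigest()[:16]

STATS_CACHE = {}

def summary_stats(name, values):
    key = column_cache_key(name, values)
    if key not in STATS_CACHE:           # cold load: compute once
        nums = [v for v in values if isinstance(v, (int, float))]
        STATS_CACHE[key] = {"count": len(values),
                            "mean": sum(nums) / len(nums) if nums else None}
    return STATS_CACHE[key]              # warm load: instant lookup

print(summary_stats("price", [1.0, 2.0, 3.0]))
```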

Horace Mann
09:45
40min
Three agents, three frameworks, one talk
Benjamin Batorsky

The popularity of agent-based workflows has led to a proliferation of frameworks, each representing different design philosophies. At the core of each framework is a similar set of components: memory, tools, and “planning”. By understanding these components, it becomes easier to experiment with different frameworks. In this talk, we will talk about these components and then see how they are implemented in three frameworks: LangGraph, Pydantic.AI and LlamaBot. Our use case will be agent-based search, where our agent will respond to a user query based on a knowledge base. We’ll see how each handles this simple workflow and discuss advantages and disadvantages to these different approaches.

Deborah Sampson
09:45
40min
Uncertainty-Guided AI Red Teaming: Efficient Vulnerability Discovery in LLMs
Zvi Topol

AI red teaming is crucial for identifying security and safety vulnerabilities (e.g., jailbreaks, prompt injection, harmful content generation) of Large Language Models. However, manual and brute-force adversarial testing is resource-intensive and often inefficiently consumes time and compute resources exploring low-risk regions of the input space.
This talk introduces a practical, Python-based methodology for accelerating red teaming using model uncertainty quantification (UQ).

Thomas Paul
10:30
30min
Break
Horace Mann
11:00
40min
Accelerating Geospatial Analysis with GPUs
Jaya Venkatesh, Jacob Tomlinson, Naty Clementi

Geospatial analysis often relies on raster data, n‑dimensional arrays where each cell holds a spatial measurement. Many raster operations, such as computing indices, statistical analysis, and classification, are naturally parallelizable and ideal for GPU acceleration.

This talk demonstrates an end‑to‑end GPU‑accelerated semantic segmentation pipeline for classifying satellite imagery into multiple land cover types. Starting with cloud-hosted imagery, we will process data in chunks, compute features, train a machine learning model, and run large-scale predictions. This process is accelerated with the open-source RAPIDS ecosystem, including Xarray, cuML, and Dask, often requiring only minor changes to familiar data science workflows.

Attendees who work with raster data or other parallelizable, computationally intensive workflows will benefit most from this talk, which focuses on GPU acceleration techniques. While the talk draws from geospatial analysis, key geospatial concepts will be introduced for beginners. The methods demonstrated can be applied broadly across domains to accelerate large-scale data processing.
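For readers new to raster work, the kind of per-cell index computation the talk calls naturally parallelizable looks like this in plain Python; on a GPU each cell maps to a thread, and libraries like those in RAPIDS handle that mapping. NDVI is a standard vegetation index; the sample arrays are made up:

```python
# NDVI = (NIR - Red) / (NIR + Red), computed per cell. Each output cell
# depends only on its own inputs, which is why such ops map well to GPUs.
def ndvi(nir, red):
    return [[(n - r) / (n + r) if (n + r) else 0.0
             for n, r in zip(nrow, rrow)]
            for nrow, rrow in zip(nir, red)]

nir = [[0.8, 0.6], [0.5, 0.9]]   # near-infrared band (toy values)
red = [[0.2, 0.3], [0.5, 0.1]]   # red band (toy values)
print(ndvi(nir, red))
```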

Abigail Adams
11:00
40min
Applying Foundational Models for Time Series Anomaly Detection
Abhishek Murthy

The time series machine learning community has begun adopting foundational models for forecasting and anomaly detection. These models, such as TimeGPT, MOMENT, Moirai, and Chronos, offer zero-shot learning and promise to accelerate the development of AI use cases.

In this talk, we'll explore two popular foundational models, TimeGPT and MOMENT, for Time Series Anomaly Detection (TSAD). We'll specifically focus on the Novelty Detection flavor of TSAD, where we only have access to nominal (normal) data and the goal is to detect deviations from this norm.

TimeGPT and MOMENT take fundamentally different approaches to novelty detection.

• TimeGPT uses a forecasting-based method, tracking observed data against its forecasted confidence intervals. An anomaly is flagged when an observation falls sufficiently outside these intervals.

• MOMENT, an open-source model, uses a reconstruction-based approach. The model first encodes nominal data, then characterizes the reconstruction errors. During inference, it compares the test data's reconstruction error to these characterized values to identify anomalies.

We'll detail these approaches using the UCR anomaly detection dataset. The talk will highlight potential pitfalls when using these models and compare them with traditional TSAD algorithms.
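The forecasting-based approach can be caricatured with a rolling band standing in for TimeGPT's learned confidence intervals (window size, threshold, and data below are invented for illustration):

```python
import statistics

def forecast_band_anomalies(series, window=5, k=3.0):
    """Flag points outside a rolling mean +/- k*std band built from the
    preceding window -- a crude stand-in for forecast confidence intervals."""
    flags = []
    for i, x in enumerate(series):
        if i < window:
            flags.append(False)          # not enough history yet
            continue
        hist = series[i - window:i]
        mu = statistics.fmean(hist)
        sigma = statistics.stdev(hist) or 1e-9
        flags.append(abs(x - mu) > k * sigma)
    return flags

data = [1.0, 1.1, 0.9, 1.0, 1.05, 1.0, 0.95, 5.0, 1.0]
print(forecast_band_anomalies(data))
```

A reconstruction-based method like MOMENT's would instead compare each point's reconstruction error against errors characterized on nominal data; the flagging logic at the end is analogous.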

This talk is geared toward data scientists interested in the nuances of applying foundational models for TSAD. No prior knowledge of time series anomaly detection or foundational models is required.

Deborah Sampson
11:00
40min
Building Production RAG Systems for Health Care Domains: Clinical Decision
Shikhar Patel, Nikunj Doshi

Building on, but moving far beyond, the single-specialty focus of HandRAG, this session examines how Retrieval-Augmented Generation can be engineered to support clinical reasoning across multiple high-stakes surgical areas, including orthopedic, cardiovascular, neurosurgical, and plastic surgery domains. Using a corpus of more than 7,800 clinical publications and cross-specialty validation studies, the talk highlights practical methods for structuring heterogeneous medical data, optimizing vector retrieval with up to 35% latency gains, and designing prompts that preserve terminology accuracy across diverse subspecialties. Attendees will also learn a three-tier evaluation framework that improved critical-error detection by 2.4×, as well as deployment strategies such as automated literature refresh pipelines and cost-efficient architectures that reduced inference spending by 60%, enabling RAG systems to operate reliably in real production healthcare settings.

Thomas Paul
11:00
40min
fastplotlib: driving scientific discovery through data visualization
Caitlin Lewis, Kushal Kolar

Fast interactive visualization remains a considerable barrier in analysis pipelines for large neuronal datasets. Here, we present fastplotlib, a scientific plotting library featuring an expressive API for very fast visualization of scientific data. Fastplotlib is built upon pygfx, which utilizes the GPU via WGPU, allowing it to interface with modern graphics APIs such as Vulkan for fast rendering of objects. Fastplotlib is non-blocking, allowing for interactivity with data after plot generation. Ultimately, fastplotlib is a general-purpose scientific plotting library that is useful for fast and live visualization and analysis of complex datasets.

Horace Mann
11:45
40min
Fun With Python and Emoji: What Might Adding Pictures to Text Programming Languages Look Like?
Ted Conway

We all mix pictures, emojis and text freely in our communications. So, why not in our code? This session takes a whimsical look at what mixing emoji with Python and SQL might look like (spoiler alert: a lot like those "rebus" stories in Highlights Magazine for Kids!). We'll discuss the benefits of doing so, challenges that emoji present, and demo a rudimentary Python preprocessor that intercepts Python and SQL code containing emojis submitted from Jupyter notebooks and translates it back into text-only code using an emoji-to-text dictionary before passing it on to Python for execution. This session is intended for all levels of programmers.

Thomas Paul
11:45
40min
Modeling Aesthetic Identity: Building a Digital Twin from Instagram Likes & Visual Preferences
Pranav Kompally

People's visual and brand preferences encode a rich signal of identity that goes beyond clicks or text. In this talk, I present a pipeline for modeling a user’s “aesthetic identity” using Instagram likes, liked visuals, and followed brands. I show how to convert images and brand interactions into embedding spaces, condition a language model (via adapter / LoRA fine-tuning) to emulate that user’s responses, and evaluate the fidelity of that “digital twin.” You’ll leave with a reproducible architecture for persona modeling from multimodal data, along with insights into pitfalls of overfitting, privacy, and drift.

Abigail Adams
11:45
40min
One agent, one job, better AI
David Jones-Gilardi

Building accurate AI workflows can get complicated fast. By explicitly defining and modularizing agent tasks, my AI flows have become more precise, consistent, and efficient—delivering improved outcomes consistently. But can we prove it? In this talk, I'll walk you through an agentic app built with Langflow, and show how giving agents narrower, well-defined tasks leads directly to more accurate, consistent results. We'll put that theory to the test using evals with Pytest and LangSmith, iterating across different agent setups, analyzing data, and tightening up the app. By the end, we'll have a clear, repeatable workflow that lets us have confidence in how future agent or LLM changes will affect outcomes, before we ever hit deploy.

Horace Mann
11:45
40min
Patterns for Productive Agent-Assisted Programming
Eric Ma

You're already using AI coding assistants for more than autocomplete, but are you using them effectively? This talk presents battle-tested patterns for productive collaboration with AI coding agents on real Python projects.

You'll learn a structured four-step approach: plan your changes, write tests first, let the agent build, then document what you created—iterating through this cycle multiple times. We'll explore why fast test harnesses are critical for agent productivity, how to pipe shell tools and logging output back to your agent for better context, and how custom slash-commands can automate repetitive tasks like code cleanup and style enforcement.

This session is for intermediate Python programmers who are already working with AI coding agents and want proven patterns for getting more value from the collaboration.

Takeaway: A practical framework and concrete techniques for collaborating effectively with AI coding assistants on real projects.

Deborah Sampson
12:30
60min
Lunch
Horace Mann
13:30
40min
Is Your LLM Evaluation Missing the Point?
Daina Bouquin

Your LLM evaluation suite shows 93% accuracy. Then domain experts point out it's producing catastrophically wrong answers for real-world use cases. This talk explores the collaboration gap between AI engineers and domain experts that technical evaluation alone cannot bridge. Drawing from government, healthcare, and civic tech case studies, we'll examine why tools like PromptFoo, DeepEval, and RAGAS are necessary but insufficient and how structured collaboration with domain stakeholders reveals critical failures invisible to standard metrics. You'll leave with practical starting points for building cross-functional evaluation that catches problems before deployment.

Horace Mann
13:30
40min
Tracking Policy Evolution Through Clustering: A New Approach to Temporal Pattern Analysis in Multi-Dimensional Data
Sarthak Pattnaik

Analyzing how patterns evolve over time in multi-dimensional datasets is challenging—traditional time-series methods often struggle with interpretability when comparing multiple entities across different scales. This talk introduces a clustering-based framework that transforms continuous data into categorical trajectories, enabling intuitive visualization and comparison of temporal patterns.

What & Why: The method combines quartile-based categorization with a modified Hamming distance to create interpretable "trajectory fingerprints" for entities over time. This approach is particularly valuable for policy analysis, economic comparisons, and any domain requiring longitudinal pattern recognition.

Who: Data scientists and analysts working with temporal datasets, policy researchers, and anyone interested in comparative analysis across entities with different scales or distributions.

Type: Technical presentation with practical implementation examples using Python (pandas, scikit-learn, matplotlib). Moderate mathematical content balanced with intuitive visualizations.

Takeaway: Attendees will learn a novel approach to temporal pattern analysis that bridges the gap between complex statistical methods and accessible, policy-relevant insights. You'll see practical implementations analyzing 60+ years of fiscal policy data across 8 countries, with code available for adaptation to your own datasets.
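The core idea from the abstract can be sketched in a few lines. This is a toy illustration with made-up data, not the speaker's implementation; the talk's "modified" Hamming distance is not specified in the abstract, so a plain Hamming distance is used here:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one row per entity (e.g. country), one column per year.
rng = np.random.default_rng(0)
years = list(range(2000, 2010))
data = pd.DataFrame(rng.normal(size=(8, len(years))),
                    index=list("ABCDEFGH"), columns=years)

# Quartile-based categorization: within each year, map values to codes 0-3,
# so entities measured on different scales become comparable trajectories.
trajectories = data.apply(
    lambda col: pd.Series(pd.qcut(col, 4, labels=False), index=col.index)
)

def hamming(a, b):
    """Fraction of time points at which two categorical trajectories disagree."""
    return float(np.mean(np.asarray(a) != np.asarray(b)))

# Each row of `trajectories` is now a "trajectory fingerprint"; pairwise
# Hamming distances between rows can feed any standard clustering routine.
dist = hamming(trajectories.loc["A"], trajectories.loc["B"])
```
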

Thomas Paul
13:30
40min
Using Cursor (and other AI code gen tools) for data science
Mike Woodward

Cursor and AI code gen promise productivity gains for data scientists, but what's the reality? In this talk, I'll use examples to show where AI code gen works very well for data scientists, and where it falls short.
I'll cover code gen for: statistics and advanced work, data manipulation (e.g. Pandas), data visualization (e.g. Bokeh, Streamlit), code quality (e.g. PEP8, Git) and documentation and testing. I'll show the biggest barriers to implementation and suggest some coping mechanisms.
At the end of the talk, I'll provide a guide for how to successfully implement AI code gen for data science teams. I'll also touch on implications for hiring.
All content will be provided via GitHub.

Deborah Sampson
13:30
40min
When Rivers Speak: Analyzing Massive Water Quality Datasets using USGS API and Remote SSH in Positron
Rodrigo Silva Ferreira

Rivers have long been storytellers of human history. From the Nile to the Yangtze, they have shaped trade, migration, settlement, and the rise of civilizations. They reveal the traces of human ambition... and the costs of it. Today, from the Charles to the Golden Gate, US rivers continue to tell stories, especially through data.

Over the past decades, extensive water quality monitoring efforts have generated vast public datasets: millions of measurements of pH, dissolved oxygen, temperature, and conductivity collected across the country. These records are more than environmental snapshots; they are archives of political priorities, regulatory choices, and ecological disruptions. Ultimately, they are evidence of how societies interact with their environments, often unevenly.

In this talk, I’ll explore how Python and modern data workflows can help us "listen" to these stories at scale. Using the United States Geological Survey (USGS) Water Data APIs and Remote SSH in Positron, I’ll process terabytes of sensor data spanning several years and regions. I’ll demonstrate that, while Parquet and DuckDB enable scalable exploration of historical records, Remote SSH is essential for large-scale data analysis. By doing so, I hope to answer some analytical questions that can surface patterns linked to industrial growth, regulatory shifts, and climate change.

By treating rivers as both ecological systems and social mirrors, we can begin to see how environmental data encodes histories of inequality, resilience, and transformation.

Whether your interest lies in data engineering, environmental analytics, or the human dimensions of climate and infrastructure, this talk will explore topics at the intersection of environmental science and data science, offering both technical methods and sociological lenses for understanding the stories rivers continue to tell.

Abigail Adams
14:15
14:15
40min
Data engineering with Python the right way: introducing the composable, Python-native data stack
Deepyaman Datta

For the past decade, SQL has reigned as king of the data transformation world, and tools like dbt have formed a cornerstone of the modern data stack. Until recently, Python-first alternatives couldn't compete with the scale and performance of modern SQL. Now Ibis can provide the same benefits of SQL execution with a flexible Python dataframe API.

In this talk, you will learn how Ibis supercharges open-source libraries like Kedro, Pandera, and the Boring Semantic Layer and how you can combine these technologies (and a few more) to build and orchestrate scalable data engineering pipelines without sacrificing the comfort (and other advantages) of Python.

Deborah Sampson
14:15
40min
Evaluating AI Agents in production with Python
Susan Shu Chang

This talk covers methods of evaluating AI Agents, with an example of how the speaker built a Python-based evaluation framework for a user-facing AI Agent system that has been in production for over a year. We share the tools and Python frameworks used (as well as alternatives), and discuss evaluation methods such as LLM-as-Judge, rules-based evaluations, and ML metrics, along with the tradeoffs of choosing among them.

Thomas Paul
14:15
40min
Processing large JSON files without running out of memory
Itamar Turner-Trauring

If you need to process a large JSON file in Python, it’s very easy to run out of memory while loading the data, leading to a super-slow run time or out-of-memory crashes. In this talk you'll learn:

  • How to measure memory usage.
  • Why loading JSON takes a lot of memory.
  • Four different ways to reduce memory usage when loading large JSON files.
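As a taste of the kind of technique the talk covers (the four methods themselves are not listed in the abstract, so this is an assumption, not necessarily one of them), here is one common memory-saving pattern using only the standard library: converting data to newline-delimited JSON (JSON Lines) so records are parsed one at a time instead of loading the whole document at once:

```python
import json
import tempfile
from pathlib import Path

# Hypothetical data: write a JSON Lines file, one record per line.
records = [{"id": i, "value": i * 2} for i in range(1000)]
path = Path(tempfile.mkdtemp()) / "data.jsonl"
with path.open("w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# Streaming pass: peak memory stays proportional to one record,
# not to the size of the whole file.
total = 0
with path.open() as f:
    for line in f:
        total += json.loads(line)["value"]
```

For a single large JSON document that cannot be reformatted, third-party incremental parsers such as ijson follow the same principle.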
Abigail Adams
14:15
40min
Unlocking Smarter Typeahead Search: A Hybrid Framework for Large-Scale Query Suggestions
Brandon (Anbang) Wu

We present a hybrid framework for typeahead search that combines prefix matching with semantic retrieval using open-source tools. Applied at Quizlet, it indexed 200 million terms and improved coverage, boosted relevance, and lifted suggestion engagement by up to 37 percent—offering a reusable approach for building scalable, robust query suggestions.
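A toy sketch of the hybrid idea, with hypothetical terms and a bag-of-characters similarity standing in for real semantic embeddings (this is not Quizlet's implementation):

```python
from collections import Counter
from math import sqrt

# Toy corpus; a real system would index millions of terms.
terms = ["biology basics", "biochemistry", "business law", "cell biology"]

def prefix_candidates(q, corpus):
    """Classic typeahead: literal prefix matches, in corpus order."""
    return [t for t in corpus if t.startswith(q)]

def cosine(a, b):
    """Bag-of-characters cosine -- a stand-in for embedding similarity."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[k] * cb[k] for k in ca)
    na = sqrt(sum(v * v for v in ca.values()))
    nb = sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def suggest(q, corpus, k=3):
    # Hybrid: exact prefix hits first; semantic fallback fills remaining
    # slots, which is what improves coverage for non-prefix queries.
    hits = prefix_candidates(q, corpus)
    if len(hits) < k:
        rest = sorted((t for t in corpus if t not in hits),
                      key=lambda t: cosine(q, t), reverse=True)
        hits += rest[: k - len(hits)]
    return hits[:k]

suggest("bio", terms)  # ['biology basics', 'biochemistry', 'cell biology']
```
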

Horace Mann
15:00
15:00
30min
Break
Horace Mann
15:30
15:30
40min
MMM Open-Source Showdown: A Practitioner's Benchmark of PyMC-Marketing vs. Google Meridian
Luca

Your Marketing Mix Model is only as good as the library you build it on. But how do you choose between PyMC-Marketing and Google Meridian when the feature lists look so similar? You need hard evidence, not marketing claims. Which library is actually faster on multi-geo data? Do their different statistical approaches (splines vs. Fourier series) lead to different budget decisions?

This talk delivers that evidence. We present a rigorous, open-source benchmark that stress-tests both libraries on the metrics that matter in production. Using a synthetic dataset that replicates real-world ad spend patterns, we measure:

  • Speed: Effective sample size per second (ESS/s) across different data scales.
  • Accuracy: How well each model recovers both sales figures and true channel contributions.
  • Reliability: A deep dive into convergence diagnostics and residual analysis.
  • Resources: The real memory cost of fitting these models.

You'll walk away from this session with a clear, data-driven verdict, ready to choose the right tool and defend that choice to your team.

Abigail Adams
15:30
40min
No Cloud? No Problem. Local RAG with Embedding Gemma
Sanjit Paliwal

Running Retrieval-Augmented Generation (RAG) pipelines often feels tied to expensive cloud APIs or large GPU clusters—but it doesn’t have to be. This session explores how Embedding Gemma, Google’s lightweight open embedding model, enables powerful RAG and text classification workflows entirely on a local machine. Using the Sentence Transformers framework with Hugging Face, high-quality embeddings can be generated efficiently for retrieval and classification tasks. Real-world examples involving call transcripts and agent remark classification illustrate how robust results can be achieved without the cloud—or the budget.

Thomas Paul
15:30
40min
Surviving the Agentic Hype with Small Language Models
Serhii Sokolenko

The AI landscape is abuzz with talk of "agentic intelligence" and "autonomous reasoning." But beneath the hype, a quieter revolution is underway: Small Language Models (SLMs) are starting to perform the core reasoning and orchestration tasks once thought to require massive LLMs. In this talk, we’ll demystify the current state of “AI agents,” show how compact models like Phi-2, xLAM 8B, and Nemotron-H 9B can plan, reason, and call tools effectively, and demonstrate how you can deploy them on consumer-grade hardware. Using Python and lightweight frameworks such as LangChain, we’ll show how anyone can quickly build and experiment with their own local agentic systems. Attendees will leave with a grounded understanding of agent architectures, SLM capabilities, and a roadmap for running useful agents without the GPU farm.

Horace Mann
16:15
16:15
40min
How AI Is Transforming Data Careers — A Panel Discussion
Chuxin Liu, Gayathri Ramanathan

AI is transforming data careers. Roles once centered on modeling and feature engineering are evolving into positions that involve building AI products, crafting prompts, and managing workflows shaped by automation and augmentation. In this panel discussion, ambassadors from Women in Data Science (WiDS) share how they have adapted through this shift—turning personal experiments into company practices, navigating uncertainty, and redefining their professional identities. They’ll also discuss how to future-proof your career by integrating AI into your daily work and career growth strategy. Attendees will leave with a clearer view of how AI is reshaping data careers and practical ideas for how to evolve their own skills, direction, and confidence in an era where AI is not replacing, but redefining, human expertise.

Abigail Adams
16:15
40min
LLMOps in Practice: Building Secure, Governed Pipelines for Large Language Models
Siddharth Shankar

As organizations move from prototyping LLMs to deploying them in production, the biggest challenges are no longer about model accuracy - they’re about trust, security, and control. How do we monitor model behavior, prevent prompt injection, track drift, and enforce governance across environments?

This talk presents a real-world view of how to design secure and governed LLM pipelines, grounded in open-source tooling and reproducible architectures. We’ll discuss how multi-environment setups (sandbox, runner, production) can isolate experimentation from deployment, how to detect drift and hallucination using observability metrics, and how to safeguard against prompt injection, data leakage, and bias propagation.

Attendees will gain insight into how tools like MLflow, Ray, and TensorFlow Data Validation can be combined for version tracking, monitoring, and auditability, without turning your workflow into a black box. By the end of the session, you’ll walk away with a practical roadmap on what makes an LLMOps stack resilient: reproducibility by design, continuous evaluation, and responsible governance across the LLM lifecycle.

Thomas Paul
16:15
40min
Measuring Media Impact: Practical Geo-Lift Incrementality Testing
Bryce Casavant

Measuring the true incremental impact of media spend remains one of the toughest problems in marketing, especially in an era where privacy limits user-level tracking. This talk examines how geo-lift incrementality testing can be utilized to accurately measure the true causal impact of marketing and media channels. Attendees will learn what design decisions matter, how to analyze results, and common pitfalls to avoid when running marketing incrementality tests. The goal is to bring causal inference theory into real-world measurement, enabling practitioners to make informed, data-driven decisions with confidence.

Horace Mann
16:15
40min
The JupyterLab Extension Ecosystem: Trends & Signals from PyPI and GitHub
Konstantin Taletskiy

What does the JupyterLab extension ecosystem actually look like in 2025? While extensions drive much of JupyterLab's practical value, their overall landscape remains largely unexplored. This talk analyzes public PyPI (via BigQuery) and GitHub data to quantify growth, momentum, and health: monthly downloads by category, release recency, star-download relationships, and the rise of AI-focused extensions. I will present my approach for building this analysis pipeline and offer lessons learned. Finally, I will demonstrate an open, read-only web catalog built on this dataset.

Deborah Sampson