PyData London 2026

Abby Tse is Chair of PyData NYC, where she has led a community of over 8,000 data professionals since 2022. She organizes the annual PyData NYC/Boston Conference, a three-day event that brings together 600+ attendees from around the world to explore the latest in data science, machine learning, and AI. Abby is currently an MBA student at Columbia Business School, where she focuses on entrepreneurship and innovation. Previously, she worked at IBM, where she built enterprise AI systems, including large-scale generative AI applications to improve knowledge access.

Learn to Unlock Document Intelligence with Open-Source AI

Adam Hill

Adam is a Staff Data Scientist at ComplyAdvantage, where they are tackling financial crime with advanced analytics, large-scale systems, and the latest in generative and agentic AI.

Before that, he spent eight years in the smart cities space at HAL24K, helping governments and infrastructure providers make better decisions with their data. Along the way, he built and led a team of ten data scientists and helped launch four spin-out ventures.

A recovering astrophysicist, Adam spent a decade analysing data from space telescopes in search of new cosmic phenomena. He’s since redirected that curiosity toward Earth-based problems.

Adam is an active member of the PyData community, the founder of PyData Southampton, and a long-time volunteer with DataKind UK, supporting charities and NGOs with pro-bono data science.

From Chat-with-PDF to Quiz-Master: Live-Grading RAG with LLM-as-Judge in Python

Alessandra Costantino

I am a researcher with a strong track record of transferring core scientific computing skills across very different technical and scientific backgrounds ranging from radiation detection and medical physics to Earth observation. I have worked across disciplines in academic and industry settings and am particularly drawn to complex problems that require continuous learning and close collaboration across different domains.

What We Expect from XAI - A scientist’s experience between models and users

Andrew Igdal

I study energy policy at the University of Texas at Austin. My work focuses on residential electrification and improving the efficacy of beneficial electrification upgrades.

What Can LLMs Do with Messy Residential Electrification Data?

Arghyadeep Sarkar

Arghyadeep Sarkar is a Senior Data Scientist at Red Hat with ~8 years of experience in data science and artificial intelligence. His career has evolved from traditional machine learning to architecting large-scale Generative AI and LLM-based production systems.

He built strong foundations in statistical modeling, ML pipelines, and applied AI, later specializing in deep learning, NLP, transformers, and Generative AI. He has designed and deployed LLM agents, RAG-based systems, and enterprise conversational platforms, covering the full lifecycle from training and fine-tuning to scalable deployment.

Current Focus

Building reliable agentic AI systems
Improving retrieval grounding and RAG quality
Deploying LLMs and SLMs in production
Delivering scalable, cost-efficient enterprise AI solutions

He brings a system-first engineering mindset, translating cutting-edge AI research into robust real-world products.

The Silent Crash: Why Your RAG Evaluation Metrics Are Lying to You

Arthur Andres

A seasoned software engineer working on both batch and real-time data-intensive Python applications.

Kafka Streaming, the Pythonic Way

Austen Wallis

Austen is a computational astrophysicist specialising in scientific machine learning. Currently a postgraduate researcher at the University of Southampton, he will soon be joining the University of Cambridge as an Exoplanetary Data Science Research Associate. His work primarily focuses on accelerating complex physics simulations using "fast-forward" emulator techniques, which he has applied across diverse domains ranging from fusion-energy plasma control at the UK Atomic Energy Authority to extreme weather forecasting at IBM Research.

Fast-Forward(ing) Models: Accelerating High-Dimensional Inference with AI Emulators

Benjamin Vincent

Ben Vincent is Director of InferenceWorks Ltd and a Principal Data Scientist at PyMC Labs, where he has been building Bayesian solutions for real-world business problems since 2021. He created CausalPy, an open-source Python library for causal inference in quasi-experimental settings. He holds a PhD in Neuroscience from the University of Sussex (UK) and previously held a university faculty position for 15 years.

Did Your Rollout Actually Work? Measuring Phased Launches with Staggered DiD in Python

Carol Chen

Carol Chen is a Community Architect at Red Hat, having led several upstream communities including InstructLab, Ansible and ManageIQ. She has been actively involved in open source communities while working for Jolla and Nokia previously. In addition, she has experience in software development and integration from her 12 years in the mobile industry. Carol has spoken at events around the world, including AI_Dev in Paris and OpenSearchCon in Shanghai. On a personal note, Carol plays the timpani in an orchestra in Tampere, Finland, where she now calls home.

Learn to Unlock Document Intelligence with Open-Source AI

Cedric Clyburn

Cedric Clyburn (@cedricclyburn), Senior Developer Advocate at Red Hat, is an enthusiastic software developer with a background in Kubernetes, DevOps, and container tools. Focused on open-source software, he both contributes (e.g., Podman, vLLM) and enjoys speaking, with prior experience at Devoxx, WeAreDevelopers, The Linux Foundation, and more. Cedric also spends (too much) time creating video and written content helping developers learn new topics in emerging technologies, with over 2M views online. He’s based in New York City and is an organizer of the local Kubernetes Community Day.

What Can LLMs Do with Messy Residential Electrification Data?

Cheuk Ting Ho

After having a career as a Data Scientist and Developer Advocate, Cheuk dedicated her work to the open-source community. Currently, she is working as a developer advocate for JetBrains. She has co-founded Humble Data, a beginner Python workshop that has been happening around the world. Cheuk also started and hosted a Python podcast, PyPodCats, which highlights the achievements of underrepresented members in the community. She has served on the EuroPython Society board for two years and is now a fellow and director of the Python Software Foundation.

Unconference- Feminist AI
Do you know how well your model is doing? Evaluate your LLMs

Chris Fonnesbeck

Chris is a Principal Quantitative Analyst at PyMC Labs and an Adjunct Associate Professor at the Vanderbilt University Medical Center, with 20 years of experience as a data scientist in academia, industry, and government, including 7 years in pro baseball research with the Philadelphia Phillies, New York Yankees, and Milwaukee Brewers.
He is interested in computational statistics, machine learning, Bayesian methods, and applied decision analysis. He hails from Vancouver, Canada and received his Ph.D. from the University of Georgia.

PyMC Code Sprint
Flexible Statistical Modeling with Bayesian Additive Regression Trees

Daina Bouquin

Daina Bouquin is Senior Developer Relations Engineer at Anaconda with over 12 years of experience spanning astrophysics, library science, and software development. She previously served as Head Librarian at the Harvard-Smithsonian Center for Astrophysics, where she led projects on software citation, preservation, and recovering the contributions of early women in computing. This work gave her deep familiarity with historical computing collections in addition to experience supporting scientists doing computational research. At Anaconda, she creates educational content and strengthens connections between engineering teams and the broader open source community. She believes documentation isn't just about clarity; it's about building communities where people want to participate.

Build your castle, dig your moat: AI sovereignty, provenance and compliance

Damian Bemben

Damian Bemben is a prominent speaker, creative technologist & developer within the Hampshire & Solent tech space. Damian is currently a Senior Software Engineer at Ada Mode - developing groundbreaking "human-in-the-loop" AI applications within highly regulated industrial sectors like civil nuclear.

He holds a First Class Masters in Computer Science from the University of Sheffield. His academic work included a dissertation on robotic locomotion using evolutionary principles and using AI to monitor air pollution & local renewable energy projects.

Damian is a dedicated educator and community organiser within the local space, who has excelled at translating complex research into accessible insights. He is an active organiser of events within the Hampshire creative & tech space.

The Clean Energy Graveyard: Using Python & Gemini to Map the UK's Cancelled Renewables

Daniele Raimondi

Daniele is a Data Scientist with expertise in statistics, data science and AI, passionate about exploring the intersection of AI and financial markets.
Since 2023, he has been working at MDPI, one of the largest open-access publishers.
A former national 400m sprinter.

Building a Scientific Taxonomy at Scale with Graph Clustering, Embeddings, and LLMs

Dmitry Petrov

Dmitry Petrov is the creator of the open-source tool DVC (Data Version Control), holds a PhD in Computer Science, previously worked as a Data Scientist at Microsoft, and is now the founder of DataChain.ai, a Python-first data platform for Physical AI.

From SQL to Python: Building Data Context for Agents and People

Dragos Crintea

After 12+ years architecting and engineering cloud solutions for small and large enterprises, I recognized that AI represents not a replacement for expertise, but its natural evolution. Join me in shaping the semantic layer for AI-ready data.

Making Databases LLM-Ready: Building Production Semantic Layers with Semantido

Fei Phoon

Data Engineer in AI Platform at The Economist, PyData Cornwall co-founder, and committed diversity and inclusion ally.

Observing Agentic AI in Production: MCP Server Tracing with OpenTelemetry and Animal Crossing

Feichi Lu

Feichi Lu is a Data Scientist at MDPI in Basel, where she works on building data-driven analytics for scientific publishing. She holds a Master’s degree in Data Science from ETH Zürich. Her experience spans large-scale data analysis, semantic modeling, and applied AI.

Building a Scientific Taxonomy at Scale with Graph Clustering, Embeddings, and LLMs

Fred O'Loughlin

Lead MLOps Engineer at Climate Policy Radar

Making tech boring to keep data exciting

Gabriel Lipnik

Gabriel Lipnik is an AI engineer and applied mathematician at Anexia Digital Engineering, working on production-grade machine learning, artificial intelligence, and optimisation systems. His work focuses on bridging the gap between advanced models and real-world deployment.

Your ML Pipeline Meets the EU AI Act

Gergely Daroczi

Gergely Daroczi, PhD, has been a passionate open-source package developer for two decades. With over 15 years in the fintech, adtech, healthtech, and other SaaS industries, he has expertise in data science and engineering, as well as cloud infrastructure, in both California and Hungary, with a focus on building scalable data platforms. Gergely maintains a dozen open-source R and Python projects and organizes a tech meetup with 1,800 members in Hungary – along with other open-source and data conferences.

SELECT instance FROM cloud WHERE workload = ? ORDER BY cost_efficiency

Hitendri Bomble

Hitendri Bomble is a Senior Data Scientist at Red Hat, where she builds Generative AI solutions to solve complex business problems. She specializes in working with Large Language Models (LLMs) to create tools that make everyday work more efficient. Deeply rooted in the open-source community, Hitendri focuses on using the latest AI innovations to automate tasks and bring fresh ideas to her team.

The Silent Crash: Why Your RAG Evaluation Metrics Are Lying to You

Ian Thomas

Ian is a Scientific Software Developer at QuantStack. He has been an Open Source contributor for over 15 years, is a core maintainer of the libraries Matplotlib and ContourPy and a significant contributor to Bokeh and Datashader. Recently Ian has been involved throughout the Jupyter stack, from kernels and widgets through to JupyterLite.

JupyterLite: run all your code in a web browser using WebAssembly

Ines Montani

Ines Montani is a developer specializing in tools for AI and NLP technology. She’s the co-founder and CEO of Explosion and a core developer of spaCy, a popular open-source library for Natural Language Processing in Python, and Prodigy, a modern annotation tool for creating training data for machine learning models.

Vibe NLP for Applied NLP

Ivo Dilov

Ivo Dilov has 5 years of industry experience and 10 years of competitive programming, with a focus on high-performance software. For the past 2 and a half years, he has been a senior engineer on ArcticDB, the open-source DataFrame database backed by Man Group and Bloomberg, working in C++ and Python.

Bridging Pandas and Polars: The Hidden Costs of Dataframe Interoperability

Jacob Tomlinson

Jacob Tomlinson is a senior Python software engineer at NVIDIA with a focus on deployment tooling for distributed systems. His work involves maintaining open source projects including RAPIDS and Dask. RAPIDS is a suite of GPU accelerated open source Python tools which mimic APIs from the PyData stack including those of NumPy, pandas and scikit-learn. Dask provides advanced parallelism for analytics with out-of-core computation, lazy evaluation and distributed execution of the PyData stack. He also tinkers with the open source Kubernetes Python framework kr8s in his spare time. Jacob volunteers with the local tech community group Tech Exeter and lives in Exeter, UK.

Documenting your open source projects for machines

James Fielder

How to write a PyData proposal

Jeremiah Lowin

Jeremiah Lowin is the founder and CEO of Prefect and the author of FastMCP. Prefect develops automation tools used across the data and AI ecosystem, and FastMCP has become the standard framework for working with the Model Context Protocol. Before founding Prefect, he spent over a decade leading risk and data initiatives at major investment firms and was a founding member of the Apache Airflow PMC. He lives in Washington, DC.

Keynote- Jeremiah Lowin- Build Reasonable Software

Julie Huang

Julie Huang is an AI Data Scientist at Tesco, where she works on production AI agent systems for retail, with a focus on evaluation, reliability, and agentic user experiences. She has contributed to Tesco’s Meal Planner Agent, working across LLM evaluation, scalable automated red-teaming and guardrails to help make AI agents safer and more measurable in real-world settings.

Her broader work spans applied machine learning, recommendation systems and agent evaluation. Beyond Tesco, Julie contributes to open-source research on terminal-agent reinforcement learning, where she has worked on scalable verifiable environments generation and RL post-training.

Tesco AI & Data Science: From Recipes to Reality

Kamlesh Shah

I am a senior engineering lead/executive director at Morgan Stanley.

I design and build large-scale, enterprise-ready, high-performance financial systems used in production environments where correctness, resilience, and speed matter. My work spans system design, hands-on engineering, and long-term platform evolution in regulated domains.

I place strong emphasis on clean, maintainable architecture—clear domain boundaries, explicit data contracts, and model-driven design. I optimise for systems that remain understandable and adaptable as complexity, scale, and regulatory demands increase.

A significant part of my work focuses on data analytics, complex data modelling, and financial mathematics—including forecasting, liquidity, risk, and regulatory calculations. I enjoy translating mathematically rich problem spaces and large datasets into precise, explainable, and production-grade implementations.

I work with a prototype-to-production mindset, leveraging modern cloud platforms, data tooling, and AI techniques to move quickly while preserving architectural discipline, observability, and operational robustness.

www.linkedin.com/in/kamlesh-shah

Columnar Thinking - Designing for high-performance execution with Arrow and Polars

Kareem Hussein

Kareem is an AI Engineer at Tesco working on the upcoming customer shopping assistant. Particular focuses include personalisation, guardrails, and the custom evaluation framework in use for the system at scale. Before this, he was a Machine Learning Engineer in the Personalisation team at Tesco, developing and scaling core personalisation capabilities.

Kareem is also the co-founder and CTO of Carbon Glance, a B2B SaaS startup in the climate compliance & analytics space - solving CBAM readiness for importers, manufacturers and service providers.

Kareem holds an MSc in Computer Science from the University of Edinburgh and a Computer Science BSc from the University of Southampton, where his research interests spanned deep learning over graphs and natural language processing.

Tesco AI & Data Science: From Recipes to Reality

Katrina Riehl

Dr. Katrina Riehl is a Principal Technical Product Manager at NVIDIA leading the CUDA Education program. For over two decades, Katrina has worked extensively in the fields of scientific computing, machine learning, data science, and visualization. Most notably, she has helped lead data initiatives at the University of Texas Austin Applied Research Laboratory, Anaconda, Apple, Expedia Group, Cloudflare, and Snowflake. She is an active volunteer in the Python open-source scientific software community and currently serves on the Advisory Council for NumFOCUS.

GPU Algorithm Authoring with CUDA Tile

Kavit Tolia

The speaker spent over 12 years working in quantitative roles in investment management before returning to academia to study Artificial Intelligence. They are currently completing a Master’s degree in AI and ML in Science, and are particularly interested in how modern machine learning systems behave in practice, especially where modelling assumptions quietly break down.

Do Multilingual Embeddings Really Share a Semantic Space? Practical Lessons Across Scripts and Languages

Ken Obata

Ken Obata is a senior data engineer currently working at Lyft, with over seven years of experience building large-scale data infrastructure at KPMG, Amazon, and Lyft. His current research focuses on scalable text deduplication for LLM training data, where he developed a partition-aware MinHash LSH system that processes hundreds of millions of documents on commodity Spark clusters.

Beyond Spark MLlib: Deduplicating Common Crawl at Scale

Kerry Parker

Data Engineer at Climate Policy Radar

Making tech boring to keep data exciting

Laura Summers

Laura is a very technical designer™️, working at Pydantic as Lead Design Engineer. Her side projects include Sweet Summer Child Score (summerchild.dev) and Ethics Litmus Tests (ethical-litmus.site). Laura is passionate about feminism, digital rights and designing for privacy. She speaks, writes and runs workshops at the intersection of design and technology.

The Human-in-the-Loop is Tired

Lena Shakurova

Lena Shakurova is the founder of ParsLabs (https://parslabs.org), a Conversational AI agency, and Chatbotly (https://chatbotly.co), a no-code platform for building AI assistants trained on custom data.

At ParsLabs, she leads a team blending AI, user research and conversation science to design and develop high-quality AI conversations that sound human. She has a background in NLP and artificial intelligence, 8+ years of experience, and 110+ successful projects building production-ready chatbots and voice assistants.

Lena focuses on ethical, user-first AI, leveraging her expertise in Linguistics & AI to create responsible, high-quality AI solutions. She shares insights on AI innovation and human-centered design through her blog (https://shakurova.io/blog) and LinkedIn (https://www.linkedin.com/in/lena-shakurova/).

Evaluating multi-turn conversations: A practical guide to AI Agent evals

Lipika Ramaswamy

From Synthetic Examples to Production Signals: Multimodal Training Data Pipelines with Privacy-Safe Feedback

Luca Baggi

AI Engineer @xtream

Reading the Mind of an LLM

Maksym Bilychenko

I am a Data Science Leader dedicated to turning complex data into commercial momentum. Over the past 8+ years, I’ve partnered with executive teams to shape data strategy and integrate cross-functional data at scale for multi-million-dollar tech product companies. I bridge the gap between technical execution and business value, ensuring data isn’t just collected, but leveraged for sustainable growth.

Beyond building the high-performing analytics teams that power strategic decisions, I am deeply invested in growing the next generation of talent, actively mentoring professionals looking to build meaningful careers in data science.

Most recently, my focus has been exploring the intersection of AI and data operations, specifically how GenAI is reshaping data teams, redefining technical roles and shifting what it means to be a modern data professional.

Surviving (and Thriving) as a Data Professional in the Age of AI Agents

Marco Gorelli

Author of Narwhals, heavy contributor to pandas, Polars, and NumPy (stubs). Marco works as Senior Software Engineer at Quansight Labs. His background is in Mathematics. Outside of work he can most likely be spotted at Celtic Folk Sessions.

The Polars vs SQL differences nobody is talking about

Margaritha Groenendijk

I am Chief Architect at Engineering is Easy, working in aerospace and defence consulting. I hold a PhD in environmental and geospatial modelling, and I have spent over 20 years across climate research, data science, AI, and developer advocacy.

I also run Living is Easy, where I work as a certified mindset consultant focused on how habits, self-image, and mental programming drive results. That work has given me a deep understanding of how paradigms shape behaviour for both individuals and teams.

The Rules Nobody Writes Down: Decoding and Shifting Team Culture From Any Seat

Mark Cottam

Data Engineer at Climate Policy Radar

Making tech boring to keep data exciting

Martin O'Reilly

Martin O'Reilly is Director of Research Engineering at the Alan Turing Institute, where he leads a team of software, data and infrastructure engineers who work across the Turing's research portfolio to bridge the gap between research and practice - from AI for weather prediction to AI-assisted air-traffic control. Prior to Turing, Martin spent several years developing software, data standards and engineering practices in the education sector before going back to school to build robots and try and understand the brain by modelling it.

Keynote- Martin O'Reilly- LLMs and AI agents demystified

Matt Crooks

Matt Crooks is a Principal Data Scientist at the BBC, where he works in the audiences data science team applying statistical and machine learning models to understand and improve marketing effectiveness and audience engagement. His current work focuses on using data and AI to automate the production of personalised creative assets at scale. Previous work has involved building an ML-powered adaptive learning quiz for BBC Bitesize during Covid. He has also had a previous role leading and developing the experimentation tooling and best practices at Typeform. Matt holds a PhD in Mathematics from the University of Manchester and began his career in academic research into weather and climate.

AI-Assisted Creative for Automated Marketing using Python

Michel Semaan

Michel Semaan is the Analytics Lead for Transaction Banking at Allica Bank, previously a Senior Analytics Engineer at Amazon. Beyond his day job, Michel teaches as a DataCamp instructor with two published SQL courses and as a Python and data science mentor with Great Learning and Springboard.

Querying the queries: SQL Metaprogramming in Python

Mingxuan Zhao

Ming Zhao is an open source developer and Developer Advocate at IBM Research, where he helps IBM leverage open technologies while building impactful tools and growing vibrant open-source communities. He’s passionate about making open tech accessible to all and ensuring developers have the tools they need to succeed in the rapidly developing AI space. Ming now leads community efforts around Docling, IBM’s fastest-growing open source project, recently welcomed into the LF AI & Data Foundation.

Learn to Unlock Document Intelligence with Open-Source AI

Nabin Mulepati

Research Scientist/Engineer at NVIDIA focused on Multimodal Synthetic Data Generation

From Synthetic Examples to Production Signals: Multimodal Training Data Pipelines with Privacy-Safe Feedback

Nathaniel Forde

I'm a Data Scientist working in HR Tech and People Analytics with Personio. I'm a big advocate of open source software and regularly contribute to PyMC, PyMC-Marketing and CausalPy. I've worked across a variety of industries, including e-commerce, insurance and gambling, and in each I've tried to find ways to apply statistical best practice to business problems.

I'm always open to chat about scientific Python, philosophy of science, Bayesian reasoning and decision analysis.

Hazards on the Causal Path: Bayesian Time-Varying Survival Analysis with PyMC

Neal Richardson

Neal Richardson is VP of Engineering at Posit and a member of the Apache Software Foundation. He is a maintainer of Apache Arrow, along with many other open-source projects. He holds a Ph.D. in Political Science from the University of California, Berkeley.

MCP, or not MCP

Nick Radcliffe

Nick Radcliffe has used Python since around 2005 (starting with Python 2.1, in the form of Jython) and has been doing what we now call Data Science since around 1986. He is a Visiting Professor in the Maths Department (Operations Research) at University of Edinburgh and runs Stochastic Solutions Limited, a consulting and software company working in Data Science. Since around 2015 Nick has been developing the ideas of test-driven data analysis (TDDA), which is an approach to quality of data and analytical processes inspired by test-driven development (TDD). The open-source Python TDDA library (for which he is the lead developer) provides support for test-driven data analysis in those areas where software can help.

Nick has previously co-authored two books, one on Sustainability for WWF, and one on a (defunct) Python online tag-based social database called Fluidinfo. By the time of this conference, his latest book, Test-Driven Data Analysis (CRC Press) should be available.

Test-Driven Data Analysis

Nicolas Makaroff

Nicolas holds a Ph.D. in applied mathematics from Université Paris Dauphine - PSL, where his research focused on machine learning, with particular emphasis on attention mechanisms and geodesic approaches to segmentation. His work on designing advanced deep learning architectures for complex datasets has led to multiple publications at leading international conferences.

He brings hands-on expertise in self-supervised learning and large-scale optimisation, and is currently contributing to Neuralk's mission to develop the first enterprise tabular foundation model.

Hands-On with Tabular Foundation Models: From Zero to Strong Baselines

Niek Tax

Niek Tax is a Staff Research Scientist and Tech Lead at Meta's Central Applied Science team in London. He focuses on longer-term, foundational work that addresses new opportunities and challenges across Meta, bridging the gap between academic rigour and product teams. Niek has extensive experience overseeing the end-to-end lifecycle of production-grade ML systems, from research to global deployment. His expertise is in uncertainty quantification, including active learning and probability calibration, and he has published articles at NeurIPS and KDD on those topics.

Before joining Meta, Niek worked as an ML engineer at Booking.com and in applied R&D at Philips Research. He holds a PhD in Computer Science from Eindhoven University of Technology, and has authored 35+ peer-reviewed publications with over 2,500 citations.

Beyond ML Model Calibration: Hands-On Multicalibration with MCGrad

NumFOCUS

PyData London

Lightning Talks
Diversity Scholar Luncheon

Ono Gantsog

I am a data scientist at an international mining group.

From Noisy Sensors to Events: Event Detection in Sensor data with Kalman Filters and Hidden Markov Models

Ophelie Bleu

I am a Senior Machine Learning Scientist at Monzo, where my main focus is around Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), and sophisticated data augmentation strategies. With 6 years of experience specializing in Natural Language Processing (NLP), I have a proven track record of building scalable AI systems for high-stakes environments.

Prior to joining Monzo, I was a Machine Learning Engineer at Bumble, leading Trust and Safety initiatives by developing LLM-powered moderation pipelines to ensure platform safety at scale. I also worked as a Senior Data Scientist at ComplyAdvantage, where I applied NLP to financial crime detection, and as a consultant at Sia, focusing on complex question-answering tasks.

I am passionate about the intersection of LLM infrastructure and practical data engineering, specifically solving the "cold-start" problem for niche domains through synthetic data and rigorous validation frameworks.

When Your Dataset Has Blind Spots: Practical LLM-Based Data Augmentation

Oreolorun Olu-Ipinlaye

Oreolorun Olu-Ipinlaye is a Machine Learning/AI Engineer at Crowdhelix in London, where he builds production AI systems end-to-end for a platform connecting researchers with EU funding opportunities. As the lead engineer behind ReviewIQ, a self-hosted-LLM proposal review tool used by researchers across dozens of organisations, he has helped researchers assess their proposals before submission, leading to more competitive proposals.

His work spans the full stack of applied ML: self-hosted LLM infrastructure, recommender systems, semantic and hybrid search, and the event-driven pipelines underneath, built largely in Python. He's particularly drawn to taking ML products from idea to adoption with measurable impact, and to making AI capabilities legible to non-technical stakeholders.

He holds an MSc in Artificial Intelligence and Data Science from the University of Hull, where he earned the award for Best Overall Performance.

Building a Browser Agent from Scratch: Teach an LLM to Navigate the Web

Oriol Abril Pla

Oriol is a computational statistician, working as a maintainer of the ArviZ and PyMC libraries and as Principal Data Scientist with PyMC Labs. He started in academia but left after some years in order to be able to work more freely and collaboratively on open source, software and knowledge sharing. His main areas of interest are data visualization, model and inference diagnostics, model comparison, and prior elicitation. Within open source projects, he has also dedicated a large part of his work to documentation, governance and DEI.

Model criticism through posterior predictive checks
PyMC Code Sprint

Paddy Mullen

Paddy Mullen is a full‑stack engineer and data‑tooling builder. An early employee at Anaconda, he contributed to the Bokeh visualization library. He has built data tools and led teams at hedge funds and startups. Since 2023 he has been developing Buckaroo, an interactive dataframe viewer for notebook environments. He is now leading visualization at xorq-labs.

The Future of Notebooks in a Claude Code World

Prattyush Mangal

Prattyush is a Research Software Engineer working in the Granite Feedback Team in IBM Research, based in the UK (Winchester) and the US (New York).

IBM Granite is the family of AI models from IBM, and Prattyush leads product and client engagements to increase adoption of the models across various use-cases. He is a technical leader for Agentic and GenAI applications, leads efforts on education content, and acts as one of the release managers, contributing to testing and release efforts.

Prattyush is part of the wider AI Foundations organisation and as such regularly contributes to the development of the latest IBM Research technologies, both internally and through open source.

Production-Ready AI Agents: From LLMs to Small Language Models

Rachel Lee Nabors

Rachel-Lee Nabors spent the better part of their career on web standards and open source and has spearheaded developer education at FAANG and startups, on the React Team, and at W3C. Now they work to usher in the future with browser builders and Silicon Valley startups, teaching a new generation of builders that “it's not magic; it's just math” and building experiences that adapt information to people. You can find them drinking tea in London or shadowboxing in San Francisco.

Keynote: Rachel Lee Nabors: The Community Is the Boat

Richard

Richard Kehinde Ogunyale is a Senior Software Engineer based in London, UK, with experience building production AI systems, scalable microservices, and machine learning pipelines. He currently works at Partnerize, where he leads projects involving AI-powered solutions, and has previously built RAG systems with vector databases and LLM-powered automation workflows using DAG architectures at scale.

He is passionate about open source, practical AI engineering, and bridging the gap between ML prototypes and reliable production systems.

Building a Browser Agent from Scratch: Teach an LLM to Navigate the Web

Sam Joseph

I am a Senior Software Engineer & AI Specialist at DRW, a proprietary trading firm. I was previously the lead AI developer at Qualis Flow, a company that is using the latest AI tech to help decarbonise the construction industry.

I am also the CTO of NeuroGrid Ltd., a software consultancy firm providing data science and software engineering services. Previously I was the CoFounder of AgileVentures, where I was the CTO and ran multiple open source charity projects in Ruby, Node, React, React Native, etc.

Before that I was Head of Education and Engineering at the Makers Academy bootcamp, and before that Associate Professor in Computer Science at Hawaii Pacific University, where I taught courses on mobile, games, AI and software engineering, remotely from the UK.

I've been mucking about with computers for over 40 years, starting with early attempts to program games in BASIC on the BBC Micro and ZX Spectrum in the 80s. I studied AstroPhysics, then Cognitive Science and then Computer Science at university, picking up a PhD (building Neural Nets in C and C++) and two masters along the way. I researched mobile agents at Toshiba in Japan (Java) then went freelance for two years (writing tech articles and building the NeuroGrid search engine, and working with the Cerego learning engine - lots of SQL). After that it was researching Peer to Peer at University of Tokyo, and then to University of Hawaii where I was working with collaborative systems, augmented reality and started programming in both PHP and Ruby/Rails.

I taught Computer Science for Hawaii Pacific University remotely from the UK (Software Engineering, Computer Games programming) for five years, during which time I got involved in MOOCs (I co-ran the "Agile Development using Ruby on Rails" course on edX with UC Berkeley) and started AgileVentures, taking it to full UK charity status. I also ran the Makers Academy coding bootcamp in London for a couple of years, and am now focused on providing AI, data science and software engineering consulting services.

Education:

PhD in Neural Networks
MS in Computer Science
MSc in Cognitive Science & Natural Language
BSc in Physics with AstroPhysics

Python Leadership and Engineering Excellence BoF

Samuel Colvin

Samuel Colvin is a Python and Rust developer and Founder of Pydantic Inc., backed by Sequoia to build Pydantic Logfire, the only observability tool that traces your AI and your backend together. The Pydantic library, which he created, is downloaded over 580M times per month and is a dependency of virtually every GenAI Python library, including the OpenAI SDK, the Anthropic SDK, the Google Gen AI SDK, LangChain and LlamaIndex.

Keynote: Samuel Colvin: Pydantic Monty & Logfire: Wild LLMs, from tool calling to computer use

Samuel Jaja

Samuel is a Gen AI Engineer at Capgemini UK, building production multi-agent systems for enterprise clients. He is also the founder of Atlasync AI Ltd, an early-stage AI startup focused on compliance automation. He founded and organises PyData Hull, the UK's newest NumFOCUS chapter. Samuel holds an MSc in AI and Data Science with Distinction from the University of Hull and is AWS certified. His work focuses on multi-agent architectures, RAG pipelines, and agentic observability.

Building Production Multi-Agent RAG Systems on Serverless AWS

Simran Dave

Simran is a PhD student in high energy physics at University College London, working on direct searches for dark matter with the LUX-ZEPLIN experiment. She is undertaking a data science placement at Nesta, working with the sustainable future team to map the local heat transition using open data. She holds a Master's degree in Theoretical Physics from Imperial College London.

Mapping the local heat transition: from large-scale geospatial data to real-world impact

Sofia Pinto

Sofia is a principal data scientist at Nesta, working with the sustainable future mission team on decarbonising UK homes. During her time at Nesta, Sofia has worked with energy performance certificates, social media and smart meter data to estimate the cost of low carbon heating technologies, identify issues faced by homeowners in their low carbon heating path, understand how people consume energy in their homes, and identify the most suitable low carbon heating technology for groups of homes.

Prior to joining Nesta, Sofia worked as a data scientist at Imperial College London, assessing the accuracy of crowdsourced data for road traffic collision and injury surveillance. Before this she worked as a research fellow at the Social Physics and Complexity research group, LIP Portugal, on health-related projects such as identifying antibiotic over-prescription and the factors influencing it.

Sofia holds a Bachelor’s degree in Applied Mathematics and a Master’s degree in Data Science and Advanced Analytics.

Mapping the local heat transition: from large-scale geospatial data to real-world impact

Sujee Maniyam

Sujee Maniyam is a Developer Advocate at Nebius, with a background spanning AI, distributed systems, data engineering, and cloud infrastructure. Outside of work, he is usually on a local pickleball court.

"I’m a developer, technical instructor, and entrepreneur, now focusing on Developer Advocacy / Developer Relations for AI. I have worked with AI/ML, Data Engineering, Distributed Systems, Big Data, and Cloud technologies. As a AI Developer Relations Engineer, I combine hands-on engineering with community building to help developers make the most of AI, specially open-source AI. I also influence/shape the product by providing feedback to dev/product teams. See my developer advocacy work".

Using coding agents with open models

Theo van Kraay

Theo is passionate about NoSQL and distributed computing. He joined Microsoft in 2017 and has been in the Cosmos DB Engineering team as a Program Manager since 2019. He currently focuses on AI, programmability, and developer experience for Azure Cosmos DB. He has a master's degree in Data Science from Dundee University, and lives in the UK with his wife, two boys, and ragcoon cat.

Designing Semantic Memory for Multi-Agent Systems with Python

Thomas Ogden

Thomas Ogden is a Senior Data Scientist in Platform at Spotify. He builds tools, mostly with probabilistic machine learning on sequences and graphs.

No Ropes on a Boat: Coherent Forecasting

Tun Shwe

Tun leads AI Engineering at Lenses, where he is focused on helping companies imagine and implement their strategic vision with agentic AI systems fuelled by real-time context. He was previously a Head of Data and Data/ML Engineer at high growth startups and has spent 20 years building data-intensive applications and leading T-shaped teams.

Tun is a co-organiser for the annual PyData London conference and co-founder of PyData Cornwall. He is a strong advocate in the Python AI engineering community and contributor to open source AI engineering and Apache Kafka tools.

In his spare time, Tun goes surfing, plays guitar and shoots 35mm film.

Observing Agentic AI in Production: MCP Server Tracing with OpenTelemetry and Animal Crossing

Viktor Kessler

Viktor Kessler is Co-Founder of Vakamo and the creator of Lakekeeper, an Apache-licensed Iceberg REST Catalog. He’s a big believer in open standards like Apache Iceberg, which he sees as the backbone of today’s modern, composable Data & Analytics systems.

Governance-as-Code for the Lakehouse: Zero Trust with Iceberg REST Catalog and Policy Engines

Özge Çinko

Hello world! 👋
I'm Özge Çinko. I'm currently an AI Engineer at ING, working around agentic AI. Before that, I spent two years as an AI Research Engineer at Huawei, where I focused on research-driven AI systems, including recommender systems. I hold a Bachelor's degree in Computer Engineering from Sakarya University. For me, engineering is a creative craft: a way of turning thoughts, emotions, and curiosity into experiences. I care about building technology that feels more purposeful, more human, and more alive. I love researching, building, learning, and exploring because they make me feel alive in the deepest way. I also love expressing myself through writing, speaking, and meaningful conversations, often inspired by art along the way.

LLM-Based Recommendation Systems: From Embeddings to Real Personalization