PyCon DE & PyData 2025

08:00
08:00
120min
Registration
Zeiss Plenary (Spectrum)
10:00
10:00
30min
Opening Session
Zeiss Plenary (Spectrum)
10:30
10:30
45min
Reasonable AI
Kristian Kersting

The relationship between humans and machines, especially in the context of Artificial Intelligence (AI), is shaped by hopes, concerns, and moral questions. On the one hand, advances in AI offer great promise: it can help us solve complex problems, improve healthcare, streamline workflows, and much more. Yet, at the same time, there are legitimate concerns about the control over this technology, its potential impact on jobs and society, and ethical issues related to discrimination and the loss of human autonomy. In the talk I will explore and illustrate the complex tension between innovation and moral responsibility in AI research.

Keynote
Zeiss Plenary (Spectrum)
11:15
11:15
30min
Coffee Break
Zeiss Plenary (Spectrum)
11:15
30min
Coffee Break
Titanium3
11:15
30min
Coffee Break
Helium3
11:15
30min
Coffee Break
Platinum3
11:15
30min
Coffee Break
Europium2
11:15
30min
Coffee Break
Hassium
11:15
30min
Coffee Break
Palladium
11:15
30min
Coffee Break
Ferrum
11:15
30min
Coffee Break
Dynamicum
11:45
11:45
30min
Are LLMs the answer to all our problems?
Dr. Maria Börner

Generative AI models have shaken up the German market. Since the release of ChatGPT, AI is available and usable for everyone. The number of ChatGPT-based agents is growing rapidly, but concerns about privacy, copyright and ethics remain. Regulation and ethical AI go hand in hand, but are often seen as barriers. The presentation will cover the different aspects of ethics and how they are addressed by regulation. It will give an overview of how to use large language models in a safe and practical way. This will not only address the various ethical issues, but also help you convince your next customer to invest in your AI-based product.

General: Ethics & Privacy
Hassium
11:45
30min
Beyond Code: Fostering Diversity and Inclusion in Open Source
Ariane Djeupang

Open source thrives not just on code, but on the diverse perspectives and inclusive practices that shape our communities. In this dynamic talk, we explore the intersection of open source and diversity, shedding light on how to bridge existing gaps and build welcoming environments for underrepresented groups. Through real-world examples, practical strategies, and inspiring stories from existing research on African communities, we'll uncover the immense benefits of diversity and equip you with actionable steps to foster inclusivity. Join us to learn how you can help create a more vibrant and inclusive open-source community that goes beyond code.

General: Community & Diversity
Palladium
11:45
90min
Deploy RAG Applications Using Docker: A Step-by-Step Guide
Brain Aboze

Docker is a game-changer for deploying AI applications, offering portable and consistent environments across platforms. In this tutorial, we’ll explore how to build and deploy a Retrieval-Augmented Generation (RAG) application using Docker. RAG applications combine data retrieval with generative AI to produce contextually relevant and accurate responses, making them powerful tools for interactive Q&A systems.

We will build a document-based Q&A application that allows users to upload files, retrieve context from them, and answer questions interactively. We will use LlamaIndex to build the RAG pipeline and use both open-source LLMs from Groq and closed-source LLMs, such as OpenAI models, for LLM access. This tutorial will guide you through the entire lifecycle, from building the app to deploying it using Docker on Hugging Face Spaces. Whether you’re new to Docker or experienced with RAG, this guide offers a hands-on approach to deploying scalable and efficient AI solutions.

PyData: Natural Language Processing & Audio (incl. Generative AI NLP)
Dynamicum
11:45
30min
Duplicate record detection using GenAI techniques to improve data quality
Ian Ormesher

Duplicate records can have a negative impact on many areas of a business. Current methods to detect duplicate records use traditional NLP techniques known as “Entity Matching”. An improvement to this traditional method can be achieved by incorporating GenAI techniques that do not entail any calls to OpenAI. Not only does this produce better matches, but it also keeps the data safe, since no information is transferred externally.

PyData: Natural Language Processing & Audio (incl. Generative AI NLP)
Platinum3
11:45
30min
Go Beyond Basic RAG with Agentic Behavior
Bilge Yücel

RAG has transformed AI systems by combining retrieval and generation, but traditional workflows often struggle with the dynamic demands of real-world applications, such as multi-step queries or integrating external APIs. Agentic behavior enhances RAG by enabling LLMs to make decisions, call tools, and adapt workflows dynamically. In this talk, we’ll define agentic behavior, explore its core features such as routing, tool integration, and reasoning, and demo its practical implementation in Python using Haystack.

PyData: Generative AI
Europium2
11:45
30min
Introducing the Synthetic Data SDK - Privacy Preserving Synthetic Data for AI/ML
Michael Platzer

AI-generated synthetic data is gaining traction as a privacy-safe solution for data access and sharing. This data is created from original datasets, maintaining privacy without compromising utility.

In this session, we'll cover the fundamental concepts of AI-generated synthetic data and demonstrate how easy it is to generate synthetic data within your local compute environment using the open-source Synthetic Data SDK.

PyData: Data Handling & Engineering
Helium3
11:45
90min
Probably Fun: Board Games to teach Data Science
Dr. Kristian Rother, Paula Gonzalez Avalos

In this tutorial, you will speed-date with board and card games that can be used to teach Data Science. You will play one game for 15 minutes, reflect on the Data Science concepts it involves, and then rotate to the next table.

As a result, you will experience multiple ideas that you can use to make complex ideas more understandable and enjoyable. We would like to demonstrate how gamification can be used not only to produce short puzzles and quizzes, but also as a tool to reason about complex problem-solving strategies.

We will bring a set of carefully selected games that have been proven effective in teaching statistics, programming, machine learning and other Data Science skills. We also believe that it is probably fun to participate in this tutorial.

General: Education, Career & Life
Ferrum
11:45
30min
Python Performance Unleashed: Essential Optimization Techniques Beyond Libraries
Thomas Berger

Every Python developer faces performance challenges, from slow data processing to memory-intensive operations. While external libraries like Numba or Cython offer solutions, understanding core Python optimization techniques is crucial for writing efficient code. This talk explores practical optimization strategies using Python's built-in capabilities, demonstrating how to achieve significant performance improvements without external dependencies. Through real-world examples from machine learning pipelines and data processing applications, we'll examine common bottlenecks and their solutions. Whether you're building data pipelines, web applications, or ML systems, these techniques will help you write faster, more efficient Python code.

PyCon: Python Language & Ecosystem
Zeiss Plenary (Spectrum)
11:45
30min
Why E.ON Loves Python
Christer Friberg

Join me as I share my 20-year journey with Python and its pivotal role at E.ON. Discover how we transitioned fully to Python, streamlined our development framework, and embraced MLOps principles. Learn about some of our AI projects, including image analysis and real-time inference, and our steps towards open-sourcing code to foster innovation in the energy sector. Explore why Python is our go-to language for data science and collaboration.

PyCon: MLOps & DevOps
Titanium3
12:25
12:25
45min
Soon Revealed
Platinum3
12:25
30min
From Tensors to Clouds — A Practical Guide to Zarr V3 and Zarr-Python 3
Sanket Verma

A key feature of the Python data ecosystem is the reliance on simple but efficient primitives that follow well-defined interfaces to make tools work seamlessly together (cf. http://data-apis.org/). NumPy provides an in-memory representation for tensors. Dask provides parallelisation of tensor access. Xarray provides metadata linking tensor dimensions. Zarr provides a missing feature, namely the scalable, persistent storage for annotated hierarchies of tensors. Defined through a community process, the Zarr specification enables the storage of large out-of-memory datasets locally and in the cloud. Implementations exist in C++, C, Java, JavaScript, Julia, and Python.

This talk presents a systematic approach to understanding and implementing the newer version of Zarr-Python, i.e. Zarr-Python 3, by explaining the new API, deprecations, new storage backend, improved codec pipeline, etc.

I will also show the performance improvements in ZP-3 while creating, reading and writing async Zarr arrays across local and remote storage like AWS S3.

PyData: Data Handling & Engineering
Palladium
12:25
45min
From Text to Multimodal: Building Self-Hosted RAG Systems with Open Source
Stephen Batifol

While text-based RAG systems have been everywhere in the last year and a half, the real power comes from combining multiple modalities.

In this session, we'll show how it is possible to create a Multimodal RAG system using Open-Source tools such as vLLM for deploying your LLM, Pixtral for multimodal understanding, and Milvus for storing your vectors.

I'll walk you through the architecture and implementation details of setting up your own infrastructure, demonstrating how to effectively process and understand both images and text in a unified system.
The talk will include a live demo, showing the real-time implementation and performance of the system.

Whether you're looking to reduce dependency on commercial APIs or need more control over your LLM infrastructure, this talk will provide you with practical insights and implementation strategies.

PyData: Generative AI
Europium2
12:25
30min
The aesthetics of AI: from cyberpunk to fascism
Laura Summers

Let’s explore the visual grammars, references and cultural norms at play in the field of AI; from Kismet to Spot®, from Clippy to Claude. As a sector we can be hyper-focused on technical process and function, to the extent that it blinkers our understanding of the cultural and political impacts of our work. Aesthetics infuse every aspect of technology. Aesthetic interpretations are manifold and mutable, constructed in-congress with the observer and not fully defined by the original designer. AI technologies add additional layers of subtext: character, consciousness, agency, intent.

Despite this murkiness, or perhaps because of it, this talk makes a passionate argument for engaging with historical aesthetic movements, for building our shared professional knowledge of fads and fashions, not just from the past 40 years of internet culture, but also from the past 140 years of ideology, technology, and thought.

General: Others
Hassium
12:25
45min
Why AI Projects Fail – Chronicles of Failure and How to Overcome Them
Alexander CS Hendorf

Why do AI projects fail? Spoiler: It’s rarely about the technology. Organizations often stumble over unrealistic expectations, siloed data, and cultural roadblocks. This talk dives into the human and organizational dynamics that cause AI initiatives to derail.

Using real-world examples, I’ll showcase how missing tram stop location data was found on a hobbyist’s blog when official systems failed. Or how critical historical data for resource planning sat locked in Excel files with forgotten passwords, forcing a desperate password recovery operation. These examples highlight the surprising ways poor data governance and lack of interdisciplinary collaboration sabotage even well-intentioned AI projects.

But all is not lost! We’ll explore actionable strategies to navigate these challenges: shifting from shiny tools to R&D-driven approaches, fostering better communication between IT and business teams, and building strong foundations for clean, accessible data. Attendees will leave with insights into what AI success truly requires: cultural readiness, realistic expectations, and a commitment to collaboration.

If you’re tired of AI hype and looking for practical solutions, this talk offers a clear roadmap to ensure AI projects deliver measurable impact.

General: Others
Zeiss Plenary (Spectrum)
12:25
45min
Why Exceptions Are Just Sophisticated Gotos - and How to Move Beyond
Florian Wilhelm

"Why Exceptions Are Just Sophisticated Gotos - and How to Move Beyond" explores a common programming tool with a fresh perspective. While exceptions are a key feature in Python and other languages, they share surprising similarities with the notorious goto statement. This talk examines those parallels, the problems exceptions can create, and practical alternatives for better code. Attendees will gain a clear understanding of modern programming concepts and the evolution of programming.

PyCon: Programming & Software Engineering
Titanium3
12:25
45min
expectation: A modern take on statistical A/B testing with e-values and martingales
Rostami Jako

This talk introduces a novel Python library for statistical testing using e-values, offering a refreshing alternative to traditional p-values. We'll explore how this approach enables real-time sequential testing, allowing data scientists to monitor experiments continuously without the statistical penalties of repeated testing. Through practical examples, we'll demonstrate how e-values provide more intuitive evidence measures and enable flexible stopping rules in A/B testing, clinical trials, and anomaly detection. The library implements cutting-edge methods from game-theoretic probability, making advanced sequential testing accessible to Python practitioners. Whether you're conducting A/B tests, monitoring production models, or running clinical trials, this talk will equip you with powerful new tools for sequential data analysis.

PyData: Machine Learning & Deep Learning & Statistics
Helium3
13:10
13:10
80min
Lunch Break
Zeiss Plenary (Spectrum)
13:10
80min
Lunch Break
Titanium3
13:10
80min
Lunch Break
Helium3
13:10
80min
Lunch Break
Platinum3
13:10
80min
Lunch Break
Europium2
13:10
80min
Lunch Break
Hassium
13:10
80min
Lunch Break
Palladium
13:15
13:15
75min
Lunch Break
Ferrum
13:15
75min
Lunch Break
Dynamicum
14:30
14:30
30min
AI coding agent - what it is, how it works and is it good for developers
Cheuk Ting Ho

In this talk, we will take a deeper technical look at AI coding agents, their design, and how they can carry out coding tasks with the support of large language models. We will follow the journey from the user entering a prompt to how that prompt is converted into actions that complete the task.

After that, we will look at the impact it could have on the industry, whether or not you, as a developer, should use an AI coding agent, and what a user should be cautious of when using such an agent.

PyData: Generative AI
Platinum3
14:30
30min
Autonomous Browsing using Large Action Models
Arne Grobrügge, Nico Kreiling

The browser serves as our gateway to the internet—the largest repository of knowledge in human history. Proficiency in its use is a core skill across nearly all professions and is becoming increasingly important for Artificial Intelligence. But can Large Action Models (LAMs) autonomously operate a browser? What exactly are LAMs that promise to translate human intentions into actions? We report on a project that fully automates the job application process using AI: from navigating unfamiliar website structures and filling out forms to handling document uploads and cookie banners.

PyData: Natural Language Processing & Audio (incl. Generative AI NLP)
Hassium
14:30
30min
Benchmarking Time Series Foundation Model with sktime
Benedikt Heidrich

Recent time series foundation models such as LagLlama, Chronos, Moirai, TinyTimesMixer promise zero-shot forecasting for arbitrary time series. One central claim of foundation models is their ability to perform zero-shot forecasting, that is, to perform well with no training data. However, performance claims of foundation models are difficult to verify, as public benchmark datasets may have been a part of the training data, and only the already trained weights are available to the user.

Therefore, performance in specific use cases must be verified on the use case data itself, to ensure a reliable assessment of forecasting performance. sktime allows users to easily produce a performance benchmark of any collection of forecasting models, foundation models, simple baselines, or custom methods, on their internal use case data.

PyData: Machine Learning & Deep Learning & Statistics
Helium3
14:30
30min
Django's Dilemma: Balancing Simplicity with Scalability
Anette Haferkorn

Django's model-view-serializer approach works great for small apps, but as projects grow, you might face challenges like scattered database operations and APIs that are too closely linked to your database models. These issues can make unit testing and scaling harder. I'll share real-world examples of these problems and show how to refactor an app using ideas from Domain-Driven Design (DDD) and hexagonal architecture. We'll look at a before-and-after example to see how these changes can make your app easier to use and debug. Plus, we'll discuss when Django's simplicity is enough and when it's worth adopting a more structured approach. You'll leave with practical tips for transforming your Django projects into systems that can handle increased complexity.

PyCon: Django & Web
Palladium
14:30
30min
From Trees to Transformers: GetYourGuide’s Journey Towards Deep Learning for Ranking
Theodore Meynard, Mihail Douhaniaris

GetYourGuide, a global marketplace for travel experiences, reached diminishing returns with its XGBoost-based ranking system. We switched to a Deep Learning pipeline in just nine months, maintaining high throughput and low latency. We iterated on over 50 offline models and conducted more than 10 live A/B tests, ultimately deploying a PyTorch transformer that yielded significant gains. In this talk, we will share our phased approach—from a simple baseline to a high-impact launch—and discuss the key operational and modeling challenges we faced. Learn how to transition from tree-based methods to neural networks and unlock new possibilities for real-time ranking.

PyData: Machine Learning & Deep Learning & Statistics
Zeiss Plenary (Spectrum)
14:30
90min
Instrumenting Python Applications with OpenTelemetry
Mika Naylor, Emily Woods

Observability is challenging and often requires vendor-specific instrumentation. Enter OpenTelemetry: a vendor-agnostic standard for logs, metrics, and traces. Learn how to instrument Python applications with OpenTelemetry and send telemetry to your preferred observability backends.

PyCon: MLOps & DevOps
Ferrum
14:30
30min
LLM Inference Arithmetics: the Theory behind Model Serving
Luca Baggi

Have you ever asked yourself how parameters for an LLM are counted, or wondered why Gemma 2B is actually closer to a 3B model? You have no clue about what a KV-Cache is? (And, before you ask: no, it's not a Redis fork.) Do you want to find out how much GPU VRAM you need to run your model smoothly?

If your answer to any of these questions was "yes", or you have another doubt about inference with LLMs - such as batching, or time-to-first-token - this talk is for you. Well, except for the Redis part.

PyData: Generative AI
Titanium3
14:30
30min
Powering Up DDoS Defense with Python: Building Resilient Systems
Siddharth Vijay

This talk will explore how Python can be leveraged to build robust DDoS defense mechanisms, focusing on real-time threat detection, mitigation strategies, and system resilience. We will dive into key Python libraries, best practices, and techniques to protect your applications from large-scale DDoS attacks while ensuring high availability.

PyCon: Security
Europium2
14:30
90min
supplyseer: Computational Supply Chain with Python
Rostami Jako

This talk introduces supplyseer, an open-source Python library that brings advanced analytics to Supply Chain and Logistics. By combining time series embedding techniques, stochastic process modeling, and geopolitical risk analysis, supplyseer helps organizations make data-driven decisions in an increasingly complex global supply chain landscape. The library implements novel approaches like Takens embedding for demand forecasting, Hawkes processes for modeling supply chain events, and Bayesian methods for inventory optimization. Through practical examples and real-world use cases, we'll explore how these mathematical concepts translate into actionable insights for supply chain practitioners.

PyData: Machine Learning & Deep Learning & Statistics
Dynamicum
15:10
15:10
45min
Beyond Basic Prompting: Supercharging Open Source LLMs with LMQL's Structured Generation
Christiaan Swart

This intermediate-level talk demonstrates how to leverage Language Model Query Language (LMQL) for structured generation and tool usage with open-source models like Llama. You will learn how to build a RAG system that enforces output constraints, handles tool calls, and maintains response structure - all while using open-source components. The presentation includes hands-on examples where audience members can experiment with LMQL prompts, showcasing real-world applications of constrained generation in production environments.

PyData: Natural Language Processing & Audio (incl. Generative AI NLP)
Europium2
15:10
30min
Building Reliable AI Agents for Publishing: A DSPy-Based Quality Assurance Framework
Simonas Černiauskas

As publishers increasingly adopt AI agents for content generation and analysis, ensuring output quality and reliability becomes critical. This talk introduces a novel quality assurance framework built with DSPy that addresses the unique challenges of evaluating AI agents in publishing workflows. Using real-world examples from newsroom implementations, I will demonstrate how to design and implement systematic testing pipelines that verify factual accuracy, content consistency, and compliance with editorial standards. Attendees will learn practical techniques for building reliable agent evaluation systems that go beyond simple metrics to ensure AI-generated content meets professional publishing standards.

PyData: Natural Language Processing & Audio (incl. Generative AI NLP)
Palladium
15:10
45min
Inclusive Data for 1.3 Billion: Designing Accessible Visualizations
Pavithra Eswaramoorthy, Dr. Tania Allard

According to the World Health Organization (WHO), an estimated 1.3 billion people (1 in 6 individuals) experience a disability, and nearly 2.2 billion people (1 in 5 individuals) have vision impairment. Improving the accessibility of visualizations will enable more people to participate in and engage with our data analyses.

In this talk, we’ll discuss some principles and best practices for creating more accessible data visualizations. It will include tips for individuals who create visualizations, as well as guidelines for the developers of visualization software to help ensure your tools can help downstream designers and developers create more accessible visualizations.

PyData: Visualisation & Jupyter
Platinum3
15:10
45min
Open Table Formats in the Wild: From Parquet to Delta Lake and Back
Franz Wöllert

Open table formats have revolutionized analytical, columnar storage on cloud object stores with critical features like ACID compliance and enhanced metadata management, once exclusive to proprietary cloud data warehouses. Delta Lake, Iceberg, and Hudi have significantly advanced over traditional open file formats like Parquet and ORC.

In an effort to modernize our data architecture, we aimed to replace our Parquet-based bronze layer with Delta Lake, anticipating better query performance, reduced maintenance, native support for incremental processing, and more. While our initial pilot showed promise, we encountered unexpected pitfalls that ultimately brought us back to where we began.

Curious? Join me as we shed light on the current state of table formats.

PyData: Data Handling & Engineering
Zeiss Plenary (Spectrum)
15:10
30min
PDFs - When a thousand words are worth more than a picture (or table).
Caio Benatti Moretti

PDF, a must-have in RAG systems, ensures visual fidelity across platforms and devices, at the expense of compromising what would be the core condition for computers to properly process and interpret text: semantics. That means any logical arrangement of text, upon rendering, explodes into dummy visual shards of data that literally portray the bigger picture for the human eye to perceive, but no longer convey the information computers should grasp. Such a bottleneck already makes proper ingestion of text-only documents a big challenge, let alone when tables or figures come into play, the ultimate nightmare for PDF parsers, not to say developers. The rest you must have already foreseen: a RAG system barfing unreliable knowledge from bad chunks (based on regular PDF parsing), if those ever get to be retrieved from a vector database. In this talk you can gather some vision-driven insights on how to leverage the strengths of PDF and language models towards good chunks to be ingested. Or, in other words, how multimodal models can go beyond trivial reverse engineering by decomposing tables into their building blocks, in plain language, the way those would be explained to another human; or better yet, the way humans would ask questions about such pieces of knowledge. And from such a strategy, we transfer the same rationale to figures. Come along, gather some insights, and get inspired to break down tables and figures from your own PDFs, and to improve retrieval in your RAG systems.

PyData: Generative AI
Hassium
15:10
45min
PyData Stack: Building and deploying pure Python, open source data platforms
Eric Thanenthiran

Modern open source Python data packages offer the opportunity to build and deploy pure Python, production-ready data platforms. Engineers can and do play a big role in helping companies become data-driven by centralising data, cleaning and modelling it, and presenting it back to the business. Now more than ever, this stack gives engineers and companies of any size the ability to build data products and insights at relatively low cost. In this talk we’ll walk through the key components of this stack and the tooling options available, and demo a deployable containerised Python data stack.

PyData: Data Handling & Engineering
Helium3
15:10
45min
Size matters: Inspecting Docker images for Efficiency and Security
Irena Grgic

Inspecting Docker images is crucial for building secure and efficient containers. In this session, we will analyze the structure of a Python-based Docker image using various tools, focusing on best practices for minimizing image size and reducing layers with multi-stage builds. We’ll also address common security pitfalls, including proper handling of build and runtime secrets.

While this talk offers valuable insights for anyone working with Docker, it is especially beneficial for Python developers seeking to master clean and secure containerization techniques.

PyCon: MLOps & DevOps
Titanium3
16:10
16:10
30min
Soon revealed!
Platinum3
16:10
30min
Soon revealed!
Ferrum
16:10
30min
Beyond FOMO — Keeping Up-to-Date in AI
Carsten Frommhold

The rapid evolution of AI technologies, particularly since the emergence of Large Language Models, has transformed the data science landscape from a field of steady progress to one of constant breakthroughs. This acceleration creates unique challenges for practitioners, from managing FOMO to battling imposter syndrome. Drawing from personal experience transitioning from mathematical modeling to modern AI development, this talk explores practical strategies for staying current while maintaining sanity. We'll discuss building effective learning structures, creating collaborative knowledge-sharing environments, and finding the right balance between innovation and implementation. Attendees will leave with actionable insights on navigating technological change while fostering sustainable growth in their teams and careers.

General: Education, Career & Life
Europium2
16:10
30min
Conformal Prediction: uncertainty quantification to humanise models
Vincenzo Ventriglia

Quantifying model uncertainties is critical to improve model reliability and make sound decisions. Conformal Prediction is a framework for uncertainty quantification that provides mathematical guarantees of true outcome coverage, allowing more informed decisions to be made by stakeholders.

PyData: Machine Learning & Deep Learning & Statistics
Dynamicum
16:10
30min
Deploying Synchronous and Asynchronous Django Applications for Hobby Projects
melhin

Simplify deploying hybrid Django applications with synchronous views and asynchronous apps. This session covers ASGI support, Docker containerization, and Kamal for seamless, zero-downtime deployments on single-server setups, ideal for hobbyists and small-scale projects.

PyCon: Django & Web
Palladium
16:10
30min
Driving Trust and Fairness: Addressing Ethical Challenges in Transportation through Explainable AI
Natalie Beyer

Machine Learning can transform transportation by improving safety, optimizing routes, and reducing delays, yet it also presents ethical concerns. From potential algorithmic bias to opaque decision-making processes, trust and fairness are at stake. In this talk, I will show how Explainable AI (XAI) can offer practical solutions to these ethical dilemmas. Instead of focusing on the technical underpinnings, we will discuss how transparency, accountability, and fairness can be enhanced in AI-driven transportation systems. Using a real-world example, I will demonstrate how XAI provides the groundwork for building ethical, trustworthy, and socially responsible AI solutions in public transportation systems.

General: Ethics & Privacy
Hassium
16:10
30min
How to use Data Science Superpowers in real life, a Bayesian perspective
Tim Lenzen

In the data science field, we use all these powerful methods to solve important problems. Most of the time, we do this very well because our data science and machine-learning toolbox fits the problems we tackle quite precisely. Yet, what about our everyday choices or even our most important life decisions? Can we use for our private lives what we advocate for in our jobs or are these choices inherently different?
Many of these real-life decisions are a little different from textbook machine-learning problems. There is often less or hard-to-come-by data and the decisions are infrequent, but sometimes very consequential. This talk will dive into what makes everyday decisions difficult to handle with our data science toolbox. It will show how Bayesian thinking can help to reason in such cases, especially when there is not a lot of data to rely on.

PyData: Machine Learning & Deep Learning & Statistics
Helium3
16:10
30min
Mastering Demand Forecasting: Lessons from Europe's Largest Retailer
Moreno Schlageter, Yovli Duvshani

Ever craved your favorite dish, only to find its key ingredient missing from the store? You're not alone - stock outs can have significant consequences for businesses, resulting in frustrated customers and lost sales. On the other hand, overstocking can lead to wasted storage costs and potential write-offs. The replenishment system is responsible for striking the right balance between these opposing risks.
The key to successful replenishment is making accurate predictions about future demand.

This presentation takes a deep dive into the intricate world of demand forecasting at Europe's largest retailer. We will demonstrate how enhancing simple machine learning methods with domain knowledge allows us to generate hundreds of millions of high-quality forecasts every day.

PyData: Machine Learning & Deep Learning & Statistics
Zeiss Plenary (Spectrum)
16:10
30min
Multivariate Datastrophe: Methods to Detect Obscure Drift in Your Production Data
Magdalena Kowalczuk

Getting your model into production isn’t a trivial task, but it’s only half the battle. Ensuring that your model continues to deliver great performance over time is even more critical. In this talk I would like to present a selection of what can kill your model’s performance, zoom in on multivariate data drift, and present two methods to detect this type of drift in your production data.

PyCon: MLOps & DevOps
Titanium3
16:40
16:40
30min
Coffee Break
Zeiss Plenary (Spectrum)
16:40
30min
Coffee Break
Titanium3
16:40
30min
Coffee Break
Helium3
16:40
30min
Coffee Break
Platinum3
16:40
30min
Coffee Break
Europium2
16:40
30min
Coffee Break
Hassium
16:40
30min
Coffee Break
Palladium
16:40
30min
Coffee Break
Ferrum
16:40
30min
Coffee Break
Dynamicum
17:10
17:10
30min
Soon revealed!
Europium2
17:10
30min
Soon revealed!
Dynamicum
17:10
30min
Conquering PDFs: document understanding beyond plain text
Ines Montani

NLP and data science could be so easy if all of our data came as clean and plain text. But in practice, a lot of it is hidden away in PDFs, Word documents, scans and other formats that have been a nightmare to work with. In this talk, I'll present a new and modular approach for building robust document understanding systems, using state-of-the-art models and the awesome Python ecosystem. I'll show you how you can go from PDFs to structured data and even build fully custom information extraction pipelines for your specific use case.

PyData: Natural Language Processing & Audio (incl. Generative AI NLP)
Zeiss Plenary (Spectrum)
17:10
30min
Enhancing Software Supply Chain Security with Open Source Python Tools
Anthony Harrison

The Cyber Resilience Act (CRA) is focused on improving the security and resilience of digital products. To comply with the CRA, businesses will need to start preparing the necessary evidence if they want to continue to deliver digital products to the EU market once the CRA is in force.

Key requirements within the CRA include implementing robust security measures throughout the product life-cycle, adopting secure development practices and implementing proactive vulnerability management processes.

This session will show how several of the CRA requirements can be met using open source Python tools.

PyCon: Security
Hassium
17:10
30min
Generative-AI: Usecase-Specific Evaluation of LLM-powered Applications
Dr. Homa Ansari

This talk addresses the critical need for use case-specific evaluation of Large Language Model (LLM)-powered applications, highlighting the limitations of generic evaluation benchmarks in capturing domain-specific requirements. It proposes a workflow for designing evaluation pipelines to optimize LLM-based applications, consisting of three key activities: human-expert evaluation and benchmark dataset curation, creation of evaluation agents, and alignment of these agents with human evaluations using the curated datasets. The workflow produces two key outcomes: a curated benchmark dataset for testing LLM applications and an evaluation agent that scores their responses. The presentation further addresses limitations and best practices to enhance the reliability of evaluations, ensuring LLM applications are better tailored to specific use cases.

PyData: Natural Language Processing & Audio (incl. Generative AI NLP)
Platinum3
17:10
30min
Getting Started with Bayes in Engineering: Implementing Kalman Filters with RxInfer.jl
Victor Flores Terrazas

Bayesian methods are not commonly seen in Civil Engineering and Structural Dynamics. In this talk we explore how RxInfer.jl and the Julia Programming Language can simplify Bayesian modeling by implementing a Kalman filter for tracking the dynamics of a structural system. Perfect for engineers, researchers, and data scientists eager to apply probabilistic modelling and Bayesian methods to real-world engineering challenges.
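
To give a feel for the technique, here is a minimal scalar Kalman filter. It is written in plain Python rather than with RxInfer.jl, so it does not reflect the talk's Julia code; the random-walk state model, the noise variances, and the measurements are assumptions chosen for illustration.

```python
def kalman_1d(measurements, process_var, meas_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a random-walk state observed with noise.

    Returns the filtered state estimates and the final estimate variance.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is assumed constant, plus process noise.
        p = p + process_var
        # Update: blend prediction and measurement using the Kalman gain.
        k = p / (p + meas_var)
        x = x + k * (z - x)
        p = (1 - k) * p
        estimates.append(x)
    return estimates, p


# Hypothetical noisy readings of a quantity close to 1.0.
measurements = [1.2, 0.9, 1.1, 1.0, 0.95]
estimates, final_var = kalman_1d(measurements, process_var=1e-4, meas_var=0.1)
print([round(e, 3) for e in estimates], round(final_var, 4))
```

Each update is a Bayesian posterior computation: the gain weights the new measurement by how uncertain the current estimate is relative to the sensor noise.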

PyData: Research Software Engineering
Palladium
17:10
30min
Guiding data minds: how mentoring transforms careers for both sides
Anastasia Karavdina

Mentorship is a powerful way to shape careers while building meaningful connections in the data field. In this talk, I’ll share my journey as a professional mentor, what the role entails, and the impact it has on both mentees and mentors. Learn how mentorship drives growth, fosters innovation, and creates value for the data community—and why you should consider stepping into this rewarding role.

General: Community & Diversity
Titanium3
17:10
30min
Information Retrieval Without Feeling Lucky: The Art and Science of Search
Anja Pilz

Search is everywhere, yet effective Information Retrieval remains one of the most underestimated challenges in modern technology. While Retrieval-Augmented Generation has captured significant attention, the foundational element - Information Retrieval - often remains underexplored.

In this talk, we put Information Retrieval center stage by asking:
How do we know that user queries and data 'speak' the same language?
How do we evaluate the relevance and completeness of search results? And how do we prioritize what gets displayed? Or do we even want to hide specific content?

We try to answer these questions by introducing the audience to the art and science of Information Retrieval, exploring metrics such as precision, recall, and desirability. We’ll examine key challenges, including ambiguity, query relaxation, and the interplay between sparse and dense search techniques. Through a live demo using public content from Sendung mit der Maus, we show how hybrid search improves upon vector-based and keyword-based search in isolation.
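
For readers new to these metrics, precision and recall for a single query can be sketched as follows. The document ids and relevance judgments are hypothetical, and the "desirability" metric mentioned above is not covered here.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one query.

    retrieved: list of document ids returned by the search system
    relevant:  set of document ids judged relevant for the query
    """
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search result and relevance judgments.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = {"d2", "d4", "d7"}
p, r = precision_recall(retrieved, relevant)
print(f"precision = {p:.2f}, recall = {r:.2f}")
```

Precision is the share of returned documents that are relevant; recall is the share of relevant documents that were returned.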

General: Others
Helium3
17:10
30min
Supercharge Your Testing with inline-snapshot
Frank Hoffmann

Snapshot tests are invaluable when you are working with large, complex, or frequently changing expected values in your tests.
Introducing inline-snapshot, a Python library designed for snapshot testing that integrates seamlessly with pytest, allowing you to embed snapshot values directly within your source code.
This approach not only simplifies test management but also boosts productivity by improving the maintenance of the tests.
It is particularly useful for integration testing and can be used to write your own abstractions to test complex APIs.

PyCon: Testing
Ferrum
17:50
17:50
30min
Soon revealed!
Platinum3
17:50
30min
Soon revealed!
Ferrum
17:50
30min
Build a personalized Commute agent in Python with Hopsworks, LangGraph and LLM Function Calling
Javier de la Rúa Martínez

The invention of the clock and the organization of time in zones have helped synchronize human activities across the globe. While timekeepers are better at planning and sticking to the plan, time optimists somehow believe that time is malleable and stretches as the deadline approaches. Nevertheless, whether you are an organized timekeeper or a creative timebender, external factors can affect your commute.

In this talk, we will define the different components necessary to build a personalized commute virtual agent in Python. The agent will help you analyze your historical lateness records, estimate future delays, and suggest the best time to leave home based on these predictions. It will be powered by an LLM and will use a technique called Function Calling to recognize the user intent from the conversation history and provide informed answers.

PyData: Data Handling & Engineering
Dynamicum
17:50
30min
Is Prompt Engineering Dead? How Auto-Optimization is Changing the Game
Iryna Kondrashchenko, Oleh Kostromin

The rise of LLMs has elevated prompt engineering as a critical skill in the AI industry, but manual prompt tuning is often inefficient and model-specific. This talk explores various automatic prompt optimization approaches, ranging from simple ones like bootstrapped few-shot to more complex techniques such as MIPRO and TextGrad, and showcases their practical applications through frameworks like DSPy and AdalFlow. By exploring the benefits, challenges, and trade-offs of these approaches, the attendees will be able to answer the question: is prompt engineering dead, or has it just evolved?

PyData: Natural Language Processing & Audio (incl. Generative AI NLP)
Zeiss Plenary (Spectrum)
17:50
30min
Modern NLP for Proactive Harmful Content Moderation
Daryna Dementieva

Despite an array of regulations implemented by governments and social media platforms worldwide (e.g., the EU's Digital Services Act, DSA), the problem of digital abusive speech persists. At the same time, rapid advances in NLP and large language models (LLMs) are opening up new possibilities, and responsibilities, for using this technology to make a positive social impact. Can LLMs streamline content moderation efforts? Are they effective at spotting and countering hate speech, and can they help produce more proactive solutions like text detoxification and counter-speech generation?

In this talk, we will dive into the cutting-edge research and best practices of automatic textual content moderation today. From clarifying core definitions to detailing actionable methods for leveraging multilingual NLP models, we will provide a practical roadmap for researchers, developers, and policymakers aiming to tackle the challenges of harmful online content. Join us to discover how modern NLP can foster safer, more inclusive digital communities.

PyData: Natural Language Processing & Audio (incl. Generative AI NLP)
Hassium
17:50
30min
Streamlining Python deployment with Pixi: A Perspective from production
Dennis Weyland

In our quest to improve Python deployments, we explored Pixi, a tool designed to enhance dependency management within the Conda ecosystem. This talk recounts our experience integrating Pixi into a setup used in production. We leveraged Pixi to create lockfiles, ensuring consistent builds, and to automate deployments via CI/CD pipelines. This integration led to greater reliability and efficiency, minimizing deployment errors and allowing us to concentrate more on development. Join us as we share how Pixi transformed our deployment process and offer insights into optimizing your own workflows.

PyCon: MLOps & DevOps
Europium2
17:50
30min
Streamlining the Cosmos: Pythonic Workflow Management for Astronomical Analysis
Raphael Hviding

Astronomical surveys are growing rapidly in complexity and scale, necessitating accurate, efficient, and reproducible reduction and analysis pipelines. In this talk we explore Pythonic workflow managers to streamline processing large datasets on distributed computing environments.

Modern astronomy generates vast datasets across the electromagnetic spectrum. NASA's flagship James Webb Space Telescope (JWST) provides unprecedented observations that enable deep studies of distant galaxies, cosmic structures, and other astrophysical phenomena. However, these datasets are complex and require intricate calibration and analysis pipelines to transform raw data into meaningful scientific insights.

We will discuss the development and deployment of Pythonic tools, including snakemake and pixi, to construct modular, parallelized workflows for data reduction and analysis. Attendees will learn how these tools automate complex processing steps, optimize performance in distributed computing environments, and ensure reproducibility. Using real-world examples, we will illustrate how these workflows simplify the journey from raw data to actionable scientific insights.

PyData: PyData & Scientific Libraries Stack
Palladium
17:50
30min
The earth is no longer flat - introducing support for spherical geometries in Spherely and GeoPandas
Joris Van den Bossche

The geometries in GeoPandas, using the Shapely library, are assumed to be in projected coordinates on a flat plane. While this approximation is often just fine, for global data this runs into its limitations. This presentation introduces spherely, a Python library for working with vector geometries on the sphere, and its integration into GeoPandas.

PyData: PyData & Scientific Libraries Stack
Titanium3
17:50
30min
🦀 Rüstzeit: Asynchronous Concurrency in Python & Rust
Jamie Coombes

Many Python developers are enhancing their Rust knowledge and want to take the next step in translating their understanding of advanced concepts like asynchronous programming.

In this talk, I'll help you take that step by juxtaposing Python's asyncio with Rust's async ecosystems, tokio and async-std. Through real-world examples and insights from conversations with graingert, co-author of Python's AnyIO, we'll explore how each language approaches asynchronous execution, highlighting similarities and differences in syntax, performance, and ecosystem support.
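
As a small illustration of the Python side of this comparison (a sketch, not material from the talk), the following runs two simulated I/O tasks concurrently with asyncio. The task names and delays are made up.

```python
import asyncio


async def fetch(name, delay):
    """Simulate an I/O-bound task by sleeping without blocking the event loop."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def main():
    # gather() runs both coroutines concurrently and returns their
    # results in the order the awaitables were passed in.
    return await asyncio.gather(fetch("a", 0.02), fetch("b", 0.01))


results = asyncio.run(main())
print(results)
```

In Python the event loop ships with the standard library, whereas Rust futures need an external runtime such as tokio to drive them.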

This talk aims to persuade you that by leveraging Rust's powerful type system and compiler guarantees, we can build fast, reliable async code that's less prone to race conditions and concurrency bugs. Whether you're a Pythonista venturing into Rust or a Rustacean curious about Python's concurrency model, this session will provide practical insights to help you navigate async programming across both languages.

Welcome to Rüstzeit: Prepare to navigate async programming across both ecosystems.

General: Rust
Helium3
18:30
18:30
50min
Lightning Talks
Zeiss Plenary (Spectrum)
09:00
09:00
5min
Announcements
Zeiss Plenary (Spectrum)
09:00
180min
Mini-Pythonistas: Coding, Experimenting, and Exploring with Zümi!
Dr. Marisa Mohr, Anna-Lena Popkes, Hannah Hepke, Daniel Hieber

Please note, this is a children's workshop. Recommended age: 10-16 years. Confident use of keyboard and mouse and some basic English (for programming) are required.

Welcome, mini-Pythonistas! In this workshop, we’ll dive into the world of Zümi, a programmable car that’s much more than just wheels and motors. With built-in sensors, lights, and a camera, Zümi can learn to recognize colors, respond to gestures, and even identify faces — all with your help!

PyData: Embedded Systems & Robotics
Carbonium
09:05
09:05
45min
Chasing the Dark Universe with Euclid and Python: Unveiling the Secrets of the Cosmos
Guadalupe Canas Herrera

The ESA Euclid mission, launched in July 2023, is on a quest to unravel the mysteries of dark energy and dark matter: the enigmatic components that make up 95% of the Universe. By mapping one-third of the sky with unprecedented precision, Euclid is building the largest 3D map of the cosmos.

This talk explores how cosmologists bridge theory and Euclid observations to reveal the hidden nature of dark energy and dark matter. We will delve into the challenges of cosmological inference, where advanced statistical methods and Python-based pipelines compare theoretical models against Euclid's vast datasets, and we will explain how Bayesian inference, machine learning, and state-of-the-art simulations are revolutionizing our understanding of the cosmos.

Keynote
Zeiss Plenary (Spectrum)
09:50
09:50
25min
Coffee Break
Zeiss Plenary (Spectrum)
09:50
25min
Coffee Break
Titanium3
09:50
25min
Coffee Break
Helium3
09:50
25min
Coffee Break
Platinum3
09:50
25min
Coffee Break
Europium2
09:50
25min
Coffee Break
Hassium
09:50
25min
Coffee Break
Palladium
09:50
25min
Coffee Break
Ferrum
09:50
25min
Coffee Break
Dynamicum
10:15
10:15
30min
Soon revealed!
Platinum3
10:15
30min
Soon revealed!
Europium2
10:15
30min
Algorithmic Music Composition With Python
Hendrik Niemeyer

Computers have long been an integral part of creating music. Virtual instruments and digital audio workstations make creating music easy and accessible. But how do programming languages and especially Python fit into this? Python can serve as a tool for creating musical notation and MIDI files.

Throughout the session, you’ll learn how to:

  • Use Python to create melodies, harmonies, and rhythms.
  • Generate music based on rules, randomness, and mathematical principles.
  • Visualize and export your compositions as MIDI and sheet music.

By the end of the talk, you’ll have a clear understanding of how to turn simple algorithms into expressive musical works.

PyCon: Python Language & Ecosystem
Zeiss Plenary (Spectrum)
10:15
30min
Design, Generate, Deploy: Contract-First with FastAPI
Dr. Evelyne Groen, Kateryna Budzyak

This talk explores a contract-first approach to API development using the OpenAPI generator, a powerful tool for automating API generation from a standardized specification. We will cover (1) what you need to run to get a standard implementation of the FastAPI endpoints and data models; (2) how to customize the mustache templates that are used to generate the API stubs; (3) some ideas on how to customize the CLI; and (4) how to maintain the contract and handle breaking changes to it. We will close the session with a discussion of the challenges of implementing the OpenAPI generator.

PyCon: MLOps & DevOps
Titanium3
10:15
30min
Multi-tenant Conversational Analytics
Rodel van Rooijen

Ever wondered how to use GenAI to enable self-service analytics through prompting? In this talk, I will share my experience of building a multi-tenant conversational analytics set-up that is built into a Software-as-a-Service (SaaS) platform. This talk is intended for AI engineers, data scientists, software engineers and anyone interested in using GenAI to power conversational analytics using open-source tools.

I will discuss the challenges faced in designing and implementing it, as well as the lessons learned along the way. We'll answer questions such as: Why offer analytics through prompting? Why multi-tenancy, and what makes it so difficult? How to build it into an existing product? What makes open-source the preferred choice over proprietary solutions? What could the implications be for the analytics field?

PyData: Natural Language Processing & Audio (incl. Generative AI NLP)
Palladium
10:15
90min
Power up your Polars code with Polars extension
Cheuk Ting Ho

While Polars is written in Rust and has the advantages of speed and multi-threading, everything will slow down if a Python function needs to be applied to the DataFrame. To avoid that, a Polars extension can be used to solve the problem. In this workshop, we will look at how to do it.

PyData: Data Handling & Engineering
Ferrum
10:15
30min
Scaling Python: An End-to-End ML Pipeline for ISS Anomaly Detection with Kubeflow
Christian Geier, Henrik Sebastian Steude

Building and deploying scalable, reproducible machine learning pipelines can be challenging, especially when working with orchestration tools like Slurm or Kubernetes. In this talk, we demonstrate how to create an end-to-end ML pipeline for anomaly detection in International Space Station (ISS) telemetry data using only Python code.

We show how Kubeflow Pipelines, MLFlow, and other open-source tools enable the seamless orchestration of critical steps: distributed preprocessing with Dask, hyperparameter optimization with Katib, distributed training with PyTorch Operator, experiment tracking and monitoring with MLFlow, and scalable model serving with KServe. All these steps are integrated into a holistic Kubeflow pipeline.

By leveraging Kubeflow's Python SDK, we simplify the complexities of Kubernetes configurations while achieving scalable, maintainable, and reproducible pipelines. This session provides practical insights, real-world challenges, and best practices, demonstrating how Python-first workflows empower data scientists to focus on machine learning development rather than infrastructure.

PyCon: MLOps & DevOps
Hassium
10:15
90min
The future of AI training is federated
Chong Shen Ng

Since its introduction in 2016, Federated Learning (FL) has become a key paradigm for training AI models in scenarios where training data cannot leave its source. This applies in many industrial settings where centralizing data is challenging due to a combination of reasons, including but not limited to privacy, legal, and logistics.

The main focus of this tutorial is to introduce an alternative approach to training AI models that is straightforward and accessible. We’ll walk you through the basics of an FL system, how to iterate on your workflow and code in a research setting, and finally deploy your code to a production environment. You will learn all of these approaches using a real-world application based on open-sourced datasets, and the open-source federated AI framework, Flower, which is written in Python and designed for Python users. Throughout the tutorial, you’ll have access to hands-on open-sourced code examples to follow along.
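
To make the core idea concrete, here is a minimal sketch of the aggregation step used in federated averaging: each client trains locally and sends only model parameters, which a server combines weighted by the amount of local data. This is plain Python for illustration, not Flower's API; the function name and the numbers are assumptions.

```python
def federated_average(client_updates):
    """Weighted average of client model parameters (FedAvg-style aggregation).

    client_updates: list of (weights, num_examples) pairs, where weights is a
    list of floats and all clients share the same parameter layout.
    """
    total = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    return [
        sum(w[i] * n for w, n in client_updates) / total
        for i in range(dim)
    ]


# Two hypothetical clients; the raw training data never leaves them.
updates = [
    ([1.0, 2.0], 10),   # client A: parameters after local training, 10 examples
    ([3.0, 4.0], 30),   # client B: parameters after local training, 30 examples
]
global_weights = federated_average(updates)
print(global_weights)
```

The client with more examples has proportionally more influence on the aggregated model.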

PyData: Machine Learning & Deep Learning & Statistics
Dynamicum
10:15
30min
Why Don’t Customers Want My Free Goods? – Why Forecasting Models Don’t Answer 'What If' Questions
Matthias Binder

Forecasting and causal inference are distinct but fundamental tasks in data science. While forecasting predicts future outcomes based on history, causal inference explores the "why" behind those outcomes and helps simulate "what if" scenarios. Confusing the two can lead to misleading results.

At Blue Yonder, we encountered a case where a customer's forecasting model predicted demand accurately based on price. However, when they used the model for simulations to explore "what if" scenarios, the results were counterintuitive: lower prices led to lower demand. I will share how we resolved this issue and emphasize the importance of incorporating causal thinking when addressing questions like, "Why did this happen?" or "What if I do X?"

In this talk, I’ll show how to identify common pitfalls, like confounders, when integrating causal inference into forecasting workflows. We’ll also explore Bayesian models, powered by Markov Chain Monte Carlo (MCMC) methods, to bridge the gap between forecasting and causality using the PyMC library on practical examples.
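
The pitfall can be reproduced on synthetic data. In the sketch below, a made-up "season" variable raises both price and demand, so a naive fit suggests that higher prices increase demand even though the true effect is negative. It uses plain least squares and stratification rather than PyMC, and all numbers are assumptions for illustration.

```python
import random


def ols_slope(xs, ys):
    """Slope of a simple least-squares fit of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


rng = random.Random(0)
rows = []
for _ in range(5000):
    season = rng.random() < 0.5  # confounder: high season or not
    price = 10 + 5 * season + rng.gauss(0, 1)
    # The true causal effect of price on demand is -2 per unit of price.
    demand = 100 - 2 * price + 30 * season + rng.gauss(0, 1)
    rows.append((season, price, demand))

# Naive fit that ignores the confounder.
naive = ols_slope([p for _, p, _ in rows], [d for _, _, d in rows])

# Fit within each season separately, which holds the confounder fixed.
within = [
    ols_slope(
        [p for s, p, _ in rows if s == flag],
        [d for s, _, d in rows if s == flag],
    )
    for flag in (False, True)
]
print(f"naive slope: {naive:+.2f}")
print("slopes within each season:", [f"{w:+.2f}" for w in within])
```

The naive slope comes out positive, while the within-season slopes are close to the true value of -2, which is the kind of sign flip described in the customer case above.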

By the end of this talk, you’ll learn how to:

  • Consider causal reasoning when building models that forecast well but can also be used for interventions and "what if" scenarios.
  • Visualize causal hypotheses with Directed Acyclic Graphs (DAGs) to understand relationships.
  • Leverage PyMC to build Bayesian models for testing causal hypotheses and answering "what if" questions.

PyData: Machine Learning & Deep Learning & Statistics
Helium3
10:55
10:55
30min
Soon revealed!
Platinum3
10:55
30min
Soon revealed!
Europium2
10:55
30min
Navigating the Security Maze: An Interactive Adventure
Clemens Hübner

How to integrate security into a software development project? Without jeopardizing timeline or budget? You decide!
This interactive session covers crucial decisions for software security, and the audience decides how the story ends...

PyCon: Security
Palladium
10:55
30min
Oh, no! Users love my GenAI-Prototype and want to use it more.
Thomas Prexl, Frank Rust

Demos and prototypes for generative AI (GenAI) projects can be quickly created with tools like Streamlit, offering impressive results for users within hours. However, scaling these solutions from prototypes to robust systems introduces significant challenges. As user demand grows, hacks and workarounds in tools like Streamlit lead to unreliability and debugging frustrations. This talk explores the journey of overcoming these obstacles, evolving to a stable tech stack with Qdrant, Postgres, LiteLLM, FastAPI, and Streamlit. Aimed at beginners in GenAI, it highlights key lessons.

PyCon: MLOps & DevOps
Helium3
10:55
30min
Outgrowing your node? Zero stress scaling with cuPyNumeric.
Irina Demeshko, Quynh L. Nguyen

Many data and simulation scientists use NumPy for its ease of use and good performance on CPU. This approach works well for single-node tasks, but scaling to handle larger datasets or more resource-intensive computations introduces significant challenges. On top of that, using GPUs adds another level of complexity. We present the cuPyNumeric library, which gives developers the same familiar NumPy interface, but seamlessly distributes work across CPUs and GPUs.
In this talk we showcase the productivity and performance of the cuPyNumeric library using a real user example, and cover some details of its implementation.

PyCon: Programming & Software Engineering
Hassium
10:55
30min
Safeguard your precious API endpoints built on FastAPI using OAuth 2.0
Semona Igama

Is implementing authorization on your API endpoints an afterthought? Who should have access to your API endpoints? Is it secure? This talk covers using OAuth 2.0 to secure API endpoints built on FastAPI following industry-recognized best practices. Join me on a journey to making your API endpoints functional AND secure. When you follow secure identity standards, you’ll be equipped with a deeper understanding of the critical need for authorization.

PyCon: Security
Zeiss Plenary (Spectrum)
10:55
30min
Serverless Orchestration: Exploring the Future of Workflow Automation
Tim Bossenmaier

Orchestration is a typical challenge in the data engineering world. Scheduling your data transformation jobs via cron jobs is cumbersome and error-prone. Furthermore, with an increasing number of jobs to manage, it becomes hard to keep an overview. Tools like Apache Airflow, Dagster, Luigi, and Prefect are known for addressing these challenges but often require additional resources or investment. With the advent of serverless orchestration tools, many of these disadvantages are mitigated, offering a more streamlined and cost-effective solution.

This session provides a comprehensive overview of combining serverless architecture with orchestration. We will start by defining the core concepts of orchestration and serverless technologies and discuss the benefits of integrating them. The talk will then analyze solutions available in the cloud vendor space. Attendees will leave with a well-rounded understanding of the tools and strategies available in serverless orchestration.

PyCon: Programming & Software Engineering
Titanium3
11:35
11:35
30min
Beyond Alembic and Django Migrations
Rotem Tamir

ORMs like Django and SQLAlchemy have become indispensable in Python development, simplifying the interaction between applications and databases. Yet, their built-in schema migration tools often fall short in projects that require advanced database features or robust CI/CD integration.

In this talk, we’ll explore how you can go beyond the limitations of your ORM’s migration tool. Using Atlas—a language-agnostic schema management tool—as a case study, we’ll demonstrate how Python developers can automate migration planning, leverage advanced database features, and seamlessly integrate database changes into modern CI/CD pipelines.

PyCon: Django & Web
Hassium
11:35
30min
Bias Meets Bayes: A Bayesian Perspective on Improving Model Fairness
Vince Nelidov

Bias in machine learning models remains a pressing issue, often disproportionately affecting the most vulnerable groups in society. This talk introduces a Bayesian perspective to effectively tackle these challenges, focusing on improving fairness by modeling and addressing bias directly.
You will learn about the interplay between uncertainty, equity, and predictive accuracy, while gaining actionable insights to improve fairness in diverse applications. Using a practical example of a risk-scoring model trained on data with underrepresented minority groups, I will showcase how Bayesian methods compare to traditional techniques, demonstrating their unique potential to mitigate bias while maintaining performance.

PyData: Machine Learning & Deep Learning & Statistics
Europium2
11:35
45min
Bridging the gap: unlocking SAP data for data lakes with Python and PySpark via SAP Datasphere
Rostislaw Krassow

SAP's data often remains locked away, hindering the creation of a complete data picture. This talk presents a hands-on proof of concept leveraging SAP Datasphere, Python and PySpark to bridge an Azure-based, data mesh-inspired open data lake with a centralized SAP BI environment.

This presentation will delve into the architecture of SAP Datasphere and its integration interfaces with Python. It will explore network integration, authentication, authorization and resource management options, as well as data integration patterns. The presentation will summarize the evaluated features and limitations discovered during the PoC.

PyData: Data Handling & Engineering
Helium3
11:35
45min
Going Global: Taking code from research to operational open ecosystem for AI weather forecasting
Jesper Dramsch

When I was hired as a Scientist for Machine Learning, experts said ML would never work in weather forecasting. Nowadays, I get to contribute to Anemoi, a full-featured ML weather forecasting framework used by international weather agencies to research, build, and scale AI weather forecasting models.

The project started out as a curiosity by my colleagues and soon scaled as a result of its initial success. As machine learning stories go, this is a story of change, adaptation and making things work.

In this talk, I'll share some practical lessons: how we evolved from a mono-package with four people working on it to multiple open-source packages with 40+ internal and external collaborators. Specifically, I'll cover how we managed the explosion of over 300 config options without losing all of our sanity, how we built a separation of packages that works for both researchers and operations teams, and how CI/CD and testing constrain how many bugs we can introduce in a given day. You'll learn concrete patterns for growing Python packaging for ML systems, and balancing research flexibility with production stability. As a bonus, I'll sprinkle in anecdotes where LLMs like ChatGPT and Copilot massively failed at facilitating this evolution.

Join me for a deep dive into the real challenges of scaling ML systems - where the weather may be hard to predict, but our code doesn't have to be.

PyCon: MLOps & DevOps
Platinum3
11:35
45min
Reinventing Streamlit
Malte Klemm

Dreaming of creating sleek, interactive web apps with just Python? Streamlit is great for dashboards, but what if your needs go beyond that? Discover how Reflex.dev, a cutting-edge full-stack Python framework, lets you level up from dashboards to full-fledged web apps!

PyCon: Django & Web
Titanium3
11:35
30min
Securing Generative AI: Essential Threat Modeling Techniques
Elizaveta Zinovyeva

Generative AI development introduces unique security challenges that traditional methods often overlook. This talk explores practical threat modeling techniques tailored for AI practitioners, focusing on real-world scenarios encountered in daily development. Through relatable examples and demonstrations, attendees will learn to identify and mitigate common vulnerabilities in AI systems. The session covers user-friendly security tools and best practices specifically designed for AI development. By the end, participants will have practical strategies to enhance the security of their AI applications, regardless of their prior security expertise.

PyData: Generative AI
Palladium
11:35
45min
They are not unit tests: a survey of unit-testing anti-patterns
Stanislav Zmiev

The entire industry approves of unit testing but almost no one can fully agree on how to do it correctly, or even on what unit tests are. This results in unit tests often being associated with a slower development cycle and an overall less enjoyable workflow. I'll show you how testing turns into hell in real enterprises with the most common anti-patterns and then I'll show you that most of them are avoidable with modern tooling like mutation testing, snapshot testing, dirty-equals, and many more. We'll discuss how to make tests speed up your development and make refactoring easy.

PyCon: Testing
Zeiss Plenary (Spectrum)
12:20
12:20
60min
Lunch Break
Zeiss Plenary (Spectrum)
12:20
60min
Lunch Break
Titanium3
12:20
60min
Lunch Break
Helium3
12:20
60min
Lunch Break
Platinum3
12:20
60min
Lunch Break
Europium2
12:20
60min
Lunch Break
Hassium
12:20
60min
Lunch Break
Palladium
12:20
60min
Lunch Break
Ferrum
12:20
60min
Lunch Break
Dynamicum
13:20
13:20
5min
Announcements
Zeiss Plenary (Spectrum)
13:25
13:25
45min
Machine Learning Models in a Dynamic Environment
Isabel Drost-Fromm

"We've only tested the happy path - now users are finding all sorts of creative ways to break the app."

What is already a cause for headaches in traditional software engineering turns into a large challenge when the application is based on machine learning models: Data distribution may change from training phase to deployment. Even worse, humans interacting with the model may adjust their behaviour to the model making the gap between original training environment and deployment even larger. When deployed in a public environment the model may be exposed to users trying to game the system. When re-trained it may be exposed to users trying to poison the pool of training data.

We will take a tour of historic cases of models being gamed: What are the lessons we learnt a long time ago building e-mail spam filters? What happened when high search engine rankings started to be linked to monetary income? How can personalization and targeted advertising be exploited to influence public discourse?

“… it should be clear that improvements in communication tend to divide mankind …” by Harold Innis in Changing Concepts of Time

This keynote will turn interactive, engaging the audience in sharing their stories of users playing interesting games with deployed models, including the counter moves that were rolled out.

If we are to learn from IT security experience, one important ingredient to address these issues is a combination of collaboration and transparency - across organisations.

Keynote
Zeiss Plenary (Spectrum)
14:20
14:20
60min
Panel Discussion
Zeiss Plenary (Spectrum)
14:20
30min
Soon revealed!
Platinum3
14:20
90min
BayBE: A Bayesian Back End for Experimental Planning in the Low-To-No-Data Regime
Martin Fitzner, Alexander Hopp, Adrian Šošić

From coffee machine settings to chemical reactions to website AB testing - iterative make-test-learn cycles are ubiquitous. The Bayesian Back End (BayBE) is an open-source experimental planner enabling users to smartly navigate such black-box optimization problems in iterative settings. This tutorial will i) introduce the core concepts enabled by combining Bayesian optimization and machine learning; ii) explain our software design choices, robust tests and open-source libraries this is built on; and iii) provide a short practical hands-on session.

PyData: PyData & Scientific Libraries Stack
Dynamicum
14:20
30min
Duplicate Code Dilemma: Unlocking Automation with Open Source!
Raana Saheb-Nassagh

"Don't Repeat Yourself" – a phrase that we have all heard many times. In this talk, we will get an overview of how to deal with code duplication and how open-source template libraries such as Copier and Cookiecutter can assist us in managing similarly structured repositories. Furthermore, we will explore how code updates can be automated with the help of the open-source library Renovate Bot. By the end of this session, you will gain insights into these solutions while also questioning whether they truly eliminate repetition or merely contribute to another cycle of automation.

PyCon: Programming & Software Engineering
Titanium3
14:20
30min
Machine Reasoning and System 2 Thinking
Andy Kitchen

Raw large language models struggle with complex reasoning. New techniques have
emerged that allow these models to spend more time thinking before giving an answer.
Direct token sampling can be seen as system-1 thinking and explicit step-by-step
reasoning as system-2. How can this reasoning ability be improved and what is the future?

PyData: Generative AI
Palladium
14:20
30min
Oh my license! – Achieving order by automation in the license chaos of your dependencies
Paul Müller

License issues can haunt you at night.
You spend days, weeks, and months developing beautiful software.
But then it happens.
You realize that an essential dependency is GPL-3.0 licensed.

All your code is now infected with this license.
Now you are forced to either:
1. Rewrite all parts relying on the other library
2. Open-source your codebase under the GPL-3.0 license

How could this have been avoided?

Join the talk and find out!
First, we’ll give you a brief introduction to different software licenses and their implications.
Second, we’ll show you how to automate your license checking using open-source software.
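
As a rough illustration of what automated license checking can look like, here is a minimal sketch that uses only the standard library's importlib.metadata to scan installed packages. It is not the tooling presented in the talk; the deny-list and the function names are hypothetical, and package metadata is often incomplete, so treat the result as a first hint rather than a legal assessment.

```python
from importlib import metadata

# Hypothetical deny-list; a real policy is a decision for your organisation
DENIED_MARKERS = ("GPL",)


def license_strings(dist):
    """Collect license hints from a distribution's metadata fields."""
    meta = dist.metadata
    hints = [meta.get("License-Expression"), meta.get("License")]
    hints += [c for c in (meta.get_all("Classifier") or [])
              if c.startswith("License ::")]
    return [h for h in hints if h]


def is_denied(hints, denied=DENIED_MARKERS):
    """Crude substring check; note that "GPL" also matches LGPL and AGPL."""
    return any(marker.lower() in hint.lower()
               for hint in hints for marker in denied)


def find_denied_packages():
    """Map each flagged installed package name to its license hints."""
    flagged = {}
    for dist in metadata.distributions():
        hints = license_strings(dist)
        if is_denied(hints):
            flagged[dist.metadata["Name"]] = hints
    return flagged
```

Running find_denied_packages() in a CI job and failing the build when the result is non-empty is one way to catch a problematic dependency before it ships.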

PyCon: Programming & Software Engineering
Europium2
14:20
30min
Security for Devs
Christian Barra

Two truths and a lie:

You have been hacked and you don’t know yet
You haven’t been hacked because it’s not yet convenient (for the attackers)
You think your application is secure

Security is hard. It is hard to measure and always a catch-up game.

Join this talk if you are interested in understanding a bit more about modern security and how it goes way beyond the code of your app.

PyCon: Security
Helium3
14:20
90min
Unlocking the Predictive Power of Relational Data with Automated Feature Engineering
Alexander Uhlig

Relational data can be a goldmine for classical Machine Learning applications — yet extracting useful features from multiple tables, time windows, and primary-foreign key relationships is notoriously difficult. In this code tutorial, we’ll use the H&M Fashion dataset to demonstrate how getML FastProp automates feature engineering for both classification (churn prediction) and regression (sales prediction) with minimal manual effort, outperforming both Relational Deep Learning and a skilled human data scientist according to the RelBench leaderboard.

This code tutorial is perfect for data scientists looking to leverage their relational and time-series data effectively for any kind of predictive analytics application.

PyData: Machine Learning & Deep Learning & Statistics
Ferrum
14:20
30min
Writing reliable software while depending on hazardous APIs
Romain Dorgueil

As we develop business-critical software, we often need to rely on external APIs to get the job done. And not all services are born equal: although an ideal world would provide well-operated APIs that exceed their service levels, the real world is usually far worse. Timeouts, HTTP errors, cascading failures, unclear or changing contracts, approximate protocol implementations ... and even the oh-so-human bad faith while trying to pinpoint the root cause. Most of us have written hacks to handle commonly seen failures, from quick and dirty implementations to well-thought-out resilience patterns, but this is usually hard to do correctly, and investing the right amount of time and money in it is rarely a business priority. We'll present the options, including both direct dependencies (not framework-dependent, although some families can emerge, such as async/sync) and a service/proxy-based approach.
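
One of the simplest resilience patterns for the failures listed above is retrying with exponential backoff and jitter. The following is a minimal standard-library sketch, not the approach presented in the talk; call_with_retries and FlakyService are hypothetical names used for illustration only.

```python
import random
import time


def call_with_retries(func, attempts=4, base_delay=0.1, max_delay=2.0,
                      retry_on=(TimeoutError, ConnectionError),
                      sleep=time.sleep):
    """Call func(), retrying transient errors with capped exponential backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Double the delay ceiling on each attempt, cap it, then pick a
            # random wait below it so many clients do not retry in lockstep
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))


class FlakyService:
    """Stand-in for an unreliable API: fails a fixed number of times, then succeeds."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def fetch(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise TimeoutError("simulated timeout")
        return {"status": "ok"}
```

For example, call_with_retries(FlakyService(failures=2).fetch) succeeds on the third call. Retries are only safe for idempotent operations, and they complement rather than replace timeouts and circuit breakers.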

PyCon: MLOps & DevOps
Hassium
15:00
15:00
45min
Accuracy Is Not Enough: Building Trustworthy AI with Conformal Prediction
Chris Aivazidis

Building a good scoring model is just the beginning. In the age of critical AI applications, understanding and quantifying uncertainty is as crucial as achieving high accuracy. This talk highlights conformal prediction as the definitive approach to both uncertainty quantification and probability calibration, two extremely important topics in Deep Learning and Machine Learning. We’ll explore its theoretical underpinnings, practical implementations using TorchCP, and transformative impact on safety-critical fields like healthcare, robotics, and NLP. Whether you're building predictive systems or deploying AI in high-stakes environments, this session will provide actionable insights to level up your modelling skills for robust decision-making.
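
To give a flavour of the idea, here is a minimal sketch of split conformal prediction for regression in plain Python. It does not use TorchCP, and the model and calibration data are made up for illustration. Under the usual exchangeability assumption, the resulting interval covers the true value with probability at least 1 - alpha.

```python
import math


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this coverage level
    return sorted(scores)[rank - 1]


def predict_interval(model, x, q):
    """Symmetric prediction interval around the model's point prediction."""
    y_hat = model(x)
    return y_hat - q, y_hat + q


def model(x):
    """Hypothetical pre-trained regression model."""
    return 2.0 * x


# Hypothetical held-out calibration set of (x, y) pairs
calibration = [(1.0, 2.1), (2.0, 3.8), (3.0, 6.3), (4.0, 7.9), (5.0, 10.4),
               (6.0, 11.9), (7.0, 14.2), (8.0, 15.7), (9.0, 18.1)]

# Nonconformity score: absolute residual on the calibration set
scores = [abs(y - model(x)) for x, y in calibration]
q = conformal_quantile(scores, alpha=0.2)
low, high = predict_interval(model, 10.0, q)
```

The width of the interval comes entirely from the calibration residuals, so the method works with any underlying model without retraining it.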

PyData: Machine Learning & Deep Learning & Statistics
Platinum3
15:00
45min
Decoding Topics: A Comparative Analysis of Python’s Leading Topic Modeling Libraries Using Climate C
Dr. Lisa Andreevna Chalaguine

Topic modelling has come a long way, evolving from traditional statistical methods to leveraging advanced embeddings and neural networks. Python’s diverse library ecosystem includes tools like Latent Dirichlet Allocation (LDA) using gensim, Top2Vec, BERTopic, and Contextualized Topic Models (CTM). This talk evaluates these popular approaches using a dataset of UK climate change policies, considering use cases relevant to organisations like DEFRA (Department for Environment, Food & Rural Affairs). The analysis explores real-time integration, dynamic topic modelling over time, adding new documents, and retrieving similar ones. Attendees will learn the strengths, limitations, and practical applications of each library to make informed decisions for their projects.

PyData: Natural Language Processing & Audio (incl. Generative AI NLP)
Hassium
15:00
45min
Distributed file-systems made easy with Python's fsspec
Einat Orr

The cloud native revolution has impacted all aspects of engineering, and data engineering is not exempt. One of the ongoing challenges in the data engineering world remains the local and distributed cloud native storage. In this talk we’ll explore working with distributed file systems in Python, through an intro to fsspec: a popular Python library that is well-positioned to address the growing challenge of interacting with storage systems of different kinds in a consistent way.

In this talk we’ll show hands-on examples of working with fsspec with some of the most popular data tools in the Python community: Pandas, TensorFlow and PyArrow. We’ll demonstrate a real world implementation of fsspec and how it provides easy extensibility through open source tooling.

You’ll come away from this session with a better understanding of how to implement and extend fsspec to work with different cloud native storage systems.

PyData: Data Handling & Engineering
Titanium3
15:00
45min
Quiet on Set: Building an On-Air Sign with Open Source Technologies
Danica Fine

Learn how to build a custom On-Air sign using Apache Kafka®, Apache Flink®, and Apache Iceberg™! See how to capture events like Zoom meetings and camera usage with Python, process data with FlinkSQL, analyze trends using Iceberg, and bring it all together with a practical IoT project that easily scales out.

General: Infrastructure - Hardware & Cloud
Europium2
15:00
45min
Scraping LEGO for Fun: A Hacky Dive into Dynamic Data Extraction
Peter Lodri

Unlock the full potential of modern web scraping by combining Python, Scrapy, and Playwright to extract data from dynamic, JavaScript-heavy sites—exemplified by LEGO product pages. This talk introduces Model Context Protocol (MCP) servers for orchestrating advanced data fetching, refining CSS selectors, and integrating Large Language Models for automated code suggestions. Learn how to scale ethically, handle concurrency, and respect site policies, while maintaining flexible, maintainable pipelines for diverse use cases from research to robotics.

PyData: Data Handling & Engineering
Helium3
15:00
45min
Securing RAG Pipelines with Fine Grained Authorization
Sohan Maheshwar

Using LLMs and AI in your Enterprise? Make sure you build Fine Grained Authorization to ensure your LLMs access only the data they are authorized to.

This talk will show how you can build Relationship Based Access Control (ReBAC) for fine-grained authorization for your RAG pipelines. The talk also includes a demo using Pinecone, LangChain, OpenAI, and SpiceDB.

PyData: Generative AI
Palladium
15:45
15:45
30min
Coffee Break
Zeiss Plenary (Spectrum)
15:45
30min
Coffee Break
Titanium3
15:45
30min
Coffee Break
Helium3
15:45
30min
Coffee Break
Platinum3
15:45
30min
Coffee Break
Europium2
15:45
30min
Coffee Break
Hassium
15:45
30min
Coffee Break
Palladium
15:50
15:50
25min
Coffee Break
Ferrum
15:50
25min
Coffee Break
Dynamicum
16:15
16:15
30min
Soon revealed!
Helium3
16:15
30min
Soon revealed!
Platinum3
16:15
90min
AI Agents of Change: Creating, Reflecting, and Monetizing
Paloma Oliveira

Create, reflect, and earn—with purpose. In this workshop, you’ll not only build your own AI agent but also confront the ethical questions it raises, from its impact on jobs to its potential for social good. Together, we’ll explore how to harness AI for empowerment while uncovering pathways to turn your skills into meaningful value.

This workshop is designed to equip Python enthusiasts with the tools to create their own AI agent while fostering a deeper understanding of the societal implications of this technology. Through hands-on learning, collaborative discussions, and practical monetization strategies, you’ll leave with more than just code—you’ll gain a vision of how AI can be wielded responsibly and profitably.

PyData: Generative AI
Dynamicum
16:15
30min
Analyze data easily with duckdb - and the implications on data architectures
Matthias Niehoff

duckdb is increasingly becoming a universal tool for accessing and analyzing data. In this talk I will show, with slides and a live demo, what duckdb is capable of and will dive deeper into how it will influence modern data architectures.

PyData: Data Handling & Engineering
Zeiss Plenary (Spectrum)
16:15
30min
Building a Self-Hosted MLOps Platform with Kubernetes
Josef Nagelschmidt

Many managed MLOps platforms, while convenient, often fall short in providing flexibility, requiring complex integrations, and causing vendor lock-in. In this talk, we’ll share our experience transitioning from managed MLOps tools to a self-hosted solution built on Kubernetes. We’ll focus on how we leveraged open-source tools like Feast, MLflow, and Ray to build a more flexible, scalable, and customizable platform that is now in use at Rewe Digital. By migrating to this self-hosted architecture, we gained greater control over our ML pipelines, reduced our dependency on third-party services, and created a more adaptable infrastructure for our ML workloads.

PyCon: MLOps & DevOps
Europium2
16:15
30min
Conquering the Queue: Lessons from processing one billion Celery tasks
Daniel Hepper

At Userlike, Celery is the backbone of our application, orchestrating over 100 million tasks per month. In this talk, I’ll share real-world insights into scaling Celery, optimizing performance, avoiding common pitfalls, handling failures, and building a resilient architecture.

PyCon: Django & Web
Hassium
16:15
30min
Learnings from migrating a Flask app to FastAPI
Orell Garten

FastAPI has been growing steadily in popularity in recent years. A lot of this growth is driven by its relative simplicity and ease of use. In this talk, we'll discuss some practical insights into building a FastAPI application, based on my experience of migrating an existing Flask prototype to FastAPI.

We'll explore how FastAPI's core features like Pydantic integration and dependency injection can improve API development, while also talking about the drawbacks of FastAPI.

PyCon: Django & Web
Titanium3
16:15
30min
Streaming at 30,000 Feet: A Real-Time Journey from APIs to Stream Processing
Felix Leon Buck

Traditional API architectures face significant challenges in environments where repetitive and frequent requests are required to retrieve data updates. These request-response mechanisms introduce latency, as clients must continually query the server to check for changes, often receiving redundant or outdated information. This approach leads to increased network overhead, inefficient use of server resources and diminished scalability as the number of clients or requests grows. Additionally, frequent requests expand the attack surface, requiring security measures such as access control, rate limiting and query sanitisation to mitigate the risks. Managing all of these inherent problems results in increasingly complex systems that are hard to maintain and improve, while putting considerable implementation effort onto the customer.
Join to find out how transitioning to a streaming architecture can address these issues by providing proactive, event-based data delivery, reducing latency, minimising redundant processing, enhancing scalability and simplifying security management.

PyCon: Programming & Software Engineering
Palladium
16:15
90min
pytest - simple, rapid and fun testing with Python
Florian Bruhin

The pytest tool offers a rapid and simple way to write tests for your Python code. This training gives an introduction with exercises to some distinguishing features, such as its assertions, marks and fixtures.

Despite its simplicity, pytest is incredibly flexible and configurable. We'll look at various configuration options as well as the plugin ecosystem around pytest.

PyCon: Testing
Ferrum
16:55
16:55
45min
Soon revealed!
Helium3
16:55
45min
A11y Need Is Love (But Accessible Docs Help Too)
Smera Goel

Accessible documentation benefits everyone, from developers to end users. Using the PyData Sphinx Theme as a case study, this talk dives into common accessibility barriers in documentation websites like low contrast colors, missing focus states, etc. and practical ways to address them. Learn about accessibility improvements and take part in a live accessibility audit to see how small changes can make a big difference.

PyData: PyData & Scientific Libraries Stack
Platinum3
16:55
45min
Challenges and Lessons Learned While Building a Real-Time Lakehouse using Apache Iceberg and Kafka
Jonas Böer, Elena Ouro Paz

How do you build a large-scale data lakehouse architecture that makes data available for business analytics in real time, while being more cost-effective, more flexible and faster than the previous proprietary solution? With Python, Kafka and Iceberg, of course!

We built a large-scale data lakehouse based on Apache Iceberg for the Schwarz Group, Europe's largest retailer. The system collects business data from thousands of stores, warehouses and offices across Europe.

In this talk, we will present our architecture, the challenges we faced, and how Apache Iceberg is shaping up to be the data lakehouse format of the future.

PyData: Data Handling & Engineering
Zeiss Plenary (Spectrum)
16:55
45min
From Algorithm to Action: Building a DIY Distributed Trading Platform with Open Source
Eugen Geist

In this talk, we'll explore how you can implement your own distributed system for algorithmic trading leveraging the power of open source without being dependent on trading bot providers.

We will discuss different challenges occurring in HFT, among them processing massive amounts of data with low latency and reliable risk control, and how to solve them. Furthermore, we will touch on the topic of regulatory requirements in trading.

These challenges will be addressed through a distributed system implemented in Python, utilizing Kafka for real-time data streaming, PostgreSQL for persistent storage and DuckDB for high-performance analysis. We will examine approaches to decouple the components to re-use and scale them across different markets.

Cryptocurrency markets are used as a proving ground for the PoC due to easy availability for everyone.

PyCon: Programming & Software Engineering
Europium2
16:55
45min
From LIKE to Love: Adding Proper Search to Your Django Apps
Kacper Łukawski

Is your Django application still relying on SQL LIKE queries for search? In this talk, we'll explore why basic text matching falls short of modern user expectations and how to implement proper search functionality without complexity. We'll introduce django-semantic-search, a practical package that bridges the gap between Django's ORM and powerful semantic search capabilities. Through practical code examples and real-world use cases, you'll learn how to enhance your application's search experience from basic keyword matching to understanding user intent. Whether you're building a content platform, e-commerce site, or internal tool, you'll walk away with concrete steps to implement production-ready search that your users will actually enjoy using.

PyCon: Django & Web
Hassium
16:55
45min
Lessons learned in bringing a RAG chatbot with access to 50k+ diverse documents to production
Bernhard Schäfer

Retrieval-Augmented Generation (RAG) chatbots are a key use case of GenAI in organizations, allowing users to conveniently access and query internal company data. A first RAG prototype can often be created in a matter of days. But why are the majority of prototypes still in the pilot stage? [1]

In this talk we share our insights from developing a production-grade chatbot at Merck. Our RAG chatbot for R&D experts accesses over 50,000 documents across numerous SharePoint sites and other sources. We identified three key success factors:
1. Developing a robust data pipeline that syncs documents from source systems and that handles enterprise features such as replicating user permissions.
2. Establishing a comprehensive evaluation framework with a clear optimization metric.
3. Driving adoption through an onboarding training and ongoing user engagement, such as regular office hours.

We think that many of these lessons are broadly applicable to RAG chatbots, making this talk valuable for practitioners aiming to implement GenAI solutions in business contexts.

PyData: Generative AI
Titanium3
16:55
45min
Transformers for Game Log Data
Fabian Hadiji

The Transformer architecture, originally designed for machine translation, has revolutionized deep learning with applications in natural language processing, computer vision, and time series forecasting. Recently, its capabilities have extended to sequence-to-sequence tasks involving log data, such as telemetric event data from computer games.

This talk demonstrates how to apply a Transformer-based model to game log data, showcasing its potential for sequence prediction and representation learning. Attendees will gain insights into implementing a simple Transformer in Python, optimizing it through hyperparameter tuning, architectural adjustments, and defining an appropriate vocabulary for game logs.

Real-world applications, including clustering and user level predictions, will be explored using a dataset of over 175 million events from an MMORPG. The talk will conclude with a discussion of the model's performance, computational requirements, and future opportunities for this approach.

PyData: Machine Learning & Deep Learning & Statistics
Palladium
17:50
17:50
90min
Lightning Talk
Zeiss Plenary (Spectrum)
09:00
09:00
5min
Announcements
Zeiss Plenary (Spectrum)
09:05
09:05
45min
The Future of AI: Building the Most Impactful Technology Together
Leandro von Werra

In this talk, Leandro will examine the significant benefits of combining open source principles with artificial intelligence. He will walk through the need for openness in language models to build trust, maintain control, mitigate biases, and achieve true alignment, and show how open models are rapidly gaining momentum in the AI landscape, challenging proprietary systems through community-driven innovation. Finally, he will talk about emerging trends and what the community needs to build for the next generation of models.

Keynote
Zeiss Plenary (Spectrum)
09:50
09:50
25min
Coffee Break
Zeiss Plenary (Spectrum)
09:50
25min
Coffee Break
Titanium3
09:50
25min
Coffee Break
Helium3
09:50
25min
Coffee Break
Platinum3
09:50
25min
Coffee Break
Europium2
09:50
25min
Coffee Break
Hassium
09:50
25min
Coffee Break
Palladium
09:50
25min
Coffee Break
Ferrum
09:50
25min
Coffee Break
Dynamicum
10:15
10:15
30min
Soon revealed!
Platinum3
10:15
30min
Soon revealed!
Europium2
10:15
90min
Agentic AI: Build a Multi-Agent Application with CrewAI
Alessandro Romano

This hands-on tutorial will dive into the fundamentals of building multi-agent systems using the CrewAI Python library. Starting from the basics, we’ll cover key concepts, explore advanced features, and guide you step-by-step through building a complete application from scratch. We’ll discuss implementing guardrails, securing interactions, and preventing query injection vulnerabilities along the way.

PyData: Generative AI
Dynamicum
10:15
30min
Beyond DALL-E: Advanced Image Generation Workflows with ComfyUI
René Fa

Image generation using AI has made huge progress over the last years, and many people still think that DALL-E with a text prompt is the best way to generate images. There are well-known models like Stable Diffusion and Flux, which can be used with easy-to-use frontends like A1111 or Invoke AI, but if you want to do more complex or bleeding-edge workflows, you need something else. In this talk, I want to show you ComfyUI, an open-source node-based GUI written in Python where you can build complex pipelines that are otherwise only possible using plain code.

PyData: Computer Vision (incl. Generative AI CV)
Hassium
10:15
30min
Data as (Python) Code
Francesco Calcavecchia

In contemporary data-driven environments, the seamless integration of data into automated workflows is paramount. The reliability of automation, however, is constantly threatened by breaking changes in the source data. The Data-as-Code (DaC) paradigm addresses this challenge by treating data as a first-class citizen within the software development lifecycle.

PyCon: MLOps & DevOps
Zeiss Plenary (Spectrum)
10:15
30min
FastHTML vs. Streamlit - The Dashboarding Face Off
Tilman Krokotsch

In the right corner, we have the go-to dashboarding solution for showcasing ML models or visualizing data, STREAMLIT (*crowd cheers*). Simple yet powerful, it defends the throne of Python dashboarding, but have you ever tried to create complex interactions with it? Things like drill-downs or logins can make your control flow become messy really quickly (*crowd nods knowingly*).

And in the left corner, the new contender in the arena of Python web frameworks which, according to its docs, "excels at building dashboards", FastHTML (*crowd whoops*). We will see if this is true, in the ultimate dashboarding face off (*crowd gasps*). By building the same dashboard, step by step, in both frameworks and investigating their strengths and weaknesses, we will see which framework can claim the crown.

PyCon: Django & Web
Helium3
10:15
30min
From Queries to Confidence: Ensuring SQL Reliability with Python
Anna Varzina

SQL remains a foundational component of data-driven applications, but ensuring the accuracy and reliability of SQL logic is often challenging. SQL testing can be cumbersome, time-consuming, and error-prone. However, these challenges can be addressed by leveraging the simplicity of Python testing frameworks such as pytest, enabling clean, robust, and automated SQL testing.
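
As a small illustration of the idea, the sketch below tests a query against an in-memory SQLite database from the standard library. The table, the query and the helper make_db are hypothetical and not taken from the talk. The test functions are plain functions with assert statements, which is the style pytest discovers and runs.

```python
import sqlite3

# Hypothetical query under test: revenue per customer, highest first
REVENUE_PER_CUSTOMER = """
    SELECT customer, SUM(amount) AS revenue
    FROM orders
    WHERE status = 'paid'
    GROUP BY customer
    ORDER BY revenue DESC
"""


def make_db(rows):
    """Create an in-memory SQLite database seeded with the given order rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (customer TEXT, amount INTEGER, status TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return conn


def test_only_paid_orders_are_counted():
    conn = make_db([
        ("alice", 10, "paid"),
        ("alice", 5, "cancelled"),
        ("bob", 7, "paid"),
    ])
    result = conn.execute(REVENUE_PER_CUSTOMER).fetchall()
    assert result == [("alice", 10), ("bob", 7)]


def test_empty_table_yields_no_rows():
    conn = make_db([])
    assert conn.execute(REVENUE_PER_CUSTOMER).fetchall() == []
```

Each test seeds exactly the rows it needs, so the expected result can be read directly from the test body. SQLite dialect differences mean that queries targeting another database engine may need a test instance of that engine instead.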

PyCon: Testing
Titanium3
10:15
90min
The Mighty Dot - Customize Attribute Access with Descriptors
Mike Müller

Whenever you use a dot in Python you access an attribute.
While this seems a very simple operation,
behind the scenes many things can happen.
This tutorial looks into this mechanism that is regulated by descriptors.
You will learn how a descriptor works and what kind of problems it can help to
solve.
Python properties are based on descriptors and solve one type of problem.
Descriptors are more general, allow more use cases, and are more re-usable.
Descriptors are an advanced topic.
But once mastered, they provide a powerful tool to hide potentially complex
behavior behind a simple dot.
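
A minimal sketch of the mechanism described above: the hypothetical Positive descriptor below validates every assignment made through the dot, and one descriptor class serves several attributes, which a single property cannot do. The class names are illustrative and not taken from the tutorial.

```python
class Positive:
    """Data descriptor that only accepts positive numbers."""

    def __set_name__(self, owner, name):
        # Called once when the owning class is created;
        # remember which instance attribute stores the value
        self.private_name = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class, not on an instance
        return getattr(instance, self.private_name)

    def __set__(self, instance, value):
        if value <= 0:
            name = self.private_name[1:]
            raise ValueError(f"{name} must be positive, got {value!r}")
        setattr(instance, self.private_name, value)


class Rectangle:
    # The same descriptor class is reused for two attributes
    width = Positive()
    height = Positive()

    def __init__(self, width, height):
        self.width = width    # goes through Positive.__set__
        self.height = height

    @property
    def area(self):
        # A property is itself implemented as a descriptor
        return self.width * self.height
```

With this in place, Rectangle(2, 3).area evaluates to 6, while assigning a negative width or height raises ValueError.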

PyCon: Python Language & Ecosystem
Ferrum
10:15
30min
Towards Intelligent Monitoring: Detecting Degraded Flame Torch Nozzles
Dominik Falkner

Flame cutting is a method where metals are efficiently cut using precise control of the oxygen jet and consistent mixing of fuel gas. The condition of the nozzle changes over time: deposits formed during the cutting process can degrade the flame quality, reducing the precision of the cut. Traditionally, nozzles suspected of wear are sent back for manual inspection, where experts evaluate the flame visually and audibly to determine whether repair or replacement is needed. This project leverages machine learning to optimize this process by analyzing acoustic emission data.

PyData: Machine Learning & Deep Learning & Statistics
Palladium
10:55
10:55
30min
Soon revealed!
Platinum3
10:55
30min
Soon revealed!
Europium2
10:55
30min
Death by a Thousand API Versions
Stanislav Zmiev

API versioning is tough, really tough. We tried multiple approaches to versioning in production and eventually ended up with a solution we love. During this talk you will look into the tradeoffs of the most popular ways to do API versioning, and I will recommend which ones are fit for which products and companies. I will also present my framework, Cadwyn, that allows you to support hundreds of API versions with ease -- based on FastAPI and inspired by Stripe's approach to API versioning.

After this session, you will understand which approach to pick for your company to make your versioning cost effective and maintainable without investing too much into it.

PyCon: Django & Web
Helium3
10:55
30min
Filling in the Gaps: When Terraform Falls Short, Python and Typer Step In
Yuliia Barabash

Not all resources in today’s cloud environments have native Terraform providers. That’s where Python’s Typer library can step in, offering a flexible, production-ready command-line interface (CLI) framework to help fill in the gaps. In this session, we’ll explore how to integrate Typer with Terraform to manage resources that fall outside Terraform’s direct purview. We’ll share a real-life example of how Typer was used alongside Terraform to automate and streamline the management of an otherwise unsupported API. You’ll learn how Terraform can invoke Python scripts—passing arguments and parameters to control complex operations—while still benefiting from Terraform’s declarative model and lifecycle management. We’ll also discuss best practices for defining resource lifecycles to ensure easy maintainability and consistency across deployments. By the end, participants will see how combining Terraform’s robust infrastructure-as-code approach with Python’s versatility and Typer’s user-friendly CLI can create a powerful, cohesive strategy for managing even the trickiest resources in production environments.

General: Infrastructure - Hardware & Cloud
Palladium
10:55
30min
High-performance dataframe-agnostic GLMs with glum
Martin Stancsics

Generalized linear models (GLMs) are interpretable, relatively quick to train, and specifying them helps the modeler understand the main effects in the data. This makes them a popular choice today to complement other machine-learning approaches. glum was conceived with the aim of offering the community an efficient, feature-rich, and Python-first GLM library with a scikit-learn-style API. More recently, we have been striving to keep up with the PyData community's ongoing push for dataframe-agnosticism.
While glum was originally heavily based on pandas, with the help of narwhals, we are close to being able to fit models on any dataset that the latter supports. This talk presents our experiences with achieving this goal.

PyData: PyData & Scientific Libraries Stack
Hassium
10:55
30min
How Narwhals is silently bringing pandas, Polars, DuckDB, PyArrow, and more together
Marco Gorelli

If you were writing a data science tool in 2015, you'd have ensured it supported pandas and then called it a day.

But it's not 2015 anymore, we've fast-forwarded to 2025. If you write a tool which only supports pandas, users will demand support for Polars, PyArrow, DuckDB, and so many other libraries that you'll feel like giving up.

Learn about how Narwhals allows you to write dataframe-agnostic tools which can support all of the above, with zero dependencies, low overhead, static typing, and strong backwards-compatibility promises!

PyData: PyData & Scientific Libraries Stack
Zeiss Plenary (Spectrum)
10:55
30min
Using Python to enter the world of Microcontrollers
Jens Nie

So you've happily used the Raspberry Pi for your homelab projects, of course with Python-based solutions as we all do. You've been down the rabbit hole with everything about temperature and humidity measurements, energy and solar tracking, video recording and time-lapse photography, object detection and security surveillance.

You don't just buy these things off the shelf. You want to deeply understand what it takes to create such a thing, and you've been quite happy with your results so far and have learned a lot.

But for many simple applications ... the power draw! Yes, it's just 5 Watts you say for using a Raspberry Pi. Not a big deal in terms of cost. But you'll always need a power adapter and a free socket.

You've heard of these guys using microcontrollers that run on batteries or even solar, for days, weeks, even months.

That's exciting, but there's also a catch. These people write code in C-like languages, they build firmware to make their projects run. And it's all bare metal! That seems very different. That'll be a steep learning curve to take ... Or is it?

Well, there's MicroPython to the rescue. Let me take you with me on a journey to make a simple microcontroller-based application to read a power meter and send the readings over WiFi for more in-depth processing somewhere else.

PyData: Embedded Systems & Robotics
Titanium3
11:35
11:35
45min
Soon revealed!
Platinum3
11:35
30min
Code & Community: The Synergy of Community Building and Task Automation
Cosima Meyer

The Python community is built on a culture of support, inclusion, and collaboration. Sustaining this welcoming environment requires intentional community-building efforts, which often involve repetitive or time-consuming tasks. These tasks, however, can be automated without compromising their value—freeing up time for meaningful human engagement.

This talk showcases my project aimed at supporting underrepresented groups in tech, specifically through building Python communities on Mastodon and Bluesky. A key part of this initiative is the "Awesome PyLadies" repository, a curated collection of PyLadies blogs and YouTube channels that celebrates their work. To enhance visibility, I created a PyLadies bot for social media. This bot automates regular posts and reposts tagged content, significantly extending their reach and fostering an engaged community.

In this session, I’ll cover:
- The role of automation in community building
- The technical architecture behind the bot
- A hands-on demo on integrating Google’s Gemini into community tools
- Upcoming features and opportunities for collaboration

By combining Python, automation, and modern AI capabilities, we can create thriving, inclusive communities that scale impact while staying true to the human-centered ethos of open source.

PyData: Natural Language Processing & Audio (incl. Generative AI NLP)
Palladium
11:35
30min
GitMLOps – How we are managing 100+ ML pipelines in AWS SageMaker
Bogdan Girman

Scaling machine learning pipelines is no small feat - especially when you’re managing over 100 of them on AWS SageMaker. In this talk, I’ll take you behind the scenes of how our team at idealo built a Git-based MLOps framework that powers millions of real-time recommendations every minute.

I’ll share the challenges we faced, the solutions we implemented, and the lessons we learned while streamlining model versioning, deployment, and monitoring. This session is packed with actionable takeaways for ML engineers, data scientists, and DevOps professionals looking to simplify their MLOps workflows and operate efficiently at scale.

Whether you’re running a handful of pipelines or preparing to scale up, this talk will equip you with the tools and strategies to tackle MLOps with confidence.

PyCon: MLOps & DevOps
Europium2
11:35
30min
Guardians of the Code: Safeguarding Machine Learning Models in a Climate Tech World
Doreen Sacker

LLMs, machine learning, and AI are everywhere, yet their security is often overlooked, leaving your systems vulnerable to serious attacks. What happens when someone tampers with your model’s input, poisons your training data, or steals your model?

In this talk, I’ll explore these risks through the lens of the OWASP Machine Learning Security Top 10 using relatable, real-world examples from the climate tech world. I’ll explain how these attacks happen, their impact, and why they matter to you as a Python developer, data scientist, or data engineer.

You’ll learn practical ways to defend your models and pipelines, ensuring they’re robust against adversarial forces. Bridging theory and practice, you'll leave equipped with insights and strategies to secure your machine learning systems, whether you’re training models or deploying them in production. By the end, you’ll have a solid understanding of the risks, a toolkit of best practices, and maybe even a new perspective on how important security is everywhere.

PyCon: MLOps & DevOps
Hassium
11:35
45min
Hands-On LLM Security: Attacks and Countermeasures You Need to Know!
Clemens Hübner, Florian Teutsch

Dive into the vulnerabilities of LLMs and learn how to prevent them.
From prompt injection to data poisoning, we’ll demonstrate real-world attack scenarios and reveal essential countermeasures to safeguard your applications.

PyCon: Security
Helium3
11:35
45min
Rustifying Python: A Practical Guide to Achieving High Performance While Maintaining Observability
Max Höhl

In this session, I’ll share our journey of migrating key parts of a Python application to Rust, resulting in a performance improvement of over 200%.
Rather than focusing on quick Rust-to-Python integration with PyO3, this talk dives into the complexities of implementing such a migration in an enterprise environment, where reliability, scalability, and observability are crucial.
You’ll learn from our mistakes, how we identified suitable areas for Rust integration, and how we extended our observability tools to cover Rust components.
This session offers practical insights for improving performance and reliability in Python applications using Rust.

PyCon: Programming & Software Engineering
Titanium3
11:35
45min
Topological data analysis: How to quantify "holes" in your data and why?
Ondrej Draganov

Do you need to compare sets of points in a plane? Identify a potential cyclic event in high-dimensional time series data? Find the second or the third highest peak of a noisily sampled function? Topological data analysis (TDA) is not a universal hammer, but it might just be the 16 mm wrench for your 16 mm hex head bolt. There is no shortage of Python libraries implementing TDA methods for various settings, but navigating the options can be challenging without prior familiarity with the topic. In my talk I will demonstrate the utility of the tool with several simple examples, list various libraries used by the TDA community, and dive a bit deeper into the methods to explain what the libraries implement and how to interpret and work with the outputs.

PyData: PyData & Scientific Libraries Stack
Zeiss Plenary (Spectrum)
12:20
12:20
60min
Lunch Break
Zeiss Plenary (Spectrum)
12:20
60min
Lunch Break
Titanium3
12:20
60min
Lunch Break
Helium3
12:20
60min
Lunch Break
Platinum3
12:20
60min
Lunch Break
Europium2
12:20
60min
Lunch Break
Hassium
12:20
60min
Lunch Break
Palladium
12:20
50min
Lunch Break
Ferrum
12:20
50min
Lunch Break
Dynamicum
13:10
13:10
90min
Reinforcement Learning for Finance
Dr. Yves J. Hilpisch

Reinforcement Learning and related algorithms, such as Deep Q-Learning (DQL), have led to major breakthroughs in different fields. DQL, for example, is at the core of the AIs developed by DeepMind that achieved superhuman levels in such complex games as Chess, Shogi, and Go ("AlphaGo", "AlphaZero"). Reinforcement Learning can also be beneficially applied to typical problems in finance, such as algorithmic trading, dynamic hedging of options, or dynamic asset allocation. The workshop addresses the problem of limited data availability in finance and solutions to it, such as synthetic data generation through GANs. It also shows how to apply the DQL algorithm to typical financial problems. The workshop is based on my new O'Reilly book "Reinforcement Learning for Finance -- A Python-based Introduction".

PyData: Machine Learning & Deep Learning & Statistics
Dynamicum
13:10
90min
What's inside the box? Building a deep learning framework from scratch.
Oleh Kostromin

Explore the inner workings of deep learning frameworks like TensorFlow and PyTorch by building your own in this workshop. We will start with the fundamental automatic differentiation mechanics and proceed to implementing more complex components like layers, modules and optimizers. This workshop is mainly designed for experienced data scientists who want to expand their intuition about lower-level framework internals.
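
As a rough illustration of what "automatic differentiation mechanics" can mean, here is a minimal reverse-mode sketch in plain Python. It is not taken from the workshop material: the `Value` class and its attribute names are hypothetical, and only scalar addition, multiplication and tanh are covered.

```python
import math


class Value:
    """A scalar that records how it was computed so gradients can flow back."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # inputs of the operation that produced this value
        self._local_grads = local_grads  # d(self)/d(parent), one entry per parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def tanh(self):
        t = math.tanh(self.data)
        return Value(t, (self,), (1.0 - t * t,))

    def backward(self):
        # Order the graph topologically, then apply the chain rule in reverse.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad


# A single "neuron": y = tanh(x * w + b)
x, w, b = Value(2.0), Value(-3.0), Value(1.0)
y = (x * w + b).tanh()
y.backward()
print(x.grad, w.grad, b.grad)
```

Layers, modules and optimizers can be thought of as bookkeeping on top of this idea: they group many such values and update them using the accumulated `grad` fields.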

PyData: Machine Learning & Deep Learning & Statistics
Ferrum
13:20
13:20
30min
Electify - Retrieval-Augmented Generation for Voter Information in the 2024 European Election
Christian Liedl

In general elections, voters often face the challenge of navigating complex political landscapes and extensive party manifestos. To address this, we developed Electify, an interactive application that utilizes Retrieval-Augmented Generation (RAG) to provide concise summaries of political party positions based on individual user queries. During its first roll-out for the European Election 2024, Electify attracted more than 6,000 active users. This talk will explore its development and deployment. It will focus on its technical architecture, the integration of data from party manifestos and parliamentary speeches, and the challenges of ensuring political neutrality and providing accurate replies. Additionally, we will discuss user feedback and ethical considerations, focusing on how generative AI can enhance voter information systems.

PyData: Natural Language Processing & Audio (incl. Generative AI NLP)
Helium3
13:20
30min
Extending Python with Rust, Mojo, CUDA and C and building packages
Wolf Vollprecht, Ruben Arts

We all love Python - but we especially love it for its unique ability as a glue language.

In this talk we will show a number of ways of extending Python: using Rust, C and Cython, C++, CUDA and Mojo! We will use the pixi package manager and the open source conda-forge distribution to demonstrate how to easily build custom Python extensions with these languages.

The main challenge with custom extensions is distributing them. The new pixi build feature makes it easy to build a Python extension into a conda package as well as a wheel file for PyPI.

Pixi will manage not only Python, but also the compilers and other system-level dependencies.

PyData: PyData & Scientific Libraries Stack
Titanium3
13:20
30min
From stockouts to happy customers: Proven solutions for time series forecasting in retail
Robert Haase

Time series forecasting in the retail industry is uniquely challenging: Datasets often include stockouts that censor actual demand, promotional events cause irregular demand spikes, new product launches face cold-start issues, and diverse demand patterns within an imbalanced product portfolio create modeling challenges.
In this talk, we’ll explore proven, real-world strategies and examples to address these problems. Learn how to successfully handle censored demand caused by stockouts, effectively incorporate promotional effects, and tackle the variability of diverse products using clustering and ensembling strategies. Whether you’re a seasoned data scientist or a Python developer exploring forecasting, the goal of this session is to introduce you to the key challenges in retail forecasting and equip you with actionable insights to successfully overcome them in real-life scenarios.

PyData: Machine Learning & Deep Learning & Statistics
Zeiss Plenary (Spectrum)
13:20
30min
Is your LLM any good at writing? Benchmarking on creative writing and editing tasks
Azamat Omuraliev

Many LLM benchmarks focus on reasoning and coding tasks. These are exciting tasks! But the majority of LLM usage is still in writing and editing related tasks, and there's a surprising lack of benchmarks on these.

In this talk you'll learn what it took to create a writing benchmark, and which model performs best!

PyData: Natural Language Processing & Audio (incl. Generative AI NLP)
Platinum3
13:20
30min
Responsible AI with fmeval - an open source library to evaluate LLMs
Mia Chang

The term "Responsible AI" has seen a threefold increase in search interest compared to 2020 across the globe. As developers, we encounter questions like "How can we build large language model-enabled applications that are responsible and accountable to their users?" more often than before. The discussion is further compounded by concerns surrounding uncertainty, bias, explainability, and other ethical considerations.

In this session, the speaker will guide you through fmeval, an open-source library designed to evaluate Large Language Models (LLMs) across a range of tasks. The library provides notebooks that you can integrate into your daily development process, enabling you to identify, measure, and mitigate potential responsible AI issues throughout your system development lifecycle.

PyData: PyData & Scientific Libraries Stack
Europium2
13:20
30min
Vector Streaming: The Memory Efficient Indexing for Vector Databases
Sonam Pankaj, Akshay Ballal

Vector databases are everywhere, powering LLMs. But indexing embeddings in bulk, especially multivector embeddings like ColPali and ColBERT, is memory intensive. Vector streaming solves this problem by parallelizing the tasks of parsing, chunking, and embedding generation, and by indexing the results continuously, chunk by chunk, instead of in bulk. This not only increases the speed but also makes the whole task more optimized and memory efficient.

The library supports many vector databases, such as Pinecone, Weaviate, and Elastic.

General: Rust
Hassium
13:20
30min
What we talk about when we talk about AI skills.
Paula Gonzalez Avalos

Defining what constitutes AI skills has always been ambiguous. As AI adoption accelerates across industries and the European AI Act requires companies to ensure AI literacy among their staff, organizations face even more challenges in defining and developing AI competencies. In this talk, we'll present a comprehensive framework developed by the appliedAI Institute's experts that categorizes AI skills across technical, regulatory, strategic, and innovation domains. We'll also share initial data on current AI skills levels and upskilling needs and provide practical strategies for organizations to assess, develop, and acquire the AI capabilities required for their specific needs.

General: Education, Career & Life
Palladium
14:00
14:00
30min
Forecast of Hourly Train Counts on Rail Routes Affected by Construction Work
Sebastian Folz, Dr Maren Westermann

Construction work in national railroad networks often disrupts train traffic, making it vital to estimate hourly train numbers for effective re-routing. Traditionally managed by humans, this process has been automated due to staff shortages and demographic changes. DB Systel GmbH, Deutsche Bahn's IT provider, leveraged machine learning and artificial intelligence to estimate train traffic during construction. Using Python and frameworks like Pandas, scikit-learn, NumPy, PyTorch and Polars, their solution demonstrated significant benefits in performance and efficiency.

PyData: Machine Learning & Deep Learning & Statistics
Zeiss Plenary (Spectrum)
14:00
30min
Offline Disaster Relief Coordination with OpenStreetMap and FastAPI
Jannis Lübbe

In natural disaster scenarios, reliable communication is crucial. This talk presents a solution for disaster relief coordination using OpenStreetMap vector maps hosted on a local device in the emergency vehicle with FastAPI, ensuring functionality without an internet connection. By integrating a database of post codes and street names, and leveraging a LoRaWAN gateway to receive positional data and water levels, this system ensures access to critical information even in blackout situations.

General: Infrastructure - Hardware & Cloud
Titanium3
14:00
30min
Optimizing Energy Tariffing System with Formal Concept Analysis and Dash
Dr. Irina Smirnova-Pinchukova

As a data scientist, I value the power of insightful visualizations to unlock unique interpretations of complex data. In my talk, I will introduce an elegant mathematical framework called Formal Concept Analysis (FCA), developed in the 1980s in Darmstadt.

FCA transforms binary data into concepts that can be visualized as a hierarchical graph, offering a fresh perspective on multidimensional data analysis. Leveraging this theory and its open-source Python libraries, I am developing a Dash-based tool featuring interactive tables and graphs to explore data insights.

To illustrate its potential, I will showcase an optimization of the entire tariffing system of an energy provider company, highlighting how FCA can bring structure and clarity to even such tangled datasets.

PyData: Visualisation & Jupyter
Palladium
14:00
30min
Pipeline-level differentiable programming for the real world
Alessandro Angioi

Automatic Differentiation (AD) is not only the backbone of modern deep learning but also a transformative tool across various domains such as control systems, materials science, weather prediction, 3D rendering, and data-driven scientific discovery. Thanks to a mature ML framework ecosystem, powered by libraries like PyTorch and JAX, AD performs remarkably well at a component level; however, integrating these components into differentiable pipelines remains a significant challenge. In this talk, we will provide an accessible introduction to (pipeline-level) AD, demonstrate some cool applications you can build with it, and see how to build differentiable pipelines that hold up in the real world.

PyData: Research Software Engineering
Hassium
14:00
30min
Practical Python/Rust: Building and Maintaining Dual-Language Libraries
Ben Brandt

Building performant Python often means reaching for C extensions. This talk explores an alternative: leveraging Rust to create blazing-fast Python modules that also benefit the Rust ecosystem. I will share practical strategies from building semantic-text-splitter, a library for fast and accurate text segmentation used in both Python and Rust, demonstrating how to bridge the gap between these two languages and unlock new possibilities for performance and cross-language collaboration.

General: Rust
Helium3
14:00
30min
Using Causal thinking to make Media Mix Modeling
Carlos Trujillo

In today's data-driven landscape, understanding causal relationships is essential for effective marketing strategies. This talk will explore the link between Bayesian causal thinking and media mix modeling, utilizing Directed Acyclic Graphs (DAGs), Structural Causal Models (SCMs), and the Data Generation Process (DGP).

We will examine how DAGs represent causal assumptions, how SCMs define relationships in media mix models, and how to implement these models within a Bayesian framework. By using media mix models as causal inference tools, we can estimate counterfactuals and causal effects, offering insights into the effectiveness of media investments.

PyData: PyData & Scientific Libraries Stack
Platinum3
14:00
30min
You don’t think about your Streamlit app optimization until you try to deploy it to the cloud
Darya Petrashka

Building Streamlit apps is easy for Data Scientists - but when it’s time to deploy them to the cloud, challenges like slow model loading, scalability, and security can become major hurdles. This talk bridges two perspectives: the Data Scientist who builds the app and the MLOps engineer who deploys it. We'll dive into optimizing model loading from Hugging Face Hub, implementing features like autoscaling and authentication, and securing your app against potential threats. By the end of this talk, you’ll be ready to design Streamlit apps that are functional and deployment-ready for the cloud.

PyCon: MLOps & DevOps
Europium2
14:30
14:30
25min
Coffee Break
Zeiss Plenary (Spectrum)
14:30
25min
Coffee Break
Titanium3
14:30
25min
Coffee Break
Helium3
14:30
25min
Coffee Break
Platinum3
14:30
25min
Coffee Break
Europium2
14:30
25min
Coffee Break
Hassium
14:30
25min
Coffee Break
Palladium
14:40
14:40
15min
Coffee Break
Ferrum
14:40
15min
Coffee Break
Dynamicum
14:55
14:55
30min
3 Ways to Speed up Your Regression Modeling in Python
Alexander Fischer

Linear Regression is the workhorse of statistics and data science. Some data scientists even go as far as to argue that "linear regression is all you need".

In this talk, we will introduce three ways to run regression models faster by using smarter algorithms, implemented in the scikit-learn & fastreg (sparse solvers), pyfixest (Frisch-Waugh-Lovell), and duckreg (regression compression via duckdb) libraries.

PyData: Machine Learning & Deep Learning & Statistics
Titanium3
14:55
30min
Building a HybridRAG Document Question-Answering System
Darya Petrashka

Retrieval Augmented Generation (RAG) is a powerful technique for searching across unstructured documents, but it often falls short when the task demands an understanding of intricate relationships between entities. GraphRAG addresses this by leveraging knowledge graphs to capture these relationships, but it struggles with scalability and handling diverse unstructured formats. In this talk, we’ll explore how HybridRAG combines the strengths of both approaches - RAG for scalable unstructured data retrieval and GraphRAG for semantic richness - to deliver accurate and contextually relevant answers. We’ll dive into its application, challenges, and the significant improvements it offers for question-answering systems across various domains.

PyData: Natural Language Processing & Audio (incl. Generative AI NLP)
Platinum3
14:55
30min
Demystifying Design Patterns: A Practical Guide for Developers
Tanu

Do you ever worry about your code becoming spaghetti-like and difficult to maintain?
Master the art of crafting clean, maintainable, and adaptable software by harnessing the power of design patterns. This presentation will empower you with a clear, structured understanding of these reusable solutions to address common programming challenges.

We'll delve into design patterns’ key categories: Behavioral, Structural, and Creational, as well as explore their functionality and how they can be applied in your daily development workflow. For each category, we'll also explore a practical design pattern in detail and showcase real-world applications of these patterns, along with small-scale code examples that illustrate their practical implementation.

You'll gain valuable insight into how these patterns can translate into real-world development scenarios, such as facilitating communication between objects (Behavioral), separating interfaces from implementation for flexibility (Structural), and enabling dynamic algorithm selection at runtime (Creational).

PyCon: Programming & Software Engineering
Zeiss Plenary (Spectrum)
14:55
30min
From Rules to Reality: Python's Role in Shaping Roundnet
Larissa Haas

Roundnet is a dynamic and fast-growing sport that combines quick reactions, athleticism, and a strong community. However, like many emerging sports, it faces challenges in balancing competition, optimizing rules, and increasing accessibility for both players and spectators. This is where Python and data analysis come into play.

In this talk, I'll share insights from my role as Data Lead on the International Roundnet rule committee, where we use Python-powered data analysis to make informed decisions about the future of the sport. We'll explore how analyzing gameplay patterns and testing rule changes with simulation can lead to fairer, more exciting games and attract a broader audience.

PyData: Data Handling & Engineering
Hassium
14:55
30min
Graph Neural Networks for Collusion Detection using PyTorch and Deep Graph Library
Mara Mattes

Collusion is a complex phenomenon in which companies secretly collaborate to engage in fraudulent practices. This talk presents an innovative methodology for detecting and predicting collusion patterns in different national markets using neural networks (NNs) and graph neural networks (GNNs). GNNs are particularly well suited to this task because they can exploit the inherent network structures present in collusion and many other economic problems. In Python, we use PyTorch and the Deep Graph Library (DGL) to develop and train models on individual market datasets from Japan, the United States, two regions in Switzerland, Italy, and Brazil, focusing on predicting collusion in single markets. In our empirical study, we show that GNNs outperform NNs in detecting complex collusive patterns. This research contributes to the ongoing discourse on preventing collusion and optimizing detection methodologies, providing valuable guidance on the use of NNs and GNNs in economic applications to enhance market fairness and economic welfare.

PyData: Machine Learning & Deep Learning & Statistics
Helium3
14:55
30min
Intuitive A/B Test Evaluations for Coders
Thomas Mayer

A/B testing is a critical tool for making data-driven decisions, yet its statistical underpinnings—p-values, confidence intervals, and hypothesis testing—are often challenging for those without a background in statistics. Coders frequently encounter these concepts but lack a straightforward way to compute and interpret them using their existing skill set.
This talk presents a practical approach to A/B test evaluations tailored for coders. By utilizing Python’s random number generator and basic loops, it introduces bootstrapping as an accessible method for calculating p-values and confidence intervals directly from data. The goal is to simplify statistical concepts and provide coders with an intuitive understanding of how to evaluate test results without relying on complex formulas or statistical jargon.
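
To give a flavour of the approach, here is a minimal sketch that uses only Python's `random` module and plain loops. It is an illustration under stated assumptions, not the speaker's code: the function name `bootstrap_ab` is hypothetical, the conversion data is made up, and the percentile interval and pooled resampling are just one common way to set this up.

```python
import random


def bootstrap_ab(control, variant, n_resamples=2000, seed=0):
    """Bootstrap a 95% confidence interval and a p-value for the difference in means."""
    rng = random.Random(seed)
    observed = sum(variant) / len(variant) - sum(control) / len(control)

    # Confidence interval: resample each group with replacement and
    # look at the spread of the resulting differences.
    diffs = []
    for _ in range(n_resamples):
        c = rng.choices(control, k=len(control))
        v = rng.choices(variant, k=len(variant))
        diffs.append(sum(v) / len(v) - sum(c) / len(c))
    diffs.sort()
    ci = (diffs[int(0.025 * n_resamples)], diffs[int(0.975 * n_resamples) - 1])

    # p-value: simulate "no real difference" by drawing both groups from the
    # pooled data, then count how often the gap is at least as large as observed.
    pooled = control + variant
    extreme = 0
    for _ in range(n_resamples):
        c = rng.choices(pooled, k=len(control))
        v = rng.choices(pooled, k=len(variant))
        if abs(sum(v) / len(v) - sum(c) / len(c)) >= abs(observed):
            extreme += 1
    return observed, ci, extreme / n_resamples


# Made-up data: 1 = converted, 0 = did not convert.
control = [1] * 100 + [0] * 900  # 10% conversion
variant = [1] * 130 + [0] * 870  # 13% conversion
observed, ci, p_value = bootstrap_ab(control, variant)
print(f"difference: {observed:.3f}, 95% CI: ({ci[0]:.3f}, {ci[1]:.3f}), p = {p_value:.3f}")
```

The same loop works for any metric that can be computed from a list, such as a median or revenue per user, which is a large part of the method's appeal for coders.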

PyData: Machine Learning & Deep Learning & Statistics
Ferrum
14:55
30min
Langfuse, OpenLIT, and Phoenix: Observability for the GenAI Era
Emanuele Fabbiani

Large Language Models (LLMs) are transforming digital products, but their non-deterministic behaviour challenges predictability and testing, making observability essential for quality and scalability.

This talk presents observability for LLM-based applications, spotlighting three tools: Langfuse, OpenLIT, and Phoenix. We'll share best practices on what to monitor in LLM features and how to monitor it, and explore each tool's strengths and limitations.

Langfuse excels in tracing and quality monitoring but lacks OpenTelemetry support and customization. OpenLIT, while less mature, integrates well with existing observability stacks using OpenTelemetry. Phoenix stands out in debugging and experimentation but struggles with real-time tracing.

The comparison will be enhanced by live coding examples.

Attendees will walk away with an improved understanding of observability for GenAI applications and will understand which tool to use for their use case.

PyCon: Python Language & Ecosystem
Palladium
14:55
30min
The Forecast Whisperer: Secrets of Model Tuning Revealed
Illia Babounikau

Forecasting can often feel like interpreting vague signals—unclear yet full of potential. In this talk, we’ll cover advanced techniques for tuning forecasting models in professional settings, moving beyond the basics to explore methods that enhance both accuracy and interpretability.

You’ll learn:

- How to set clear business goals for ML model tuning and align technical work with business needs, including balancing forecast granularity and accuracy and selecting a statistically correct metric.
- Practical data preparation methods, including business-driven data cleaning and detecting data problems with statistical and business-driven approaches.
- Advanced feature selection techniques such as recursive feature elimination and SHAP values, alongside hyperparameter tuning strategies including Bayesian optimization and ensemble methods.
- How generative AI can support model tuning by automating feature generation, hyperparameter search, and enhancing model explainability through SHAP and LIME techniques.
- Real-world case studies, including how Blue Yonder’s data science team optimized demand forecasting models for retail and supply chain applications.

We'll also discuss common mistakes like overfitting and data leakage, best practices for reliable validation, and the importance of domain knowledge in successful forecasting. Whether you're a seasoned data scientist or exploring time series forecasting, you'll gain advanced insights and techniques you can apply immediately.

PyData: Machine Learning & Deep Learning & Statistics
Dynamicum
14:55
30min
What do a tree and the human brain have in common - a not so serious introduction to digital pathology
Daniel Hieber

While trees and human brains don't have much in common as subjects, analyzing the height of a tree and analyzing cancer in a human brain do.
This talk provides a not-so-serious introduction to the domain of computer vision for pathological use cases.
Besides a general introduction to (digital) pathology and the technical similarities between satellite images (GeoTIFFs) and pathological images (Whole-Slide Images), we will take a look at computer vision (both ML-based and conventional) in Python.
Whether you have never done image processing in Python, are an expert (ready to share some tricks with me), or are just curious to see pictures of a human brain, this talk is for you.
Warning: this talk contains quite abstract pink-ish pictures of human tissue (and trees^^). If you are unsure whether this is something you are comfortable with, do a quick search (or have a friend do one) for "HE-stained whole-slide image".

PyData: Computer Vision (incl. Generative AI CV)
Europium2
15:35
15:35
30min
Building Bare-Bones Game Physics in Rust with Python Integration
Sam Kaveh

Learn how to build a minimalist game physics engine in Rust and make it accessible to Python developers using PyO3. This talk explores fundamental concepts like collision detection and motion dynamics while focusing on Python integration for scripting and testing. Ideal for developers interested in combining Rust’s performance with Python’s ease of use to create lightweight and efficient tools for games or simulations.

General: Rust
Palladium
15:35
30min
Building an Open Source RAG System for the United Nations Negotiations on Global Plastic Pollution
Rahkakavee Baskaran, Teresa Kroesen

Plastic pollution is a significant global challenge. Every year, millions of tons of plastic enter the oceans, impacting marine ecosystems and human health. To address this issue, the United Nations is negotiating a legally binding treaty with representatives from 180 countries, aiming to reduce plastic pollution and promote sustainable practices.

We have developed NegotiateAI, an open-source chat application that supports delegations during the UN negotiations on a legally binding agreement to combat plastic pollution. The tool demonstrates how generative AI and Retrieval-Augmented Generation (RAG) can address complex global challenges. Built with Haystack 2.0, Qdrant, HuggingFace Spaces, and Streamlit, it showcases the potential of open-source technologies in tackling issues of global relevance.

Whether you are a beginner or an advanced developer, this talk will give you valuable insights into developing impactful AI applications with open source tools in the public sector.

PyData: Natural Language Processing & Audio (incl. Generative AI NLP)
Platinum3
15:35
30min
Citation is Collaboration: Software Recognition in Research and Industry
Ivelina Momcheva

The development of open source software is increasingly recognized as a critical contribution across many disciplines, yet the mechanisms for credit and citation vary significantly. This talk uses astronomy as a case study to explore shared challenges in attributing software contributions across research and industry. It will review the evolution of journal recommendations and policies over the past decade, alongside emerging publishing practices, offering insights into their impact on the recognition of software contributions. An analysis of citation patterns for widely used libraries (numpy, scipy, astropy) highlights trends over time and their dependence on publication venues and policies. The talk will conclude with strategies that both developers and users can apply to improve the recognition of software, fostering collaboration and sustainability in software ecosystems. All data and analysis code will be made available in a public repository, supporting transparency and further study.

PyData: Research Software Engineering
Dynamicum
15:35
30min
PosePIE: Replace Your Keyboard and Mouse With AI-Driven Gesture Control
Daniel Stolpmann

In this talk, we show how to leverage publicly available tools to control any game or program using hand or body movements. To achieve this, we introduce PosePIE, an open-source programmable input emulator that generates input events on virtual gamepads, keyboards and mice based on gestures recognized through AI-driven pose estimation. PosePIE is fully configurable by the user through Python scripts, making it easily adaptable to new applications.

PyData: Computer Vision (incl. Generative AI CV)
Hassium
15:35
30min
Reinforcement Learning Without a PhD: A Python Developer’s Journey
Jochen Luithardt

From watching AI conquer Super Mario to building production-ready Reinforcement Learning (RL) systems, this talk explores how Python developers can dive into RL without requiring advanced degrees or big tech resources. Drawing on our three-year journey of building a production RL system, I’ll show how developers can leverage RL through practical strategies and accessible tools. Using pi_optimal, an open-source RL toolkit, we’ll bridge the gap between cutting-edge RL research and real-world applications. Attendees will gain actionable insights, implementation techniques, and hands-on experience to confidently start their own RL projects.

PyData: Machine Learning & Deep Learning & Statistics
Helium3
15:35
30min
Taking Control of LLM Outputs: An Introductory Journey into Logits
Emek Gözlüklü

This talk explores the use of logits - the raw confidence scores that language models generate before selecting each token. Working directly with logits enables finer control over model behavior.

The session covers practical techniques for accessing and utilizing these scores through local models. Topics include detecting model uncertainty, implementing custom stopping conditions, and steering generation without prompt modifications.

You will learn how to analyze model confidence patterns and apply this knowledge to real-life use cases.
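
As a rough illustration of the idea described in this abstract (not taken from the talk itself), the sketch below turns raw logits into probabilities with a softmax and uses entropy as a simple uncertainty signal. The toy four-token vocabulary and the logit values are hypothetical, chosen only for demonstration; a real local model would supply one logit per vocabulary entry.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; a higher value means a flatter distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical logits over a toy four-token vocabulary (assumed values).
confident = [8.0, 1.0, 0.5, 0.2]   # one token clearly dominates
uncertain = [2.0, 1.9, 1.8, 1.7]   # all tokens are nearly tied

for name, logits in [("confident", confident), ("uncertain", uncertain)]:
    probs = softmax(logits)
    print(name, "top probability:", round(max(probs), 3),
          "entropy:", round(entropy(probs), 3))
```

A high entropy (or a low top probability) at a given step is one possible trigger for a custom stopping condition or a fallback; the threshold to use is an application-specific choice.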

PyData: Natural Language Processing & Audio (incl. Generative AI NLP)
Europium2
15:35
30min
The Foundation Model Revolution for Tabular Data
Noah Hollmann, Frank Hutter

What if we could make the same revolutionary leap for tables that ChatGPT made for text? While foundation models have transformed how we work with text and images, tabular / structured data (spreadsheets and databases) - the backbone of economic and scientific analysis - has been left behind. TabPFN changes this. It's a foundation model that achieves in 2.8 seconds what traditional methods need 4 hours of hyperparameter tuning for - while delivering better results. On datasets up to 10,000 samples, it outperforms every existing Python library, from XGBoost to CatBoost to AutoGluon.

Beyond raw performance, TabPFN brings foundation model capabilities to tables: native handling of messy data without preprocessing, built-in uncertainty estimation, synthetic data generation, and transfer learning - all in a few lines of Python code. Whether you're building risk models, accelerating scientific research, or optimizing business decisions, TabPFN represents the next major transformation in how we analyze data. Join us to explore and learn how to leverage these new capabilities in your work.

PyData: Machine Learning & Deep Learning & Statistics
Titanium3
15:35
30min
Where have all the post offices gone? Discovering neighborhood facilities with Python and OSM
Katie Richardson

When it comes to open geographic data, OpenStreetMap is an awesome resource. Getting started and figuring out how to make the most out of the data available can be challenging.

Using a personal example, my frustration at the apparent lack of post offices in my neighborhood, we'll walk through examples of how to parse, filter, process, and visualize geospatial data with Python.

At the end of this talk, you will know how to process geographic data from OpenStreetMap using Python and find out some surprising info that I learned while answering the question: Where have all the post offices gone?

PyData: Data Handling & Engineering
Zeiss Plenary (Spectrum)
15:35
30min
Zero Code Change Acceleration: familiar interfaces and high performance
Tim Head

The PyData ecosystem is home to some of the best and most popular tools for doing data science. Every data scientist alive today has used pandas and scikit-learn, and even Large Language Models know how to use them! For many years there have also been alternative implementations with similar interfaces, as well as libraries with completely new approaches, that focus on achieving the ultimate in performance and hardware acceleration. This talk will look at the recent efforts to give users the best of both worlds: a familiar and widely used interface as well as high performance.

PyData: PyData & Scientific Libraries Stack
Ferrum
16:15
16:15
30min
Enhancing RAG with Fast GraphRAG and InstructLab: A Scalable, Interpretable, and Efficient Framework
Tuhin Sharma

Retrieval Augmented Generation (RAG) has become a cornerstone in enriching GenAI outputs with external data, yet traditional frameworks struggle with challenges like data noise, domain specialization, and scalability. In this talk, Tuhin will dive into the open-source frameworks Fast GraphRAG and InstructLab, which address these limitations by combining knowledge graphs with the classical PageRank algorithm and fine-tuning, delivering a precision-focused, scalable, and interpretable solution. By leveraging the structured context of knowledge graphs, Fast GraphRAG enhances data adaptability, handles dynamic datasets efficiently, and provides traceable, explainable outputs, while InstructLab adds domain depth to the LLM through fine-tuning. Designed for real-world applications, the combination bridges the gap between raw data and actionable insights, redefining intelligent retrieval for developers, researchers, and enterprises. This talk will showcase Fast GraphRAG’s transformative features coupled with domain-specific fine-tuning using InstructLab and demonstrate its potential to elevate RAG’s capabilities in handling the evolving demands of large language models (LLMs).

PyData: Generative AI
Helium3
16:15
30min
Unforgettable, that's what you are: Evaluating Machine Unlearning and Forgetting
Katharine Jarmul

Can deep learning/AI models forget? In this talk, you'll explore the realm of machine unlearning, where researchers and practitioners aim to remove memorized examples from machine learning models. This is relevant for training increasingly overparameterized models and for the growing GDPR/privacy concerns around large-scale model development and use.

PyData: Machine Learning & Deep Learning & Statistics
Titanium3
16:45
16:45
15min
Closing Session
Zeiss Plenary (Spectrum)