PyCon DE & PyData 2025 :: pretalx

Wednesday, April 23, 2025

Thursday, April 24, 2025

Friday, April 25, 2025

08:00

08:00

120min

Registration & Welcome Coffee

Zeiss Plenary (Spectrum)

10:00

10:00

30min

Opening Session

Zeiss Plenary (Spectrum)

10:30

Kristian Kersting

The relationship between humans and machines, especially in the context of Artificial Intelligence (AI), is shaped by hopes, concerns, and moral questions. On the one hand, advances in AI offer great promise: it can help us solve complex problems, improve healthcare, streamline workflows, and much more. Yet, at the same time, there are legitimate concerns about the control over this technology, its potential impact on jobs and society, and ethical issues related to discrimination and the loss of human autonomy. In the talk I will explore and illustrate the complex tension between innovation and moral responsibility in AI research.

Zeiss Plenary (Spectrum)

11:15

11:15

30min

Coffee Break

Zeiss Plenary (Spectrum)

11:15

30min

Coffee Break

Titanium3

11:15

30min

Coffee Break

Helium3

11:15

30min

Coffee Break

Platinum3

11:15

30min

Coffee Break

Europium2

11:15

30min

Coffee Break

Hassium

11:15

30min

Coffee Break

Palladium

11:15

30min

Coffee Break

Ferrum

11:15

30min

Coffee Break

Dynamicum

11:45

11:45

30min

CANCELLED: Beyond Code: Fostering Diversity and Inclusion in Open Source

Palladium

11:45

90min

CANCELLED: Deploy RAG Applications Using Docker: A Step-by-Step Guide

Ferrum

Are LLMs the answer to all our problems?

Dr. Maria Börner

Generative AI models have shaken up the German market. Since the release of ChatGPT, AI is available and usable for everyone. The number of ChatGPT-based agents is growing rapidly, but concerns about privacy, copyright and ethics remain. Regulation and ethical AI go hand in hand, but are often seen as barriers. The presentation will cover the different aspects of ethics and how they are addressed by regulation. It will give an overview of how to use large language models in a safe and practical way. The talk not only addresses the various ethical issues but also aims to help you convince your next customer to invest in your AI-based product.

General: Ethics & Privacy

Interactive end-to-end root-cause analysis with explainable AI in a Python Shiny App

Julius Möller, Simone Lederer

We demonstrate a pure Python solution for exploring and understanding datasets using state-of-the-art machine learning and explainable AI techniques. Our application features a reactive dashboard built with Shiny, specifically designed for the daily work of data scientists.

The tool provides insights into data rapidly and effortlessly through an interactive dashboard. It facilitates data preprocessing, interactive exploratory data analysis, on-demand model training, evaluation, and interpretation. It further renders dynamic, annotated, and interactive visualizations. This allows users to pinpoint critical elements and relations as root causes in a haystack of features, compressing a full day's work into under an hour.

Utilizing Plotly for dynamic visualizations, along with Scikit-learn, CatBoost, SHAP values, and MLflow for experiment tracking, combined with a reactive Shiny dashboard, we facilitate quick and easy data preprocessing and exploration, model training and evaluation, together with explainable AI.

PyData: Machine Learning & Deep Learning & Statistics

Introducing the Synthetic Data SDK - Privacy Preserving Synthetic Data for AI/ML

Michael Platzer

AI-generated synthetic data is gaining traction as a privacy-safe solution for data access and sharing. This data is created from original datasets, maintaining privacy without compromising utility.

In this session, we'll cover the fundamental concepts of AI-generated synthetic data and demonstrate how easy it is to generate synthetic data within your local compute environment using the open-source Synthetic Data SDK.

PyData: Data Handling & Engineering

Power up your Polars code with Polars extension

Polars is written in Rust and has the advantages of speed and multi-threading, but everything slows down if a Python function needs to be applied to the DataFrame. A Polars extension avoids that problem. In this workshop, we will look at how to write one.

PyData: Data Handling & Engineering

Python Performance Unleashed: Essential Optimization Techniques Beyond Libraries

Every Python developer faces performance challenges, from slow data processing to memory-intensive operations. While external libraries like Numba or Cython offer solutions, understanding core Python optimization techniques is crucial for writing efficient code. This talk explores practical optimization strategies using Python's built-in capabilities, demonstrating how to achieve significant performance improvements without external dependencies. Through real-world examples from machine learning pipelines and data processing applications, we'll examine common bottlenecks and their solutions. Whether you're building data pipelines, web applications, or ML systems, these techniques will help you write faster, more efficient Python code.

PyCon: Python Language & Ecosystem

Zeiss Plenary (Spectrum)

Why E.ON Loves Python

Christer Friberg

Join me as I share my 20-year journey with Python and its pivotal role at E.ON. Discover how we transitioned fully to Python, streamlined our development framework, and embraced MLOps principles. Learn about some of our AI projects, including image analysis and real-time inference, and our steps towards open-sourcing code to foster innovation in the energy sector. Explore why Python is our go-to language for data science and collaboration.

PyCon: MLOps & DevOps

12:25

Building an Open Source RAG System for the United Nations Negotiations on Global Plastic Pollution

Rahkakavee Baskaran, Teresa Kroesen, Anna-Lisa Wirth

Plastic pollution is a significant global challenge. Every year, millions of tons of plastic enter the oceans, impacting marine ecosystems and human health. To address this issue, the United Nations is negotiating a legally binding treaty with representatives from 180 countries, aiming to reduce plastic pollution and promote sustainable practices.

We have developed NegotiateAI, an open-source chat application that supports delegations during the UN negotiations on a legally binding agreement to combat plastic pollution. The tool demonstrates how generative AI and Retrieval-Augmented Generation (RAG) can address complex global challenges. Built with Haystack 2.0, Qdrant, HuggingFace Spaces, and Streamlit, it showcases the potential of open-source technologies in tackling issues of global relevance.

Whether you are a beginner or an advanced developer, this talk will give you valuable insights into developing impactful AI applications with open source tools in the public sector.

PyData: Natural Language Processing & Audio (incl. Generative AI NLP)

From Tensors to Clouds — A Practical Guide to Zarr V3 and Zarr-Python 3

A key feature of the Python data ecosystem is the reliance on simple but efficient primitives that follow well-defined interfaces to make tools work seamlessly together (cf. http://data-apis.org/). NumPy provides an in-memory representation for tensors. Dask provides parallelisation of tensor access. Xarray provides metadata linking tensor dimensions. Zarr provides a missing feature, namely scalable, persistent storage for annotated hierarchies of tensors. Defined through a community process, the Zarr specification enables the storage of large out-of-memory datasets locally and in the cloud. Implementations exist in C++, C, Java, JavaScript, Julia, and Python.

This talk presents a systematic approach to understanding and implementing the newer version of Zarr-Python, i.e. Zarr-Python 3, by explaining the new API, deprecations, the new storage backend, the improved codec pipeline, and more.

PyData: Data Handling & Engineering

Generative AI Monitoring with PydanticAI and Logfire

Marcelo Trylesinski

In this talk, we will explore how the integration of PydanticAI and Logfire creates a powerful foundation for generative AI applications. We'll demonstrate how these tools combine to form sophisticated AI workflows and give you comprehensive monitoring.

The session illustrates how PydanticAI enables more reliable agent responses while Logfire provides real-time insights for efficient troubleshooting.

Through practical examples, you'll learn implementation techniques that will help your team build AI systems with observability, transforming how you develop and maintain generative AI projects. 🚀

PyData: Generative AI

Open Table Formats in the Wild: From Parquet to Delta Lake and Back

Open table formats have revolutionized analytical, columnar storage on cloud object stores with critical features like ACID compliance and enhanced metadata management, once exclusive to proprietary cloud data warehouses. Delta Lake, Iceberg, and Hudi have significantly advanced over traditional open file formats like Parquet and ORC.

In an effort to modernize our data architecture, we aimed to replace our Parquet-based bronze layer with Delta Lake, anticipating better query performance, reduced maintenance, native support for incremental processing, and more. While our initial pilot showed promise, we encountered unexpected pitfalls that ultimately brought us back to where we began.

Curious? Join me as we shed light on the current state of table formats.

PyData: Data Handling & Engineering

Zeiss Plenary (Spectrum)

The aesthetics of AI: from cyberpunk to fascism

Let’s explore the visual grammars, references and cultural norms at play in the field of AI; from Kismet to Spot®, from Clippy to Claude. As a sector we can be hyper-focused on technical process and function, to the extent that it blinkers our understanding of the cultural and political impacts of our work. Aesthetics infuse every aspect of technology. Aesthetic interpretations are manifold and mutable, constructed in-congress with the observer and not fully defined by the original designer. AI technologies add additional layers of subtext: character, consciousness, agency, intent.

Despite this murkiness, or perhaps because of it, this talk makes a passionate argument for engaging with historical aesthetic movements, for building our shared professional knowledge of fads and fashions, not just from the past 40 years of internet culture, but also the past 140 years of ideology, technology, and thought.

General: Others

Why Exceptions Are Just Sophisticated Gotos - and How to Move Beyond

Florian Wilhelm

"Why Exceptions Are Just Sophisticated Gotos - and How to Move Beyond" explores a common programming tool with a fresh perspective. While exceptions are a key feature in Python and other languages, they share surprising similarities with the notorious goto statement. This talk examines those parallels, the problems exceptions can create, and practical alternatives for better code. Attendees will gain a clear understanding of modern programming concepts and the evolution of programming.

PyCon: Programming & Software Engineering

expectation: A modern take on statistical A/B testing with e-values and martingales

This talk introduces a novel Python library for statistical testing using e-values, offering a refreshing alternative to traditional p-values. We'll explore how this approach enables real-time sequential testing, allowing data scientists to monitor experiments continuously without the statistical penalties of repeated testing. Through practical examples, we'll demonstrate how e-values provide more intuitive evidence measures and enable flexible stopping rules in A/B testing, clinical trials, and anomaly detection. The library implements cutting-edge methods from game-theoretic probability, making advanced sequential testing accessible to Python practitioners. Whether you're conducting A/B tests, monitoring production models, or running clinical trials, this talk will equip you with powerful new tools for sequential data analysis.
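The sequential-testing idea in this abstract can be illustrated with a minimal sketch. It is not the API of the `expectation` library; the function name, the choice of a simple likelihood-ratio e-process, and the alternative rate `p1` are assumptions made for illustration. Under the null hypothesis the running product below is a nonnegative martingale starting at 1, so by Ville's inequality the chance that it ever reaches `1 / alpha` is at most `alpha`, which is what permits checking after every observation.

```python
def sequential_bernoulli_test(observations, p0, p1, alpha=0.05):
    """Likelihood-ratio e-process for H0: success rate == p0.

    p1 is an assumed alternative rate, fixed before seeing any data.
    Returns (stopping_index, e_value); stopping_index is None when the
    accumulated evidence never reaches 1 / alpha.
    """
    if not (0 < p0 < 1 and 0 < p1 < 1 and 0 < alpha < 1):
        raise ValueError("p0, p1 and alpha must lie strictly between 0 and 1")

    e_value = 1.0  # start with one unit of "betting wealth"
    for i, success in enumerate(observations, start=1):
        # Multiply by the likelihood ratio of this single observation.
        if success:
            e_value *= p1 / p0
        else:
            e_value *= (1 - p1) / (1 - p0)
        # Optional stopping is allowed: reject H0 as soon as the bar is met.
        if e_value >= 1 / alpha:
            return i, e_value
    return None, e_value


# Ten conversions in a row against a 50% baseline.
stop, evidence = sequential_bernoulli_test([1] * 10, p0=0.5, p1=0.75)
print(stop, evidence)
```

Because the threshold is checked after every observation, the experiment can stop early without the correction that repeated p-value checks would require.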

PyData: Machine Learning & Deep Learning & Statistics

13:10

13:10

80min

Lunch Break

Zeiss Plenary (Spectrum)

13:10

80min

Lunch Break

Titanium3

13:10

80min

Lunch Break

Helium3

13:10

80min

Lunch Break

Platinum3

13:10

80min

Lunch Break

Europium2

13:10

80min

Lunch Break

Hassium

13:10

80min

Lunch Break

Palladium

13:15

13:15

75min

Lunch Break

Ferrum

13:15

75min

Lunch Break

Dynamicum

14:30

AI coding agent - what it is, how it works and is it good for developers

In this talk, we will have a deeper technical look at AI coding agents, their design, and how they can carry out coding tasks with the support of large language models. We will look at the journey from the user entering a prompt to how it converts to actions in completing the task.

After that, we will look at the impact AI coding agents could have on the industry, whether or not you as a developer should use one, and what a user should be cautious of when using such an agent.

PyData: Generative AI

Autonomous Browsing using Large Action Models

Arne Grobrügge, Nico Kreiling

The browser serves as our gateway to the internet—the largest repository of knowledge in human history. Proficiency in its use is a core skill across nearly all professions and is becoming increasingly important for Artificial Intelligence. But can Large Action Models (LAMs) autonomously operate a browser? What exactly are LAMs that promise to translate human intentions into actions? We report on a project that fully automates the job application process using AI: from navigating unfamiliar website structures and filling out forms to handling document uploads and cookie banners.

PyData: Natural Language Processing & Audio (incl. Generative AI NLP)

Benchmarking Time Series Foundation Models with sktime

Benedikt Heidrich

Recent time series foundation models such as LagLlama, Chronos, Moirai, and TinyTimesMixer promise zero-shot forecasting for arbitrary time series. One central claim of foundation models is their ability to perform zero-shot forecasting, that is, to perform well with no training data. However, performance claims of foundation models are difficult to verify, as public benchmark datasets may have been a part of the training data, and only the already trained weights are available to the user.

Therefore, performance in specific use cases must be verified based on the use case data itself to ensure a reliable assessment of forecasting performance. sktime allows users to easily produce a performance benchmark of any collection of forecasting models, foundation models, simple baselines, or custom methods on their internal use case data.

PyData: Machine Learning & Deep Learning & Statistics

From Trees to Transformers: Our Journey Towards Deep Learning for Ranking

Theodore Meynard, Mihail Douhaniaris

GetYourGuide, a global marketplace for travel experiences, reached diminishing returns with its XGBoost-based ranking system. We switched to a Deep Learning pipeline in just nine months, maintaining high throughput and low latency. We iterated on over 50 offline models and conducted more than 10 live A/B tests, ultimately deploying a PyTorch transformer that yielded significant gains. In this talk, we will share our phased approach—from a simple baseline to a high-impact launch—and discuss the key operational and modeling challenges we faced. Learn how to transition from tree-based methods to neural networks and unlock new possibilities for real-time ranking.

PyData: Machine Learning & Deep Learning & Statistics

Zeiss Plenary (Spectrum)

Instrumenting Python Applications with OpenTelemetry

Mika Naylor, Emily Woods

Observability is challenging and often requires vendor-specific instrumentation. Enter OpenTelemetry: a vendor-agnostic standard for logs, metrics, and traces. Learn how to instrument Python applications with OpenTelemetry and send telemetry to your preferred observability backends.

PyCon: MLOps & DevOps

LLM Inference Arithmetics: the Theory behind Model Serving

Have you ever asked yourself how parameters for an LLM are counted, or wondered why Gemma 2B is actually closer to a 3B model? You have no clue about what a KV-Cache is? (And, before you ask: no, it's not a Redis fork.) Do you want to find out how much GPU VRAM you need to run your model smoothly?

If your answer to any of these questions was "yes", or you have another doubt about inference with LLMs - such as batching, or time-to-first-token - this talk is for you. Well, except for the Redis part.
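As a rough illustration of the VRAM question raised above, here is a back-of-the-envelope sketch. The model configuration is hypothetical and not taken from the talk, and the function names are made up for this example. It relies only on the standard observation that a KV cache stores one key and one value vector per layer, per KV head, per token.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=2):
    """Approximate KV-cache size in bytes (2 bytes per value for fp16/bf16)."""
    # The leading 2 accounts for storing both keys and values.
    return (2 * n_layers * n_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


def weight_bytes(n_params, bytes_per_param=2):
    """Approximate memory needed just to hold the model weights."""
    return n_params * bytes_per_param


GIB = 1024 ** 3

# Hypothetical 7-billion-parameter model; all numbers are assumptions.
cache = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=4096)
weights = weight_bytes(7_000_000_000)

print(f"KV cache: {cache / GIB:.2f} GiB")
print(f"Weights:  {weights / GIB:.2f} GiB")
```

The cache grows linearly with both sequence length and batch size, which is why batching and long contexts can dominate memory use even when the weights fit comfortably.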

PyData: Generative AI

Reinforcement Learning Without a PhD: A Python Developer’s Journey

Jochen Luithardt

Reinforcement Learning (RL) has shown superhuman performance in games and is already delivering value in Big Tech. But despite its potential, RL remains largely inaccessible to most developers. Why? Because real-world RL is hard—it demands data, infrastructure, and tools that are often built for researchers, not practitioners.

This talk shares the journey of applying RL to a real-world use case without having a PhD. It’s a story of figuring things out through hands-on experimentation, trial and error, and building what didn’t exist. We’ll explore what makes RL powerful, why it’s still rare in practice, and how you can get started. Along the way, you’ll learn about the key challenges of production RL, how to work around them, and how the open-source toolkit pi_optimal can help bridge the gap. Whether you're just RL-curious or ready to dive in, this talk offers practical insights and a demo to help you take your first steps.

PyData: Machine Learning & Deep Learning & Statistics

Taking Control of LLM Outputs: An Introductory Journey into Logits

Emek Gözlüklü

This talk explores logits - the raw confidence scores that language models generate before selecting each token. Understanding and manipulating these scores gives you practical control over how models generate text.

In this introductory session, we'll explore the token-by-token generation process, examining how tokenizers work and why vocabulary matters. You'll learn about the relationship between logits, probabilities, and tokens. Then we will cover constrained decoding approaches and talk about structured generation.
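The relationship between logits, probabilities, and constrained decoding can be sketched in a few lines of plain Python. The four-token vocabulary and the logit values are invented for illustration, and real libraries apply the same masking idea to tensors rather than lists.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities that sum to 1."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def constrain(logits, allowed_ids):
    """Mask every token that is not allowed by setting its logit to -inf."""
    if not allowed_ids:
        raise ValueError("at least one token must remain allowed")
    return [x if i in allowed_ids else -math.inf
            for i, x in enumerate(logits)]


# Toy vocabulary and logits (assumed values, not from a real model).
vocab = ["yes", "no", "maybe", "{"]
logits = [2.0, 1.0, 0.5, 3.0]

free = softmax(logits)
forced = softmax(constrain(logits, allowed_ids={0, 1}))

print(vocab[free.index(max(free))])      # unconstrained top token
print(vocab[forced.index(max(forced))])  # top token among "yes" / "no"
```

Masked tokens end up with probability exactly zero, so sampling can never produce them; structured generation repeats this step for every token, changing the allowed set as the output grows.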

PyData: Natural Language Processing & Audio (incl. Generative AI NLP)

supplyseer: Computational Supply Chain with Python

This talk introduces supplyseer, an open-source Python library that brings advanced analytics to Supply Chain and Logistics. By combining time series embedding techniques, stochastic process modeling, and geopolitical risk analysis, supplyseer helps organizations make data-driven decisions in an increasingly complex global supply chain landscape. The library implements novel approaches like Takens embedding for demand forecasting, Hawkes processes for modeling supply chain events, and Bayesian methods for inventory optimization. Through practical examples and real-world use cases, we'll explore how these mathematical concepts translate into actionable insights for supply chain practitioners.
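For readers unfamiliar with the Takens embedding mentioned above, the following is a minimal sketch of a time-delay embedding in plain Python. It does not use or reproduce the supplyseer API; the function name and the toy demand series are assumptions for illustration.

```python
def delay_embedding(series, dimension, delay):
    """Map a scalar series to vectors (x[t], x[t+delay], ..., x[t+(d-1)*delay])."""
    if dimension < 1 or delay < 1:
        raise ValueError("dimension and delay must be positive integers")
    span = (dimension - 1) * delay  # distance from first to last coordinate
    return [
        tuple(series[t + k * delay] for k in range(dimension))
        for t in range(len(series) - span)
    ]


# Toy weekly demand figures (made-up numbers).
demand = [10, 20, 30, 40, 50, 60]
for point in delay_embedding(demand, dimension=2, delay=2):
    print(point)
```

Each output tuple is one point in the reconstructed state space; a forecasting model can then work with these vectors instead of the raw scalar series.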

PyData: Machine Learning & Deep Learning & Statistics

15:10

Beyond Agents: What AI Strategy Really Needs in 2025

Alexander CS Hendorf

Artificial intelligence is no longer confined to models and APIs—it now shapes systems, hardware, and real-world agents. In this talk, I reflect on strategic insights gained at NVIDIA’s GTC 2025, where AI’s convergence with simulation, synthetic data, and robotics signals a fundamental shift. Drawing from over 1,100 sessions and personal experiences at the heart of Silicon Valley, I explore emerging patterns that redefine what it means to build and deploy AI at scale. We’ll look beyond the hype of large language models to examine autonomous systems, interdisciplinary development, and the infrastructure shifts enabling AI everywhere—from cloud to desktop. This session is a call to technical leaders and practitioners to broaden their perspective, think beyond tools, and engage strategically. Whether you’re developing agents, managing data pipelines, or scaling AI across teams, this talk will challenge assumptions and highlight what truly matters in 2025 and beyond.

General: Others

Zeiss Plenary (Spectrum)

Beyond Basic Prompting: Supercharging Open Source LLMs with LMQL's Structured Generation

Christiaan Swart

This intermediate-level talk demonstrates how to leverage Language Model Query Language (LMQL) for structured generation and tool usage with open-source models like Llama. You will learn how to build a RAG system that enforces output constraints, handles tool calls, and maintains response structure - all while using open-source components. The presentation includes hands-on examples where audience members can experiment with LMQL prompts, showcasing real-world applications of constrained generation in production environments.

PyData: Natural Language Processing & Audio (incl. Generative AI NLP)

Building Reliable AI Agents for Publishing: A DSPy-Based Quality Assurance Framework

Simonas Černiauskas

As publishers increasingly adopt AI agents for content generation and analysis, ensuring output quality and reliability becomes critical. This talk introduces a novel quality assurance framework built with DSPy that addresses the unique challenges of evaluating AI agents in publishing workflows. Using real-world examples from newsroom implementations, I will demonstrate how to design and implement systematic testing pipelines that verify factual accuracy, content consistency, and compliance with editorial standards. Attendees will learn practical techniques for building reliable agent evaluation systems that go beyond simple metrics to ensure AI-generated content meets professional publishing standards.

PyData: Natural Language Processing & Audio (incl. Generative AI NLP)

Inclusive Data for 1.3 Billion: Designing Accessible Visualizations

Pavithra Eswaramoorthy

According to the World Health Organization (WHO), an estimated 1.3 billion people (1 in 6 individuals) experience a disability, and nearly 2.2 billion people (1 in 5 individuals) have vision impairment. Improving the accessibility of visualizations will enable more people to participate in and engage with our data analyses.

In this talk, we’ll discuss some principles and best practices for creating more accessible data visualizations. It will include tips for individuals who create visualizations, as well as guidelines for the developers of visualization software to help ensure your tools can help downstream designers and developers create more accessible visualizations.

PyData: Visualisation & Jupyter

PDFs - When a thousand words are worth more than a picture (or table).

Caio Benatti Moretti

PDF, a must-have in RAG systems, ensures visual fidelity across platforms and devices, at the expense of compromising what would be the core condition for computers to properly process and interpret text: semantics. That means any logical arrangement of text, upon rendering, explodes into dummy visual shards of data that literally portray the bigger picture for the human eye to perceive, but no longer convey the information computers should grasp. Such a bottleneck already makes proper ingestion of text-only documents a big challenge, let alone when tables or figures come into play, the ultimate nightmare for PDF parsers, not to say developers. The rest you must have already foreseen: a RAG system barfing unreliable knowledge from bad chunks (based on regular PDF parsing), if those ever get to be retrieved from a vector database. In this talk you can gather some vision-driven insights on how to leverage the strengths of PDF and language models towards good chunks to be ingested. Or, in other words, how multimodal models can go beyond trivial reverse engineering by decomposing tables into their building blocks, in plain language, as those would be explained to another human; or better yet, as humans would ask questions about such pieces of knowledge. And from such a strategy, we transfer the same rationale to figures. Come along, gather some insights, and get inspired to break down tables and figures from your own PDFs, and to improve retrieval in your RAG systems.

PyData: Generative AI

PyData Stack: Pure Python open source data platforms

Eric Thanenthiran

Modern open source Python data packages offer the opportunity to build and deploy pure Python, production-ready data platforms. Engineers can and do play a big role in helping companies become data-driven by centralising this data, cleaning and modelling it, and presenting it back to the business. Now more than ever, these packages give engineers and companies of any size the ability to build data products and insights at relatively low cost. In this talk we’ll walk through the key components of this stack and the tooling options available, and demo a deployable containerised Python data stack.

PyData: Data Handling & Engineering

Size matters: Inspecting Docker images for Efficiency and Security

Inspecting Docker images is crucial for building secure and efficient containers. In this session, we will analyze the structure of a Python-based Docker image using various tools, focusing on best practices for minimizing image size and reducing layers with multi-stage builds. We’ll also address common security pitfalls, including proper handling of build and runtime secrets.

While this talk offers valuable insights for anyone working with Docker, it is especially beneficial for Python developers seeking to master clean and secure containerization techniques.

PyCon: MLOps & DevOps

16:10

16:10

30min

CANCELLED: Multivariate Datastrophe: Methods to Detect Obscure Drift in Your Production Data

Titanium3

Beyond FOMO — Keeping Up-to-Date in AI

Carsten Frommhold

The rapid evolution of AI technologies, particularly since the emergence of Large Language Models, has transformed the data science landscape from a field of steady progress to one of constant breakthroughs. This acceleration creates unique challenges for practitioners, from managing FOMO to battling imposter syndrome. Drawing from personal experience transitioning from mathematical modeling to modern AI development, this talk explores practical strategies for staying current while maintaining sanity. We'll discuss building effective learning structures, creating collaborative knowledge-sharing environments, and finding the right balance between innovation and implementation. Attendees will leave with actionable insights on navigating technological change while fostering sustainable growth in their teams and careers.

General: Education, Career & Life

Building Serverless Python AI skills as WASM components

Frameworks like llama-stack and langchain allow for quick prototyping of generative AI applications. However, companies often struggle to deploy these applications into production quickly. This talk explores the design of a Python SDK that enables the development of AI skills in Python and their compilation into WebAssembly (WASM) components, targeting a specific host runtime that offers interfaces for interacting with LLMs and associated tooling.

PyData: Generative AI

Conformal Prediction: uncertainty quantification to humanise models

Vincenzo Ventriglia

Quantifying model uncertainties is critical to improve model reliability and make sound decisions. Conformal Prediction is a framework for uncertainty quantification that provides mathematical guarantees of true outcome coverage, allowing more informed decisions to be made by stakeholders.
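The coverage guarantee can be made concrete with a minimal sketch of split conformal prediction for regression. The calibration residuals below are made-up numbers and the function names are chosen for this example. Under the usual exchangeability assumption, an interval built this way contains the true outcome with probability at least `1 - alpha`.

```python
import math


def conformal_radius(calibration_residuals, alpha):
    """Half-width of the prediction interval from held-out residuals."""
    scores = sorted(abs(r) for r in calibration_residuals)
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: only the trivial
        # infinite interval carries the guarantee.
        return math.inf
    return scores[rank - 1]


def prediction_interval(point_prediction, radius):
    """Symmetric interval around the model's point prediction."""
    return point_prediction - radius, point_prediction + radius


# Residuals (true value minus prediction) on a calibration set; assumed data.
residuals = [-3, 1, 4, -2, 5, -6, 7, 8, -9, 10]

radius = conformal_radius(residuals, alpha=0.2)
print(prediction_interval(50, radius))
```

The method treats the underlying model as a black box: only its residuals on data it was not trained on are needed, which is why the same recipe applies to any regressor.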

PyData: Machine Learning & Deep Learning & Statistics

Deploying Synchronous and Asynchronous Django Applications for Hobby Projects

Simplify deploying hybrid Django applications with synchronous views and asynchronous apps. This session covers ASGI support, Docker containerization, and Kamal for seamless, zero-downtime deployments on single-server setups, ideal for hobbyists and small-scale projects.

PyCon: Django & Web

Driving Trust and Addressing Ethical Challenges in Transportation through Explainable AI

Machine Learning can transform transportation, improving safety, optimizing routes, and reducing delays, yet it also presents ethical concerns. In this talk, I will show how Explainable AI (XAI) can offer practical solutions to ethical dilemmas such as the lack of trust in AI solutions. Instead of focusing on the technical underpinnings, we will discuss how transparency can be enhanced in AI-supported transportation systems. Using a real-world example, I will demonstrate how XAI provides the groundwork for building ethical, trustworthy, and socially responsible AI solutions in public transportation systems.

General: Ethics & Privacy

How to use Data Science Superpowers in real life, a Bayesian perspective

In the data science field, we use all these powerful methods to solve important problems. Most of the time, we do this very well because our data science and machine-learning toolbox fits the problems we tackle quite precisely. Yet, what about our everyday choices or even our most important life decisions? Can we use for our private lives what we advocate for in our jobs or are these choices inherently different?
Many of these real-life decisions differ somewhat from textbook machine-learning problems. There is often little or hard-to-come-by data, and the decisions are infrequent but sometimes very consequential. This talk will dive into what makes everyday decisions difficult to handle with our data science toolbox. It will show how Bayesian thinking can help to reason in such cases, especially when there is not a lot of data to rely on.

PyData: Machine Learning & Deep Learning & Statistics

Jeannie: An Agentic Field Worker Assistant

Andrei Beliankou, Jose Moreno Ortega

Jeannie is an LLM-based agentic workflow implemented in Python to automate task management for field workers in the energy sector. This system addresses inefficiencies and safety risks in tasks like PV panel installation and powerline repair.

Using open-source tools (LangChain family, OpenStreetMap and OpenWeatherMap APIs), Jeannie retrieves tasks, fetches weather and directions, identifies past incidents via RAG, and emails tailored reports with safety warnings.

This presentation offers a case study of Jeannie’s implementation for E.ON in Germany, demonstrating how daily task automation enhances worker safety and efficiency. Attendees will discover how to create agentic systems with Python, integrate APIs, and apply RAG for safety applications, with access to open-source code and data for replicating the workflow.

PyData: Generative AI

Mastering Demand Forecasting: Lessons from Europe's Largest Retailer

Moreno Schlageter, Yovli Duvshani

Ever craved your favorite dish, only to find its key ingredient missing from the store? You're not alone - stock outs can have significant consequences for businesses, resulting in frustrated customers and lost sales. On the other hand, overstocking can lead to wasted storage costs and potential write-offs. The replenishment system is responsible for striking the right balance between these opposing risks.
The key to successful replenishment is making accurate predictions about future demand.

This presentation takes a deep dive into the intricate world of demand forecasting at Europe's largest retailer. We will demonstrate how enhancing simple machine learning methods with domain knowledge allows us to generate hundreds of millions of high-quality forecasts every day.

PyData: Machine Learning & Deep Learning & Statistics

Zeiss Plenary (Spectrum)

16:40

16:40

30min

Coffee Break

Zeiss Plenary (Spectrum)

17:10

Citation is Collaboration: Software Recognition in Research and Industry

Ivelina Momcheva

The development of open source software is increasingly recognized as a critical contribution across many disciplines, yet the mechanisms for credit and citation vary significantly. This talk uses astronomy as a case study to explore shared challenges in attributing software contributions across research and industry. It will review the evolution of journal recommendations and policies over the past decade, alongside emerging publishing practices offering insights into their impact on the recognition of software contributions. An analysis of citation patterns for widely used libraries (numpy, scipy, astropy) highlights trends over time and their dependence on publication venues and policies. The talk will conclude with strategies for both developers and users for improving the recognition of software, fostering collaboration and sustainability in software ecosystems. All data and analysis code will be made available in a public repository, supporting transparency and further study.

PyData: Research Software Engineering

Conquering PDFs: document understanding beyond plain text

NLP and data science could be so easy if all of our data came as clean and plain text. But in practice, a lot of it is hidden away in PDFs, Word documents, scans and other formats that have been a nightmare to work with. In this talk, I'll present a new and modular approach for building robust document understanding systems, using state-of-the-art models and the awesome Python ecosystem. I'll show you how you can go from PDFs to structured data and even build fully custom information extraction pipelines for your specific use case.

PyData: Natural Language Processing & Audio (incl. Generative AI NLP)

Zeiss Plenary (Spectrum)

Enhancing Software Supply Chain Security with Open Source Python Tools

Anthony Harrison

The Cyber Resilience Act (CRA) is focused on improving the security and resilience of digital products. Businesses that want to continue delivering digital products to the EU market once the CRA is in force will need to start preparing the evidence required to demonstrate compliance.

Key requirements within the CRA include implementing robust security measures throughout the product life-cycle, adopting secure development practices and implementing proactive vulnerability management processes.

This session will show how several of the CRA requirements can be met using open source Python tools.

PyCon: Security

Generative-AI: Usecase-Specific Evaluation of LLM-powered Applications

Dr. Homa Ansari

This talk addresses the critical need for usecase-specific evaluation of Large Language Model (LLM)-powered applications, highlighting the limitations of generic evaluation benchmarks in capturing domain-specific requirements. It proposes a workflow for designing more reliable evaluations to optimize LLM-based applications, consisting of three key activities: human-expert evaluation and benchmark dataset curation, creation of evaluation agents, and alignment of these agents with human evaluations using the curated datasets. The workflow produces two key outcomes: a curated benchmark dataset for testing LLM applications and an evaluation agent that scores their responses. The presentation further addresses limitations and best practices for enhancing the reliability of evaluations, ensuring LLM applications are better tailored to specific use cases.

PyData: Natural Language Processing & Audio (incl. Generative AI NLP)

Getting Started with Bayes in Engineering: Implementing Kalman Filters with RxInfer.jl

Victor Flores Terrazas

Bayesian methods are not commonly seen in Civil Engineering and Structural Dynamics. In this talk we explore how RxInfer.jl and the Julia Programming Language can simplify Bayesian modeling by implementing a Kalman filter for tracking the dynamics of a structural system. Perfect for engineers, researchers, and data scientists eager to apply probabilistic modelling and Bayesian methods to real-world engineering challenges.
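As a rough illustration of the predict/update cycle behind a Kalman filter, here is a minimal scalar sketch in plain Python. It does not use RxInfer.jl or Julia; the function name `kalman_1d`, the random-walk state model, the noise variances, and the measurement values are all assumptions chosen for illustration, not material from the talk.

```python
def kalman_1d(measurements, process_var, measurement_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a random-walk state observed with noise.

    x is the state estimate, p is its variance. All parameters are
    illustrative assumptions, not values from the talk.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is assumed constant, so only uncertainty grows.
        p = p + process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = p / (p + measurement_var)
        x = x + gain * (z - x)
        p = (1.0 - gain) * p
        estimates.append(x)
    return estimates


# Hypothetical noisy readings of a quantity whose true value is near 1.0.
measurements = [1.2, 0.9, 1.1, 1.0, 0.95, 1.05]
estimates = kalman_1d(measurements, process_var=1e-4, measurement_var=0.04)
```

Each estimate is a weighted average of the previous estimate and the new measurement; the gain shrinks as the filter becomes more certain, so later measurements move the estimate less.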

PyData: Research Software Engineering

Guiding data minds: how mentoring transforms careers for both sides

Anastasia Karavdina

Mentorship is a powerful way to shape careers while building meaningful connections in the data field. In this talk, I’ll share my journey as a professional mentor, what the role entails, and the impact it has on both mentees and mentors. Learn how mentorship drives growth, fosters innovation, and creates value for the data community—and why you should consider stepping into this rewarding role.

General: Community & Diversity

Information Retrieval Without Feeling Lucky: The Art and Science of Search

Search is everywhere, yet effective Information Retrieval remains one of the most underestimated challenges in modern technology. While Retrieval-Augmented Generation has captured significant attention, the foundational element - Information Retrieval - often remains underexplored.

In this talk, we put Information Retrieval center stage by asking:
How do we know that user queries and data 'speak' the same language?
How do we evaluate the relevance and completeness of search results? And how do we prioritize what gets displayed? Or do we even want to hide specific content?

We try to answer these questions by introducing the audience to the art and science of Information Retrieval, exploring metrics such as precision, recall, and desirability. We’ll examine key challenges, including ambiguity, query relaxation, and the interplay between sparse and dense search techniques. Through a live demo using public content from Sendung mit der Maus, we show how hybrid search improves upon vector and keyword based search in isolation.
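Precision and recall, two of the metrics named above, can be computed for a single query as in the following minimal sketch. The helper name `precision_recall` and the document IDs are hypothetical and only serve the illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = share of retrieved documents that are relevant
    recall    = share of relevant documents that were retrieved
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical document IDs, for illustration only.
retrieved = ["doc1", "doc2", "doc3", "doc4"]
relevant = ["doc2", "doc4", "doc7"]
precision, recall = precision_recall(retrieved, relevant)
```

Here two of the four retrieved documents are relevant, and two of the three relevant documents were found, which shows how a search can score differently on the two metrics.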

General: Others

Secure “Human in the Loop” Interactions for AI Agents

Juan Cruz Martinez

Explore the power of Human-in-the-Loop (HITL) for GenAI agents! Learn how to build AI systems that augment your abilities, not replace your judgment, especially when high-stakes actions are involved. This session will focus on practical implementation using Python and LangChain to stay in control.

PyData: Generative AI

Supercharge Your Testing with inline-snapshot

Snapshot tests are invaluable when you are working with large, complex, or frequently changing expected values in your tests.
Introducing inline-snapshot, a Python library designed for snapshot testing that integrates seamlessly with pytest, allowing you to embed snapshot values directly within your source code.
This approach not only simplifies test management but also boosts productivity by improving the maintenance of the tests.
It is particularly useful for integration testing and can be used to write your own abstractions to test complex APIs.

17:50

Build a personalized Commute agent in Python with Hopsworks, LangGraph and LLM Function Calling

Javier de la Rúa Martínez

The invention of the clock and the organization of time in zones have helped synchronize human activities across the globe. While timekeepers are better at planning and sticking to the plan, time optimists somehow believe that time is malleable and stretches as the deadline gets closer. Nevertheless, whether you are an organized timekeeper or a creative timebender, external factors can affect your commute.

In this talk, we will define the different components necessary to build a personalized commute virtual agent in Python. The agent will help you analyze your historical lateness records, estimate future delays, and suggest the best time to leave home based on these predictions. It will be powered by an LLM and will use a technique called Function Calling to recognize the user's intent from the conversation history and provide informed answers.
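The general idea of Function Calling can be sketched without any LLM: the model emits a structured request naming a function and its arguments, and the application dispatches it to real Python code. In the sketch below, the tool `estimate_delay`, the JSON shape, and the delay values are all hypothetical stand-ins; they are not the API of Hopsworks, LangGraph, or any specific model provider.

```python
import json


def estimate_delay(line: str, hour: int) -> dict:
    """Hypothetical tool: return a made-up delay estimate in minutes."""
    rush_hour = 7 <= hour <= 9 or 16 <= hour <= 18
    return {"line": line, "expected_delay_min": 8 if rush_hour else 2}


# Registry of functions the agent is allowed to call.
TOOLS = {"estimate_delay": estimate_delay}


def dispatch(tool_call_json: str) -> dict:
    """Parse a tool call and run the matching registered function."""
    call = json.loads(tool_call_json)
    func = TOOLS[call["name"]]
    return func(**call["arguments"])


# Stand-in for what a model might emit after reading the conversation.
model_output = '{"name": "estimate_delay", "arguments": {"line": "S1", "hour": 8}}'
result = dispatch(model_output)
```

In a real agent, the result would be passed back to the model so that it can phrase an answer for the user; keeping an explicit registry limits the model to functions the developer has approved.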

PyData: Data Handling & Engineering

From Idea to Integration: An Intro to the Model Context Protocol (MCP)

The Model Context Protocol (MCP) has emerged as a standard for connecting Large Language Models with diverse data sources and enabling interactions with other systems. In this talk, we’ll introduce the MCP standard and demonstrate how to build an MCP server using real-world examples. We’ll then explore its applications, showing how it empowers developers and makes data from complex systems accessible to non-technical users. Finally, we’ll dive into recent protocol updates, including improvements to Streamable HTTP transport and security enhancements, and share practical strategies for deploying MCP servers as well as clients.

PyData: Generative AI

Is Prompt Engineering Dead? How Auto-Optimization is Changing the Game

Iryna Kondrashchenko, Oleh Kostromin

The rise of LLMs has elevated prompt engineering as a critical skill in the AI industry, but manual prompt tuning is often inefficient and model-specific. This talk explores various automatic prompt optimization approaches, ranging from simple ones like bootstrapped few-shot to more complex techniques such as MIPRO and TextGrad, and showcases their practical applications through frameworks like DSPy and AdalFlow. By exploring the benefits, challenges, and trade-offs of these approaches, the attendees will be able to answer the question: is prompt engineering dead, or has it just evolved?

PyData: Natural Language Processing & Audio (incl. Generative AI NLP)

Zeiss Plenary (Spectrum)

Modern NLP for Proactive Harmful Content Moderation

Daryna Dementieva

Despite an array of regulations implemented by governments and social media platforms worldwide (e.g. the well-known DSA), the problem of abusive speech online persists. At the same time, rapid advances in NLP and large language models (LLMs) are opening up new possibilities, and responsibilities, for using this technology to make a positive social impact. Can LLMs streamline content moderation efforts? Are they effective at spotting and countering hate speech, and can they help produce more proactive solutions like text detoxification and counter-speech generation?

In this talk, we will dive into the cutting-edge research and best practices of automatic textual content moderation today. From clarifying core definitions to detailing actionable methods for leveraging multilingual NLP models, we will provide a practical roadmap for researchers, developers, and policymakers aiming to tackle the challenges of harmful online content. Join us to discover how modern NLP can foster safer, more inclusive digital communities.

PyData: Natural Language Processing & Audio (incl. Generative AI NLP)

Streamlining Python deployment with Pixi: A Perspective from production

In our quest to improve Python deployments, we explored Pixi, a tool designed to enhance dependency management within the Conda ecosystem. This talk recounts our experience integrating Pixi into a setup used in production. We leveraged Pixi to create lockfiles, ensuring consistent builds, and to automate deployments via CI/CD pipelines. This integration led to greater reliability and efficiency, minimizing deployment errors and allowing us to concentrate more on development. Join us as we share how Pixi transformed our deployment process and offer insights into optimizing your own workflows.

PyCon: MLOps & DevOps

Streamlining the Cosmos: Pythonic Workflow Management for Astronomical Analysis

Raphael Hviding

Astronomical surveys are growing rapidly in complexity and scale, necessitating accurate, efficient, and reproducible reduction and analysis pipelines. In this talk we explore Pythonic workflow managers to streamline processing large datasets on distributed computing environments.

Modern astronomy generates vast datasets across the electromagnetic spectrum. NASA's flagship James Webb Space Telescope (JWST) provides unprecedented observations that enable deep studies of distant galaxies, cosmic structures, and other astrophysical phenomena. However, these datasets are complex and require intricate calibration and analysis pipelines to transform raw data into meaningful scientific insights.

We will discuss the development and deployment of Pythonic tools, including snakemake and pixi, to construct modular, parallelized workflows for data reduction and analysis. Attendees will learn how these tools automate complex processing steps, optimize performance in distributed computing environments, and ensure reproducibility. Using real-world examples, we will illustrate how these workflows simplify the journey from raw data to actionable scientific insights.

PyData: PyData & Scientific Libraries Stack

The earth is no longer flat - introducing support for spherical geometries in Spherely and GeoPandas

Joris Van den Bossche

The geometries in GeoPandas, using the Shapely library, are assumed to be in projected coordinates on a flat plane. While this approximation is often just fine, for global data this runs into its limitations. This presentation introduces spherely, a Python library for working with vector geometries on the sphere, and its integration into GeoPandas.
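To illustrate why the flat-plane assumption breaks down for global data, the following sketch compares a naive planar distance in degrees with the great-circle (haversine) distance on a sphere. It uses only the standard library and does not call spherely or GeoPandas; the function names and sample coordinates are assumptions made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def great_circle_km(lon1, lat1, lon2, lat2):
    """Haversine distance between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def planar_degrees(lon1, lat1, lon2, lat2):
    """Naive flat-plane distance, treating degrees as Cartesian units."""
    return math.hypot(lon2 - lon1, lat2 - lat1)


# Two point pairs, each 10 degrees of longitude apart.
flat_equator = planar_degrees(0, 0, 10, 0)
flat_60_north = planar_degrees(0, 60, 10, 60)
sphere_equator = great_circle_km(0, 0, 10, 0)
sphere_60_north = great_circle_km(0, 60, 10, 60)
```

On the flat plane both pairs are the same distance apart, while on the sphere the pair at 60° north is only about half as far apart as the pair at the equator.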

PyData: PyData & Scientific Libraries Stack

Zero Code Change Acceleration: familiar interfaces and high performance

The PyData ecosystem is home to some of the best and most popular tools for doing data-science. Every data-scientist alive today has used pandas and scikit-learn and even Large Language Models know how to use them! For many years there have also been alternative implementations with similar interfaces and libraries with completely new approaches that focus on achieving the ultimate in performance and hardware acceleration. This talk will look at the recent efforts to give users the best of both worlds: a familiar and widely used interface as well as high performance.

PyData: PyData & Scientific Libraries Stack

🦀 Rüstzeit: Asynchronous Concurrency in Python & Rust

Many Python developers are enhancing their Rust knowledge and want to take the next step in translating their understanding of advanced concepts like asynchronous programming.

In this talk, I'll help you take that step by juxtaposing Python's asyncio with Rust's async ecosystems, tokio and async-std. Through real-world examples and insights from conversations with graingert, co-author of Python's Anyio, we'll explore how each language approaches asynchronous execution, highlighting similarities and differences in syntax, performance, and ecosystem support.

This talk aims to persuade you that by leveraging Rust's powerful type system and compiler guarantees, we can build fast, reliable async code that's less prone to race conditions and concurrency bugs. Whether you're a Pythonista venturing into Rust or a Rustacean curious about Python's concurrency model, this session will provide practical insights to help you navigate async programming across both languages.

Welcome to Rüstzeit: Prepare to navigate async programming across both ecosystems.
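For readers new to the Python side of this comparison, here is a minimal asyncio sketch that runs three coroutines concurrently. The coroutine `fetch` and its delays are hypothetical placeholders for real I/O; the Rust counterpart (for example with tokio) is not shown.

```python
import asyncio


async def fetch(name: str, delay: float) -> str:
    """Placeholder for an I/O-bound task; sleeps instead of doing real I/O."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def main() -> list:
    # gather schedules all three coroutines on the event loop at once and
    # returns their results in the order the awaitables were passed in.
    return await asyncio.gather(
        fetch("a", 0.1),
        fetch("b", 0.1),
        fetch("c", 0.1),
    )


results = asyncio.run(main())
```

Because the three sleeps overlap, the whole run takes roughly as long as one task rather than the sum of all three.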

18:30

Lightning Talks (1/2)

Lightning Talks at PyCon DE & PyData are short, 5-minute presentations open to all attendees. They’re a fun and fast-paced way to share ideas, showcase projects, spark discussions, or raise awareness about topics you care about — whether technical, community-related, or just inspiring. No slides are required, and talks can be spontaneous or prepared. It’s a great chance to speak up and connect with the community!

Please note: community conference and event announcements are limited to 1 minute only. All event announcements will be collected in a slide deck.

General: Others

Zeiss Plenary (Spectrum)

09:00

09:00

5min

Announcements

Zeiss Plenary (Spectrum)

Mini-Pythonistas: Coding, Experimenting, and Exploring with Zümi!

Dr. Marisa Mohr, Anna-Lena Popkes, Hannah Hepke, Daniel Hieber

Please note, this is a children's workshop. Recommended age 10-16 years. Experience using a keyboard and mouse and basic English vocabulary (for programming) are required.

Welcome, mini-Pythonistas! In this workshop, we’ll dive into the world of Zümi, a programmable car that’s much more than just wheels and motors. With built-in sensors, lights, and a camera, Zümi can learn to recognize colors, respond to gestures, and even identify faces — all with your help!

PyData: Embedded Systems & Robotics

09:05

Chasing the Dark Universe with Euclid and Python: Unveiling the Secrets of the Cosmos

Guadalupe Canas Herrera

The ESA Euclid mission, launched in July 2023, is on a quest to unravel the mysteries of dark energy and dark matter: the enigmatic components that make up 95% of the Universe. By mapping one-third of the sky with unprecedented precision, Euclid is building the largest 3D map of the cosmos.

This talk explores how cosmologists bridge theory and Euclid observations to reveal the hidden nature of dark energy and dark matter. We will delve into the challenges of cosmological inference, where advanced statistical methods and Python-based pipelines compare theoretical models against Euclid's vast datasets, and we will explain how Bayesian inference, machine learning, and state-of-the-art simulations are revolutionizing our understanding of the cosmos.

Zeiss Plenary (Spectrum)

09:50

09:50

25min

Coffee Break

Zeiss Plenary (Spectrum)

10:15

Algorithmic Music Composition With Python

Hendrik Niemeyer

Computers have long been an integral part of creating music. Virtual instruments and digital audio workstations make creating music easy and accessible. But how do programming languages and especially Python fit into this? Python can serve as a tool for creating musical notation and MIDI files.

Throughout the session, you’ll learn how to:

Use Python to create melodies, harmonies, and rhythms.
Generate music based on rules, randomness, and mathematical principles.
Visualize and export your compositions as MIDI and sheet music.

By the end of the talk, you’ll have a clear understanding of how to turn simple algorithms into expressive musical works.
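A minimal sketch of generating a melody from rules and randomness, under stated assumptions: the scale, the step rule, and the function name `random_walk_melody` are illustrative choices, not material from the talk. Notes are represented as MIDI note numbers (60 is middle C); writing an actual MIDI file or sheet music would need a third-party library and is not shown.

```python
import random

# C major scale from C4 to C5 as MIDI note numbers.
C_MAJOR = [60, 62, 64, 65, 67, 69, 71, 72]


def random_walk_melody(length, seed=0):
    """Build a melody by taking small random steps along the scale.

    Rule: start on the tonic and move at most two scale degrees per note,
    staying inside the one-octave range.
    """
    rng = random.Random(seed)  # seeded so the melody is reproducible
    index = 0
    melody = []
    for _ in range(length):
        melody.append(C_MAJOR[index])
        step = rng.choice([-2, -1, 1, 2])
        index = min(max(index + step, 0), len(C_MAJOR) - 1)
    return melody


melody = random_walk_melody(16, seed=42)
```

Restricting steps to neighbouring scale degrees is one simple way to keep random output sounding melodic; changing the scale or the allowed steps gives a different character.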

PyCon: Python Language & Ecosystem

Zeiss Plenary (Spectrum)

BayBE: A Bayesian Back End for Experimental Planning in the Low-To-No-Data Regime

Martin Fitzner, Alexander Hopp, Adrian Šošić

From coffee machine settings to chemical reactions to website A/B testing - iterative make-test-learn cycles are ubiquitous. The Bayesian Back End (BayBE) is an open-source experimental planner enabling users to smartly navigate such black-box optimization problems in iterative settings. This tutorial will i) introduce the core concepts enabled by combining Bayesian optimization and machine learning; ii) explain our software design choices, robust tests and the open-source libraries this is built on; and iii) provide a short practical hands-on session.

PyData: PyData & Scientific Libraries Stack

Blazing-Fast Python in Your Database: Unlocking Data Science at Scale with Exasol

Alexander Stigsen

What if your Python models could run inside your database—at scale, with parallel execution, and zero data movement? Meet Exasol: a high-performance Analytics Engine with native Python support and a massively parallel processing (MPP) engine. In this session, you’ll learn how to run Python directly where your data lives using user-defined functions (UDFs) and customizable script language containers. Whether you're doing forecasting, categorization, or calling APIs in real time, Exasol enables fast, scalable Python execution—perfect for demanding data science workflows. We’ll share real-world use cases, including large-scale model inference across thousands of sensors. If you're tired of bottlenecks and batch jobs, this is your shortcut to blazing-fast, in-database Python.

PyData: Machine Learning & Deep Learning & Statistics

Building versatile operating setups for real world use and testing with Python and the Raspberry Pi

Rosenxt is the host of a number of ventures aiming to provide next level solutions for demanding problems in a variety of industries based on decades of engineering excellence.

Some of them address challenges in water environments ranging from water pipelines to offshore applications.
As different as these areas may seem, the solutions we build for them have a lot in common, whether it is the necessary power supply, movement and steering concepts, or sensing approaches.
All of them benefit from generalized, smart solutions that we design as components that can later be orchestrated and configured in various setups to fulfill quite different purposes.

This presentation explores the versatility of leveraging a Raspberry Pi based hardware platform combined with a Python based application stack to bridge development and deployment of various basic components, such as motors and motor controllers, lift foils, steering units and controls.
By utilizing a unified platform, we demonstrate how the same system can seamlessly transition from test bench measurements during hardware component development to real-world applications for various industries.

The talk highlights how this approach can create a robust framework to help streamline workflows, enhance scalability and reduce costs.

PyData: Embedded Systems & Robotics

Career Path Experience Stories

Kristina Khvatova

As part of the PyConDE & PyData 2025 Conference, we would like to present an initiative aimed primarily at students and those just starting their careers in computer science. Our goal is to showcase the diverse career paths possible and dispel some myths about the skills and responsibilities typical of these jobs, so as to inspire and encourage them on their journey.

General: Education, Career & Life

Design, Generate, Deploy: Contract-First with FastAPI

Dr. Evelyne Groen, Kateryna Budzyak

This talk explores a contract-first approach to API development using the OpenAPI generator, a powerful tool for automating API generation from a standardized specification. We will cover (1) what you need to run to get a standard implementation of the FastAPI endpoints and data models; (2) how to customize the mustache templates that are used to generate the API stubs; (3) some ideas on how to customize the CLI; and (4) how to maintain the contract and handle breaking changes to it. We will close the session with a discussion of the challenges of implementing the OpenAPI generator.

PyCon: MLOps & DevOps

Multi-tenant Conversational Analytics

Rodel van Rooijen

Ever wondered how to use GenAI to enable self-service analytics through prompting? In this talk, I will share my experience of building a multi-tenant conversational analytics set-up that is built into a Software-as-a-Service (SaaS) platform. This talk is intended for AI engineers, data scientists, software engineers and anyone interested in using GenAI to power conversational analytics using open-source tools.

I will discuss the challenges faced in design and implementation, as well as the lessons learned along the way. We'll answer questions such as: Why offer analytics through prompting? Why multi-tenancy, and what makes it so difficult? How to build it into an existing product? What makes open-source the preferred choice over proprietary solutions? What could the implications be for the analytics field?

PyData: Natural Language Processing & Audio (incl. Generative AI NLP)

Probably Fun: Board Games to teach Data Science

Dr. Kristian Rother, Paula Gonzalez Avalos

In this tutorial, you will speed-date with board and card games that can be used to teach Data Science. You will play one game for 15 minutes, reflect on the Data Science concepts it involves, and then rotate to the next table.

As a result, you will experience multiple ideas that you can use to make complex ideas more understandable and enjoyable. We would like to demonstrate how gamification can be used not only to produce short puzzles and quizzes, but also as a tool to reason about complex problem-solving strategies.

We will bring a set of carefully selected games that have been proven effective in teaching statistics, programming, machine learning and other Data Science skills. We also believe that it is probably fun to participate in this tutorial.

General: Education, Career & Life

Scaling Python: An End-to-End ML Pipeline for ISS Anomaly Detection with Kubeflow

Christian Geier, Henrik Sebastian Steude

Building and deploying scalable, reproducible machine learning pipelines can be challenging, especially when working with orchestration tools like Slurm or Kubernetes. In this talk, we demonstrate how to create an end-to-end ML pipeline for anomaly detection in International Space Station (ISS) telemetry data using only Python code.

We show how Kubeflow Pipelines, MLflow, and other open-source tools enable the seamless orchestration of critical steps: distributed preprocessing with Dask, hyperparameter optimization with Katib, distributed training with PyTorch Operator, experiment tracking and monitoring with MLflow, and scalable model serving with KServe. All these steps are integrated into a holistic Kubeflow pipeline.

By leveraging Kubeflow's Python SDK, we simplify the complexities of Kubernetes configurations while achieving scalable, maintainable, and reproducible pipelines. This session provides practical insights, real-world challenges, and best practices, demonstrating how Python-first workflows empower data scientists to focus on machine learning development rather than infrastructure.

PyCon: MLOps & DevOps

Unforgettable, that's what you are: Evaluating Machine Unlearning and Forgetting

Katharine Jarmul

Can deep learning/AI models forget? In this talk, you'll explore the realm of machine unlearning, where researchers and practitioners aim to remove memorized examples from machine learning models. This is relevant for training increasingly overparameterized models and growing GDPR/Privacy concerns with large scale model development and use.

PyData: Machine Learning & Deep Learning & Statistics

10:45

10:45

15min

Stage Set-Up

Zeiss Plenary (Spectrum)

10:55

Composable AI: Building Next-Gen AI Agents with MCP

At Blue Yonder, we're embarking on a journey toward building composable AI agents using Model Context Protocol (MCP). We're discovering firsthand the challenges of integrating diverse products and APIs into useful, context-aware agents. In this talk, I'll discuss our early experiences, the challenges we've faced, and why MCP is emerging as a potential game changer for developing scalable, flexible AI solutions.

PyData: Generative AI

Navigating the Security Maze: An Interactive Adventure

Clemens Hübner

How to integrate security into a software development project? Without jeopardizing timeline or budget? You decide!
This interactive session covers crucial decisions for software security, and the audience decides how the story ends...

PyCon: Security

Oh, no! Users love my GenAI-Prototype and want to use it more.

Thomas Prexl, Frank Rust

Demos and prototypes for generative AI (GenAI) projects can be quickly created with tools like Streamlit, offering impressive results for users within hours. However, scaling these solutions from prototypes to robust systems introduces significant challenges. As user demand grows, hacks and workarounds in tools like Streamlit lead to unreliability and debugging frustrations. This talk explores the journey of overcoming these obstacles, evolving to a stable tech stack with Qdrant, Postgres, LiteLLM, FastAPI, and Streamlit. Aimed at beginners in GenAI, it highlights key lessons.

PyCon: MLOps & DevOps

Outgrowing your node? Zero stress scaling with cuPyNumeric.

Many data and simulation scientists use NumPy for its ease of use and good performance on CPU. This approach works well for single-node tasks, but scaling to handle larger datasets or more resource-intensive computations introduces significant challenges. Not to mention, using GPUs requires another level of complexity. We present the cuPyNumeric library, which gives developers the same familiar NumPy interface, but seamlessly distributes work across CPUs and GPUs.
In this talk we showcase the productivity and performance of the cuPyNumeric library on a user example, covering some details of its implementation.

PyCon: Programming & Software Engineering

Scalable Python and SQL Data Engineering without Migraines

This session is for data and ML engineers with a basic understanding of data engineering and Python. It shows how to easily use Python code in Snowflake Notebooks to create data pipelines. By the end, you’ll know how to build and process data pipelines with Python.

PyData: Machine Learning & Deep Learning & Statistics

Serverless Orchestration: Exploring the Future of Workflow Automation

Tim Bossenmaier

Orchestration is a typical challenge in the data engineering world. Scheduling your data transformation jobs via cron jobs is cumbersome and error-prone. Furthermore, as the number of jobs to manage increases, it becomes impossible to keep an overview. Tools like Apache Airflow, Dagster, Luigi, and Prefect are known for addressing these challenges but often require additional resources or investment. With the advent of serverless orchestration tools, many of these disadvantages are mitigated, offering a more streamlined and cost-effective solution.

This session provides a comprehensive overview of combining serverless architecture with orchestration. We will start by defining the core concepts of orchestration and serverless technologies and discuss the benefits of integrating them. The talk will then analyze solutions available in the cloud vendor space. Attendees will leave with a well-rounded understanding of the tools and strategies available in serverless orchestration.

PyCon: Programming & Software Engineering

11:00

AI in Reality Fireside Chat: Enterprise AI & Open‑Source Innovation

Alexander CS Hendorf, Dr. Alexander Beck, Walid Mehanna, Ines Montani

This fireside chat brings together leading voices from industry and open-source to explore how artificial intelligence is being meaningfully integrated into enterprise environments—beyond the buzzwords. Moderated by Alexander CS Hendorf, the conversation features Walid Mehanna (Chief Data Officer, Merck), Dr. Alexander Beck (CTO, Quoniam), and Ines Montani (co-founder explosion.ai, spaCy), who share their diverse perspectives from pharmaceuticals, finance, and AI tooling.

Together, they’ll explore the cultural, technical, and ethical dimensions of AI adoption in large organizations, the growing influence of open-source ecosystems, and the long-term vision required to build sustainable, human-centered AI systems. This session is designed for those who want to move past the hype and better understand what real-world innovation at scale looks like—and what it demands from leadership, infrastructure, and community.

General: Others

Zeiss Plenary (Spectrum)

11:35

Beyond Alembic and Django Migrations

ORMs like Django and SQLAlchemy have become indispensable in Python development, simplifying the interaction between applications and databases. Yet, their built-in schema migration tools often fall short in projects that require advanced database features or robust CI/CD integration.

In this talk, we’ll explore how you can go beyond the limitations of your ORM’s migration tool. Using Atlas—a language-agnostic schema management tool—as a case study, we’ll demonstrate how Python developers can automate migration planning, leverage advanced database features, and seamlessly integrate database changes into modern CI/CD pipelines.

PyCon: Django & Web

Bias Meets Bayes: A Bayesian Perspective on Improving Model Fairness

Bias in machine learning models remains a pressing issue, often disproportionately affecting the most vulnerable groups in society. This talk introduces a Bayesian perspective to effectively tackle these challenges, focusing on improving fairness by modeling and addressing bias directly.
You will learn about the interplay between uncertainty, equity, and predictive accuracy, while gaining actionable insights to improve fairness in diverse applications. Using a practical example of a risk-scoring model trained on data with underrepresented minority groups, I will showcase how Bayesian methods compare to traditional techniques, demonstrating their unique potential to mitigate bias while maintaining performance.

PyData: Machine Learning & Deep Learning & Statistics

Bridging the gap: unlocking SAP data for data lakes with Python and PySpark via SAP Datasphere

Rostislaw Krassow

SAP's data often remains locked away, hindering the creation of a complete data picture. This talk presents a hands-on proof of concept leveraging SAP Datasphere, Python and PySpark to bridge an Azure-based, data mesh-inspired open data lake with a centralized SAP BI environment.

This presentation will delve into the architecture of SAP Datasphere and its integration interfaces with Python. It will explore network integration, authentication, authorization and resource management options, as well as data integration patterns. The presentation will summarize the evaluated features and limitations discovered during the PoC.

PyData: Data Handling & Engineering

Going Global: Taking code from research to operational open ecosystem for AI weather forecasting

When I was hired as a Scientist for Machine Learning, experts said ML would never work in weather forecasting. Nowadays, I get to contribute to Anemoi, a full-featured ML weather forecasting framework used by international weather agencies to research, build, and scale AI weather forecasting models.

The project started out as a curiosity by my colleagues and soon scaled as a result of its initial success. As machine learning stories go, this is a story of change, adaptation and making things work.

In this talk, I'll share some practical lessons: how we evolved from a mono-package with four people working on it to multiple open-source packages with 40+ internal and external collaborators. Specifically, I'll cover how we managed the explosion of over 300 config options without losing all of our sanity, how we built a separation of packages that works for both researchers and operations teams, and how CI/CD and testing constrain how many bugs we can introduce in a given day. You'll learn concrete patterns for growing Python packaging for ML systems and for balancing research flexibility with production stability. As a bonus, I'll sprinkle in anecdotes where LLMs like ChatGPT and Copilot massively failed at facilitating this evolution.

Join me for a deep dive into the real challenges of scaling ML systems - where the weather may be hard to predict, but our code doesn't have to be.

PyCon: MLOps & DevOps

Reinventing Streamlit

Dreaming of creating sleek, interactive web apps with just Python? Streamlit is great for dashboards, but what if your needs go beyond that? Discover how Reflex.dev, a cutting-edge full-stack Python framework, lets you level up from dashboards to full-fledged web apps!

PyCon: Django & Web

Securing Generative AI: Essential Threat Modeling Techniques

Elizaveta Zinovyeva

Generative AI development introduces unique security challenges that traditional methods often overlook. This talk explores practical threat modeling techniques tailored for AI practitioners, focusing on real-world scenarios encountered in daily development. Through relatable examples and demonstrations, attendees will learn to identify and mitigate common vulnerabilities in AI systems. The session covers user-friendly security tools and best practices specifically designed for AI development. By the end, participants will have practical strategies to enhance the security of their AI applications, regardless of their prior security expertise.

PyData: Generative AI

12:20

12:20

60min

Lunch Break

Zeiss Plenary (Spectrum)

12:20

60min

Lunch Break

Titanium3

12:20

60min

Lunch Break

Helium3

12:20

60min

Lunch Break

Platinum3

12:20

60min

Lunch Break

Europium2

12:20

60min

Lunch Break

Hassium

12:20

60min

Lunch Break

Palladium

12:20

60min

Lunch Break

Ferrum

12:20

60min

Lunch Break

Dynamicum

13:20

13:20

5min

Announcements

Zeiss Plenary (Spectrum)

13:25

Machine Learning Models in a Dynamic Environment

Isabel Drost-Fromm

"We've only tested the happy path - now users are finding all sorts of creative ways to break the app."

What is already a cause for headaches in traditional software engineering turns into a large challenge when the application is based on machine learning models: Data distribution may change from the training phase to deployment. Even worse, humans interacting with the model may adjust their behaviour to the model, making the gap between the original training environment and deployment even larger. When deployed in a public environment, the model may be exposed to users trying to game the system. When re-trained, it may be exposed to users trying to poison the pool of training data.

We will take a tour of historic cases of models being gamed: What are the lessons we learnt a long time ago building e-mail spam filters? What happened when high search engine rankings started to be linked to monetary income? How can personalization and targeted advertising be exploited to influence public discourse?

“… it should be clear that improvements in communication tend to divide mankind …” by Harold Innis in Changing Concepts of Time

This keynote will turn interactive, engaging the audience in sharing their stories of users playing interesting games with deployed models - including the counter moves rolled out.

If we are to learn from IT security experience, one important ingredient to address these issues is a combination of collaboration and transparency - across organisations.

Zeiss Plenary (Spectrum)

14:20

AI Agents of Change: Creating, Reflecting, and Monetizing

Paloma Oliveira, Tereza Iofciu

Create, reflect, and earn—with purpose. In this workshop, you’ll not only build your own AI agent but also confront the ethical questions it raises, from its impact on jobs to its potential for social good. Together, we’ll explore how to harness AI for empowerment while uncovering pathways to turn your skills into meaningful value.

This workshop is designed to equip Python enthusiasts with the tools to create their own AI agent while fostering a deeper understanding of the societal implications of this technology. Through hands-on learning, collaborative discussions, and practical monetization strategies, you’ll leave with more than just code—you’ll gain a vision of how AI can be wielded responsibly and profitably.

PyData: Generative AI

Analyze data easily with duckdb - and the implications on data architectures

Matthias Niehoff

duckdb is increasingly becoming a universal tool for accessing and analyzing data. In this talk I will show, with slides and a live demo, what duckdb is capable of and dive deeper into how it will influence modern data architectures.

PyData: Data Handling & Engineering

Dataframely — A declarative, 🐻‍❄️-native data frame validation library

Oliver Borchert, Daniel Elsner

Understanding the structure and content of data frames is crucial when working with tabular data — a core requirement for the robust pipelines we build at QuantCo.

Libraries such as pandera or patito already exist to ease the process of defining data frame schemas and validating that data frames comply with these schemas. However, when building production-ready data pipelines, we encountered limitations of these libraries. Specifically, we were missing support for strict static type checking, validation of interdependent data frames, and graceful validation including introspection of failures.

To remedy the shortcomings of these libraries, we started building dataframely at the beginning of last year. Dataframely is a declarative data frame validation library with first-class support for polars data frames.

Over the last year, we have gained experience in using dataframely both for analytical and production code across several projects. The result was a drastic improvement of the legibility of our pipeline code and our confidence in its correctness. To enable the wider data engineering community to benefit from similar effects, we have recently open-sourced dataframely and are keen on introducing it in this talk.

PyData: Data Handling & Engineering

Duplicate Code Dilemma: Unlocking Automation with Open Source!

Raana Saheb-Nassagh

"Don't Repeat Yourself" – a phrase that we have all heard many times. In this talk, we will give an overview of how to deal with code duplication and how open-source template libraries such as Copier can assist us in managing similarly structured repositories. Furthermore, we will explore how code updates can be automated with the help of open-source libraries like Renovate Bot. By the end of this session, you will gain insights into these solutions while also questioning whether they truly eliminate repetition or merely contribute to another cycle of automation.

PyCon: Programming & Software Engineering

Machine Reasoning and System 2 Thinking

Raw large language models struggle with complex reasoning. New techniques have emerged that allow these models to spend more time thinking before giving an answer. Direct token sampling can be seen as system-1 thinking and explicit step-by-step reasoning as system-2. How can this reasoning ability be improved and what is the future?

PyData: Generative AI

Oh my license! – Achieving order by automation in the license chaos of your dependencies

License issues can haunt you at night.
You spend days, weeks, and months developing beautiful software.
But then it happens.
You realize that an essential dependency is GPL-3.0 licensed.

All your code is now infected with this license.
Now you are forced to either:
1. Rewrite all parts relying on the other library
2. Open-source your codebase under the GPL-3.0 license

How could this have been avoided?

Join the talk and find out!
First, we’ll give you a brief introduction to different software licenses and their implications.
Second, we’ll show you how to automate your license checking using open-source software.
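
As a rough illustration of what such automation could look like, the sketch below scans the packages installed in the current environment with the standard library's `importlib.metadata` and flags any whose declared license mentions a GPL-family license. The helper names (`is_copyleft`, `declared_licenses`, `find_copyleft_dependencies`) and the marker list are assumptions made for this example, not the tooling presented in the talk. License metadata is often missing or inconsistent, so the result is only a starting point for review.

```python
from importlib.metadata import distributions

# Hypothetical marker list for this sketch; a real policy would be more precise.
COPYLEFT_MARKERS = ("GPL",)  # as a substring this also matches LGPL and AGPL


def is_copyleft(license_text: str) -> bool:
    """Return True if the license string mentions a GPL-family license."""
    return any(marker in license_text.upper() for marker in COPYLEFT_MARKERS)


def declared_licenses(dist) -> list[str]:
    """Collect the license strings a distribution declares in its metadata."""
    meta = dist.metadata
    found = []
    for field in ("License-Expression", "License"):
        value = meta.get(field)
        if value:
            found.append(value)
    # Trove classifiers are a second, common place to declare a license.
    for classifier in meta.get_all("Classifier") or []:
        if classifier.startswith("License ::"):
            found.append(classifier)
    return found


def find_copyleft_dependencies() -> dict[str, list[str]]:
    """Map each flagged installed package to its GPL-family license strings."""
    flagged = {}
    for dist in distributions():
        hits = [text for text in declared_licenses(dist) if is_copyleft(text)]
        if hits:
            flagged[dist.metadata.get("Name", "unknown")] = hits
    return flagged


if __name__ == "__main__":
    for name, hits in sorted(find_copyleft_dependencies().items()):
        print(f"{name}: {hits}")
```

Run in CI, a check like this could fail the build whenever a newly added dependency is flagged, so the question comes up before the code ships rather than after.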

PyCon: Programming & Software Engineering

Safeguard your precious API endpoints built on FastAPI using OAuth 2.0

Is implementing authorization on your API endpoints an afterthought? Who should have access to your API endpoints? Is it secure? This talk covers using OAuth 2.0 to secure API endpoints built on FastAPI following industry-recognized best practices. Come on a journey with me, taking your API endpoints from merely functional to functional AND secure. By following secure identity standards, you’ll come away with a deeper understanding of the critical need for authorization.

PyCon: Security

Zeiss Plenary (Spectrum)

Unlocking the Predictive Power of Relational Data with Automated Feature Engineering

Alexander Uhlig

Relational data can be a goldmine for classical Machine Learning applications — yet extracting useful features from multiple tables, time windows, and primary-foreign key relationships is notoriously difficult. In this code tutorial, we’ll use the H&M Fashion dataset to demonstrate how getML FastProp automates feature engineering for both classification (churn prediction) and regression (sales prediction) with minimal manual effort, outperforming both Relational Deep Learning and a skilled human data scientist according to the RelBench leaderboard.

This code tutorial is perfect for data scientists looking to leverage their relational and time-series data effectively for any kind of predictive analytics application.

PyData: Machine Learning & Deep Learning & Statistics

Writing reliable software while depending on hazardous APIs

Romain Dorgueil

As we develop business-critical software, we often need to rely on external APIs to get the job done. Not all services are born equal: although an ideal world would provide well-operated APIs that exceed their service levels, the real world is usually far worse. Timeouts, HTTP errors, cascading failures, unclear or changing contracts, approximate protocol implementations ... and even the oh-so-human bad faith while trying to pinpoint the root cause. Most of us have written hacks to handle commonly seen failures, from quick-and-dirty workarounds to well-thought-out resilience patterns, but this is hard to do correctly, and investing the right amount of time and money in it is rarely a business priority. We'll present the options, including both direct dependencies (not framework-dependent, although some families emerge, such as async vs. sync) and a service/proxy-based approach.
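
To make one such resilience pattern concrete, here is a minimal sketch of retrying a flaky call with exponential backoff and jitter, using only the standard library. The names `TransientError` and `call_with_retries` are hypothetical and chosen for this example; the abstract does not say which patterns or libraries the talk covers.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical error type for failures worth retrying (timeouts, HTTP 5xx)."""


def call_with_retries(func, max_attempts=4, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call func(), retrying on TransientError with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the failure
            # The delay cap doubles on each attempt, bounded by max_delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Random jitter spreads retries so clients do not hit the API in sync.
            sleep(random.uniform(0, cap))
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting. Only errors classified as transient are retried; anything else propagates immediately, since retrying a permanent failure just adds load.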

PyCon: MLOps & DevOps

15:00

Accuracy Is Not Enough: Building Trustworthy AI with Conformal Prediction

Chris Aivazidis

Building a good scoring model is just the beginning. In the age of critical AI applications, understanding and quantifying uncertainty is as crucial as achieving high accuracy. This talk highlights conformal prediction as the definitive approach to both uncertainty quantification and probability calibration, two extremely important topics in Deep Learning and Machine Learning. We’ll explore its theoretical underpinnings, practical implementations using TorchCP, and transformative impact on safety-critical fields like healthcare, robotics, and NLP. Whether you're building predictive systems or deploying AI in high-stakes environments, this session will provide actionable insights to level up your modelling skills for robust decision-making.
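
For readers new to the idea, the following is a minimal sketch of split conformal prediction for classification in plain Python. It does not use TorchCP, and the function names are assumptions made for this example. Given class probabilities from any classifier on a held-out calibration set, it computes a score threshold and then returns, for a new input, the set of labels that fall within that threshold.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Compute the score threshold from a held-out calibration set.

    cal_probs: list of per-class probability lists from any classifier.
    cal_labels: true class indices for the calibration examples.
    """
    # Nonconformity score: one minus the probability given to the true class.
    scores = sorted(1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank of the (1 - alpha) quantile.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points: fall back to the full label set
    return scores[k - 1]


def prediction_set(probs, threshold):
    """Return every class whose score is within the calibrated threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= threshold]
```

Under the usual assumption that calibration and test examples are exchangeable, sets built this way contain the true label with probability at least `1 - alpha` on average. A larger set signals that the model is less certain about that input.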

PyData: Machine Learning & Deep Learning & Statistics

Decoding Topics: A Comparative Analysis of Python’s Leading Topic Modeling Libraries Using Climate C

Dr. Lisa Andreevna Chalaguine

Topic modelling has come a long way, evolving from traditional statistical methods to leveraging advanced embeddings and neural networks. Python’s diverse library ecosystem includes tools like Latent Dirichlet Allocation (LDA) using gensim, Top2Vec, BERTopic, and Contextualized Topic Models (CTM). This talk evaluates these popular approaches using a dataset of UK climate change policies, considering use cases relevant to organisations like DEFRA (Department for Environment, Food & Rural Affairs). The analysis explores real-time integration, dynamic topic modelling over time, adding new documents, and retrieving similar ones. Attendees will learn the strengths, limitations, and practical applications of each library to make informed decisions for their projects.

PyData: Natural Language Processing & Audio (incl. Generative AI NLP)

Distributed file-systems made easy with Python's fsspec

Einat Orr, Barak Amar

The cloud native revolution has impacted all aspects of engineering, and data engineering is not exempt. One of the ongoing challenges in the data engineering world remains working with local and distributed cloud native storage. In this talk we’ll explore working with distributed file systems in Python, through an intro to fsspec: a popular Python library that is well-positioned to address the growing challenge of interacting with storage systems of different kinds in a consistent way.

In this talk we’ll show hands-on examples of working with fsspec with some of the most popular data tools in the Python community: Pandas, Tensorflow and PyArrow. We’ll demonstrate a real world implementation of fsspec and how it provides easy extensibility through open source tooling.

You’ll come away from this session with a better understanding of how to implement and extend fsspec to work with different cloud native storage systems.

PyData: Data Handling & Engineering

Quiet on Set: Building an On-Air Sign with Open Source Technologies

Learn how to build a custom On-Air sign using Apache Kafka®, Apache Flink®, and Apache Iceberg™! See how to capture events like Zoom meetings and camera usage with Python, process data with FlinkSQL, analyze trends using Iceberg, and bring it all together with a practical IoT project that easily scales out.

General: Infrastructure - Hardware & Cloud

Scraping LEGO for Fun: A Hacky Dive into Dynamic Data Extraction

Unlock the full potential of modern web scraping by combining Python, Scrapy, and Playwright to extract data from dynamic, JavaScript-heavy sites—exemplified by LEGO product pages. This talk introduces Model Context Protocol (MCP) servers for orchestrating advanced data fetching, refining CSS selectors, and integrating Large Language Models for automated code suggestions. Learn how to scale ethically, handle concurrency, and respect site policies, while maintaining flexible, maintainable pipelines for diverse use cases from research to robotics.

PyData: Data Handling & Engineering

Securing RAG Pipelines with Fine Grained Authorization

Sohan Maheshwar

Using LLMs and AI in your Enterprise? Make sure you build Fine Grained Authorization to ensure your LLMs access only the data they are authorized to.

This talk will show how you can build Relationship Based Access Control (ReBAC) for fine-grained authorization for your RAG pipelines. The talk also includes a demo using Pinecone, Langchain, OpenAI, and SpiceDB.
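
The core idea can be sketched without any external service: store relationships as (subject, relation, object) tuples and check them before retrieved documents reach the model. The snippet below is an illustration only. The tuple data and the function names `can_view` and `authorized_context` are made up for this example, and it does not use the SpiceDB, Pinecone, Langchain, or OpenAI APIs shown in the demo.

```python
# Relationship tuples in (subject, relation, object) form; all names are made up.
RELATIONSHIPS = {
    ("user:alice", "member", "group:finance"),
    ("group:finance", "viewer", "doc:q3-report"),
    ("user:bob", "viewer", "doc:handbook"),
    ("user:alice", "viewer", "doc:handbook"),
}


def can_view(user: str, doc: str) -> bool:
    """True if the user is a viewer of doc directly or through a group."""
    if (user, "viewer", doc) in RELATIONSHIPS:
        return True
    groups = {obj for subj, rel, obj in RELATIONSHIPS if subj == user and rel == "member"}
    return any((group, "viewer", doc) in RELATIONSHIPS for group in groups)


def authorized_context(user: str, retrieved_docs: list[str]) -> list[str]:
    """Drop retrieved documents the user may not see before building the prompt."""
    return [doc for doc in retrieved_docs if can_view(user, doc)]
```

Here `authorized_context("user:bob", ["doc:q3-report", "doc:handbook"])` keeps only the handbook, while Alice also sees the report through her finance group membership. Filtering after retrieval and before prompt construction means the model never receives text the user is not allowed to read.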

PyData: Generative AI

They are not unit tests: a survey of unit-testing anti-patterns

Stanislav Zmiev

The entire industry approves of unit testing but almost no one can fully agree on how to do it correctly, or even on what unit tests are. This results in unit tests often being associated with a slower development cycle and an overall less enjoyable workflow. I'll show you how testing turns into hell in real enterprises with the most common anti-patterns and then I'll show you that most of them are avoidable with modern tooling like mutation testing, snapshot testing, dirty-equals, and many more. We'll discuss how to make tests speed up your development and make refactoring easy.

Zeiss Plenary (Spectrum)

15:45

15:45

30min

Coffee Break

Zeiss Plenary (Spectrum)

15:45

30min

Coffee Break

Titanium3

15:45

30min

Coffee Break

Helium3

15:45

30min

Coffee Break

Platinum3

15:45

30min

Coffee Break

Europium2

15:45

30min

Coffee Break

Hassium

15:45

30min

Coffee Break

Palladium

15:50

15:50

25min

Coffee Break

Ferrum

15:50

25min

Coffee Break

Dynamicum

16:15

Building a Self-Hosted MLOps Platform with Kubernetes

Josef Nagelschmidt

Many managed MLOps platforms, while convenient, often fall short on flexibility, require complex integrations, and cause vendor lock-in. In this talk, we’ll share our experience transitioning from managed MLOps tools to a self-hosted solution built on Kubernetes. We’ll focus on how we leveraged open-source tools like Feast, MLflow, and Ray to build a more flexible, scalable, and customizable platform that is now in use at Rewe Digital. By migrating to this self-hosted architecture, we gained greater control over our ML pipelines, reduced our dependency on third-party services, and created a more adaptable infrastructure for our ML workloads.

PyCon: MLOps & DevOps

Cache me if you can: Boosted application performance with Redis and client-side caching

Did you know Redis can notify your app about server-side data changes? This feature enables client-side tracking and caching in redis-py, helping to reduce network round-trips and optimize performance. In this talk, we explore how client-side caching works in redis-py and how you can use it to make your applications even faster.

PyData: Data Handling & Engineering

Conquering the Queue: Lessons from processing one billion Celery tasks

At Userlike, Celery is the backbone of our application, orchestrating over 100 million tasks per month. In this talk, I’ll share real-world insights into scaling Celery, optimizing performance, avoiding common pitfalls, handling failures, and building a resilient architecture.

PyCon: Django & Web

Learnings from migrating a Flask app to FastAPI

FastAPI has been steadily growing in popularity in recent years. A lot of this growth is driven by its relative simplicity and ease of use. In this talk, we'll discuss some practical insights into building a FastAPI application, based on my experience of migrating an existing Flask prototype to FastAPI.

We'll explore how FastAPI's core features like Pydantic integration and dependency injection can improve API development, while also talking about the drawbacks of FastAPI.

PyCon: Django & Web

Optimizing in the Python Ecosystem – Powered by Gurobi

Join us as we explore integrating Gurobi and prescriptive analytics into your Python ecosystem. In this session, you’ll discover model-building techniques that leverage NumPy and SciPy.sparse as well as the data structures of pandas. We’ll also show you how to seamlessly integrate trained regressors from scikit-learn as constraints in your optimization models. Elevate your workflows and unlock new decision-making capabilities with Gurobi in Python.

PyCon: Python Language & Ecosystem

PyLadies Panel: AI Skills & Careers

Tereza Iofciu, Anastasia Karavdina, Jesper Dramsch, Guadalupe Canas Herrera

As generative AI and autonomous agents rapidly transform the workplace, the skills required to thrive are evolving just as quickly. This panel will explore the essential AI skills that are driving career growth.

General: Education, Career & Life

Zeiss Plenary (Spectrum)

Streaming at 30,000 Feet: A Real-Time Journey from APIs to Stream Processing

Felix Leon Buck

Traditional API architectures face significant challenges in environments where repetitive and frequent requests are required to retrieve data updates. These request-response mechanisms introduce latency, as clients must continually query the server to check for changes, often receiving redundant or outdated information. This approach leads to increased network overhead, inefficient use of server resources and diminished scalability as the number of clients or requests grows. Additionally, frequent requests expand the attack surface, requiring security measures to mitigate risks such as (un-)authorised access, rate limiting and query sanitisation. Managing all of these inherent problems results in systems that are increasingly complex to maintain and improve, while putting considerable implementation effort onto the customer.
Join us to find out how transitioning to a streaming architecture can address these issues by providing proactive, event-based data delivery, reducing latency, minimising redundant processing, enhancing scalability and simplifying security management.

PyCon: Programming & Software Engineering

The future of AI training is federated

Since its introduction in 2016, Federated Learning (FL) has become a key paradigm for training AI models in scenarios where training data cannot leave its source. This applies in many industrial settings where centralizing data is challenging due to a combination of reasons, including but not limited to privacy, legal, and logistics.

The main focus of this tutorial is to introduce an alternative approach to training AI models that is straightforward and accessible. We’ll walk you through the basics of an FL system, how to iterate on your workflow and code in a research setting, and finally deploy your code to a production environment. You will learn all of these approaches using a real-world application based on open-sourced datasets, and the open-source federated AI framework, Flower, which is written in Python and designed for Python users. Throughout the tutorial, you’ll have access to hands-on open-sourced code examples to follow along.

PyData: Machine Learning & Deep Learning & Statistics

pytest - simple, rapid and fun testing with Python

The pytest tool offers a rapid and simple way to write tests for your Python code. This training gives an introduction with exercises to some distinguishing features, such as its assertions, marks and fixtures.

Despite its simplicity, pytest is incredibly flexible and configurable. We'll look at various configuration options as well as the plugin ecosystem around pytest.

16:55

A11y Need Is Love (But Accessible Docs Help Too)

Accessible documentation benefits everyone, from developers to end users. Using the PyData Sphinx Theme as a case study, this talk dives into common accessibility barriers in documentation websites, such as low-contrast colors and missing focus states, and practical ways to address them. Learn about accessibility improvements and take part in a live accessibility audit to see how small changes can make a big difference.

PyData: PyData & Scientific Libraries Stack

Challenges and Lessons Learned While Building a Real-Time Lakehouse using Apache Iceberg and Kafka

Jonas Böer, Elena Ouro Paz

How do you build a large-scale data lakehouse architecture that makes data available for business analytics in real time, while being more cost-effective, more flexible and faster than the previous proprietary solution? With Python, Kafka and Iceberg, of course!

We built a large-scale data lakehouse based on Apache Iceberg for the Schwarz Group, Europe's largest retailer. The system collects business data from thousands of stores, warehouses and offices across Europe.

In this talk, we will present our architecture, the challenges we faced, and how Apache Iceberg is shaping up to be the data lakehouse format of the future.

PyData: Data Handling & Engineering

From Algorithm to Action: Building a DIY Distributed Trading Platform with Open Source

In this talk, we'll explore how you can implement your own distributed system for algorithmic trading leveraging the power of open source without being dependent on trading bot providers.

We will discuss different challenges that occur in HFT, among them processing massive amounts of data with low latency and reliable risk control, and how to solve them. Furthermore, we will touch on regulatory requirements in trading.

These challenges will be addressed through a distributed system implemented in Python, utilizing Kafka for real-time data streaming and PostgreSQL for persistent storage. We will examine approaches to decouple the components to re-use and scale them across different markets.

Cryptocurrency markets are used as a proving ground for the PoC because they are easily accessible to everyone.

PyCon: Programming & Software Engineering

From LIKE to Love: Adding Proper Search to Your Django Apps

Kacper Łukawski

Is your Django application still relying on SQL LIKE queries for search? In this talk, we'll explore why basic text matching falls short of modern user expectations and how to implement proper search functionality without complexity. We'll introduce django-semantic-search, a practical package that bridges the gap between Django's ORM and powerful semantic search capabilities. Through practical code examples and real-world use cases, you'll learn how to enhance your application's search experience from basic keyword matching to understanding user intent. Whether you're building a content platform, e-commerce site, or internal tool, you'll walk away with concrete steps to implement production-ready search that your users will actually enjoy using.

PyCon: Django & Web

Lessons learned in bringing a RAG chatbot with access to 50k+ diverse documents to production

Bernhard Schäfer, Nico Mohr

Retrieval-Augmented Generation (RAG) chatbots are a key use case of GenAI in organizations, allowing users to conveniently access and query internal company data. A first RAG prototype can often be created in a matter of days. But why are the majority of prototypes still in the pilot stage? [1]

In this talk we share our insights from developing a production-grade chatbot at Merck. Our RAG chatbot for R&D experts accesses over 50,000 documents across numerous SharePoint sites and other sources. We identified three technical key success factors:
1. Building a robust data pipeline that syncs documents from source systems and that handles enterprise features such as replicating user permissions.
2. Developing a chatbot workflow from user question to answer with retrieval components such as hybrid search and reranking.
3. Establishing a comprehensive evaluation framework with a clear optimization metric.
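
As a small illustration of the hybrid search mentioned in the second point, the sketch below merges a keyword ranking and a vector ranking with reciprocal rank fusion (RRF). RRF is one common way to combine rankings and is an assumption here; the abstract does not state which fusion method the team uses. The function name and the default `k=60` are likewise choices made for this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one ranking."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents near the top of any list contribute the most.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable result.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["a", "b", "c"]  # e.g. from a keyword index
vector_hits = ["b", "c", "d"]   # e.g. from an embedding search
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Documents that appear in both lists rise to the top, so `fused` starts with `"b"` and `"c"`. A reranking step would then reorder only this shortlist with a more expensive model.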

We think that many of these lessons are broadly applicable to RAG chatbots, making this talk valuable for practitioners aiming to implement GenAI solutions in business contexts.

PyData: Generative AI

Transformers for Game Log Data

The Transformer architecture, originally designed for machine translation, has revolutionized deep learning with applications in natural language processing, computer vision, and time series forecasting. Recently, its capabilities have extended to sequence-to-sequence tasks involving log data, such as telemetric event data from computer games.

This talk demonstrates how to apply a Transformer-based model to game log data, showcasing its potential for sequence prediction and representation learning. Attendees will gain insights into implementing a simple Transformer in Python, optimizing it through hyperparameter tuning, architectural adjustments, and defining an appropriate vocabulary for game logs.

Real-world applications, including clustering and user level predictions, will be explored using a dataset of over 175 million events from an MMORPG. The talk will conclude with a discussion of the model's performance, computational requirements, and future opportunities for this approach.

PyData: Machine Learning & Deep Learning & Statistics

17:45

Lightning Talks (2/2)

Lightning Talks at PyCon DE & PyData are short, 5-minute presentations open to all attendees. They’re a fun and fast-paced way to share ideas, showcase projects, spark discussions, or raise awareness about topics you care about — whether technical, community-related, or just inspiring. No slides are required, and talks can be spontaneous or prepared. It’s a great chance to speak up and connect with the community!

Please note: community conference and event announcements are limited to 1 minute only. All event announcements will be collected in a slide deck.

General: Others

Zeiss Plenary (Spectrum)

19:15

19:15

180min

Social Event @ darmstadtium (extra ticket required)

Zeiss Plenary (Spectrum)

09:00

09:00

5min

Announcements

Zeiss Plenary (Spectrum)

09:05

The Future of AI: Building the Most Impactful Technology Together

Leandro von Werra

In this talk, Leandro will examine the significant benefits of combining open source principles with artificial intelligence. He will walk through the need for openness in language models to build trust, maintain control, mitigate biases, and achieve true alignment, and show how open models are rapidly gaining momentum in the AI landscape, challenging proprietary systems through community-driven innovation. Finally, he will talk about emerging trends and what the community needs to build for the next generation of models.

Zeiss Plenary (Spectrum)

09:50

09:50

25min

Coffee Break

Zeiss Plenary (Spectrum)

09:50

25min

Coffee Break

Titanium3

09:50

25min

Coffee Break

Helium3

09:50

25min

Coffee Break

Platinum3

09:50

25min

Coffee Break

Europium2

09:50

25min

Coffee Break

Hassium

09:50

25min

Coffee Break

Palladium

09:50

25min

Coffee Break

Ferrum

09:50

25min

Coffee Break

Dynamicum

10:15

Agentic AI: Build a Multi-Agent Application with CrewAI

Alessandro Romano

This hands-on tutorial will dive into the fundamentals of building multi-agent systems using the CrewAI Python library. Starting from the basics, we’ll cover key concepts, explore advanced features, and guide you step-by-step through building a complete application from scratch. We’ll discuss implementing guardrails, securing interactions, and preventing query injection vulnerabilities along the way.

PyData: Generative AI

Beyond DALL-E: Advanced Image Generation Workflows with ComfyUI

Image generation using AI has made huge progress over the last few years, and many people still think that DALL-E with a text prompt is the best way to generate images. There are well-known models like Stable Diffusion and Flux, which can be used with easy-to-use frontends like A1111 or Invoke AI, but if you want to do more complex or bleeding-edge workflows, you need something else. In this talk, I want to show you ComfyUI, an open-source node-based GUI written in Python where you can build complex pipelines that are otherwise only possible using plain code.

PyData: Computer Vision (incl. Generative AI CV)

Building Bare-Bones Game Physics in Rust with Python Integration

Learn how to build a minimalist game physics engine in Rust and make it accessible to Python developers using PyO3. This talk explores fundamental concepts like collision detection and motion dynamics while focusing on Python integration for scripting and testing. Ideal for developers interested in combining Rust’s performance with Python’s ease of use to create lightweight and efficient tools for games or simulations.

Data as (Python) Code

Francesco Calcavecchia

In contemporary data-driven environments, the seamless integration of data into automated workflows is paramount. The reliability of automation, however, is constantly threatened by breaking changes in the source data. The Data-as-Code (DaC) paradigm addresses this challenge by treating data as a first-class citizen within the software development lifecycle.

PyCon: MLOps & DevOps

Zeiss Plenary (Spectrum)

FastHTML vs. Streamlit - The Dashboarding Face Off

Tilman Krokotsch

In the right corner, we have the go-to dashboarding solution for showcasing ML models or visualizing data, STREAMLIT (*crowd cheers*). Simple yet powerful, it defends the throne of Python dashboarding, but have you ever tried to create complex interactions with it? Things like drill-downs or logins can make your control flow become messy really quickly (*crowd nods knowingly*).

And in the left corner, the new contender in the arena of Python web frameworks which, according to its docs, "excels at building dashboards", FastHTML (*crowd whoops*). We will see if this is true, in the ultimate dashboarding face off (*crowd gasps*). By building the same dashboard, step by step, in both frameworks and investigating their strengths and weaknesses, we will see which framework can claim the crown.

PyCon: Django & Web

From Queries to Confidence: Ensuring SQL Reliability with Python

SQL remains a foundational component of data-driven applications, but ensuring the accuracy and reliability of SQL logic is often challenging. SQL testing can be cumbersome, time-consuming, and error-prone. However, these challenges can be addressed by leveraging the simplicity of Python testing frameworks such as pytest, enabling clean, robust, and automated SQL testing.
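As a rough illustration of the idea (a minimal sketch, not taken from the talk): the query, table, and helper names below are hypothetical, and an in-memory SQLite database from the standard library stands in for a real database. The `test_` functions use plain `assert` statements, so pytest can collect and run them as they are.

```python
import sqlite3

# Hypothetical query under test: total order amount per customer,
# keeping only customers at or above a minimum total.
TOP_CUSTOMERS_SQL = """
SELECT customer, SUM(amount) AS total
FROM orders
GROUP BY customer
HAVING SUM(amount) >= :min_total
ORDER BY total DESC, customer
"""


def make_db(rows):
    """Create a throwaway in-memory database seeded with test rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn


def top_customers(conn, min_total):
    """Run the query under test and return all result rows."""
    return conn.execute(TOP_CUSTOMERS_SQL, {"min_total": min_total}).fetchall()


def test_totals_are_aggregated_per_customer():
    conn = make_db([("ada", 10.0), ("ada", 5.0), ("bob", 7.0)])
    assert top_customers(conn, 0) == [("ada", 15.0), ("bob", 7.0)]


def test_threshold_filters_small_customers():
    conn = make_db([("ada", 10.0), ("bob", 7.0)])
    assert top_customers(conn, 8) == [("ada", 10.0)]
```

Each test seeds its own small dataset, so a failure points at the SQL logic rather than at shared state.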

The Mighty Dot - Customize Attribute Access with Descriptors

Whenever you use a dot after an object in Python you access an attribute. While this seems a very simple operation, behind the scenes many things can happen. This tutorial looks into this mechanism that is regulated by descriptors. You will learn how a descriptor works and what kind of problems it can help to solve. Python properties are based on descriptors and solve one type of problem. Descriptors are more general, allow more use cases, and are more reusable. Descriptors are an advanced topic. But once mastered, they provide a powerful tool to hide potentially complex behavior behind a simple dot.
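To make the mechanism concrete, here is a minimal sketch (not from the tutorial itself; the class names `Positive` and `Rectangle` are made up for illustration). One descriptor class implements a validation rule once and is reused for several attributes, which a single property could not do without repeating the code.

```python
class Positive:
    """Data descriptor that only accepts values greater than zero."""

    def __set_name__(self, owner, name):
        # Called once at class creation; remember where to store the value.
        self.public_name = name
        self.private_name = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed on the class itself: return the descriptor object.
            return self
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        if value <= 0:
            raise ValueError(f"{self.public_name} must be positive")
        setattr(obj, self.private_name, value)


class Rectangle:
    # The same descriptor class guards both attributes.
    width = Positive()
    height = Positive()

    def __init__(self, width, height):
        self.width = width    # goes through Positive.__set__
        self.height = height

    @property
    def area(self):
        # A property is itself a descriptor, specialised to one attribute.
        return self.width * self.height
```

With this in place, `Rectangle(2, 3).area` reads both attributes through `__get__`, while assigning `rect.width = -1` raises `ValueError` and leaves the stored value unchanged.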

PyCon: Python Language & Ecosystem

Towards Intelligent Monitoring: Detecting Degraded Flame Torch Nozzles

Dominik Falkner

Flame cutting is a method where metals are efficiently cut using precise control of the oxygen jet and consistent mixing of fuel gas. The condition of the nozzle changes over time: deposits formed during the cutting process can degrade the flame quality, reducing the precision of the cut. Traditionally, nozzles suspected of wear are sent back for manual inspection, where experts evaluate the flame visually and audibly to determine whether repair or replacement is needed. This project leverages machine learning to optimize this process by analyzing acoustic emission data.

PyData: Machine Learning & Deep Learning & Statistics

Where have all the post offices gone? Discovering neighborhood facilities with Python and OSM

Katie Richardson

When it comes to open geographic data, OpenStreetMap is an awesome resource. Getting started and figuring out how to make the most out of the data available can be challenging.

Using a personal example, my frustration at the apparent lack of post offices in my neighborhood, we'll walk through examples of how to parse, filter, process, and visualize geospatial data with Python.

At the end of this talk, you will know how to process geographic data from OpenStreetMap using Python and find out some surprising info that I learned while answering the question: Where have all the post offices gone?

PyData: Data Handling & Engineering

10:55

Death by a Thousand API Versions

Stanislav Zmiev

API versioning is tough, really tough. We tried multiple approaches to versioning in production and eventually ended up with a solution we love. During this talk you will look into the tradeoffs of the most popular ways to do API versioning, and I will recommend which ones are fit for which products and companies. I will also present my framework, Cadwyn, that allows you to support hundreds of API versions with ease -- based on FastAPI and inspired by Stripe's approach to API versioning.

After this session, you will understand which approach to pick for your company to make your versioning cost effective and maintainable without investing too much into it.

PyCon: Django & Web

Filling in the Gaps: When Terraform Falls Short, Python and Typer Step In

Yuliia Barabash

Not all resources in today’s cloud environments have native Terraform providers. That’s where Python’s Typer library can step in, offering a flexible, production-ready command-line interface (CLI) framework to help fill in the gaps. In this session, we’ll explore how to integrate Typer with Terraform to manage resources that fall outside Terraform’s direct purview. We’ll share a real-life example of how Typer was used alongside Terraform to automate and streamline the management of an otherwise unsupported API. You’ll learn how Terraform can invoke Python scripts—passing arguments and parameters to control complex operations—while still benefiting from Terraform’s declarative model and lifecycle management. We’ll also discuss best practices for defining resource lifecycles to ensure easy maintainability and consistency across deployments. By the end, participants will see how combining Terraform’s robust infrastructure-as-code approach with Python’s versatility and Typer’s user-friendly CLI can create a powerful, cohesive strategy for managing even the trickiest resources in production environments.

General: Infrastructure - Hardware & Cloud

High-performance dataframe-agnostic GLMs with glum

Martin Stancsics

Generalized linear models (GLMs) are interpretable, relatively quick to train, and specifying them helps the modeler understand the main effects in the data. This makes them a popular choice today to complement other machine-learning approaches. glum was conceived with the aim of offering the community an efficient, feature-rich, and Python-first GLM library with a scikit-learn-style API. More recently, we are striving to keep up with the PyData community's ongoing push for dataframe-agnosticism.
While glum was originally heavily based on pandas, with the help of narwhals, we are close to being able to fit models on any dataset that the latter supports. This talk presents our experiences with achieving this goal.

PyData: PyData & Scientific Libraries Stack

How Narwhals is silently bringing pandas, Polars, DuckDB, PyArrow, and more together

If you were writing a data science tool in 2015, you'd have ensured it supported pandas and then called it a day.

But it's not 2015 anymore, we've fast-forwarded to 2025. If you write a tool which only supports pandas, users will demand support for Polars, PyArrow, DuckDB, and so many other libraries that you'll feel like giving up.

Learn about how Narwhals allows you to write dataframe-agnostic tools which can support all of the above, with zero dependencies, low overhead, static typing, and strong backwards-compatibility promises!

PyData: PyData & Scientific Libraries Stack

Zeiss Plenary (Spectrum)

PosePIE: Replace Your Keyboard and Mouse With AI-Driven Gesture Control

Daniel Stolpmann

In this talk, we show how to leverage publicly available tools to control any game or program using hand or body movements. To achieve this, we introduce PosePIE, an open-source programmable input emulator that generates input events on virtual gamepads, keyboards and mice based on gestures recognized by using AI-driven pose estimation. PosePIE is fully configurable by the user through Python scripts, making it easily adaptable to new applications.

PyData: Computer Vision (incl. Generative AI CV)

The Foundation Model Revolution for Tabular Data

Noah Hollmann, Frank Hutter

What if we could make the same revolutionary leap for tables that ChatGPT made for text? While foundation models have transformed how we work with text and images, tabular / structured data (spreadsheets and databases) - the backbone of economic and scientific analysis - has been left behind. TabPFN changes this. It's a foundation model that achieves in 2.8 seconds what traditional methods need 4 hours of hyperparameter tuning for - while delivering better results. On datasets up to 10,000 samples, it outperforms every existing Python library, from XGBoost to CatBoost to AutoGluon.

Beyond raw performance, TabPFN brings foundation model capabilities to tables: native handling of messy data without preprocessing, built-in uncertainty estimation, synthetic data generation, and transfer learning - all in a few lines of Python code. Whether you're building risk models, accelerating scientific research, or optimizing business decisions, TabPFN represents the next major transformation in how we analyze data. Join us to explore and learn how to leverage these new capabilities in your work.

PyData: Machine Learning & Deep Learning & Statistics

Using Python to enter the world of Microcontrollers

So you've happily used the Raspberry Pi for your homelab projects, of course with Python-based solutions as we all do. You've been down the rabbit hole with everything about temperature and humidity measurements, energy and solar tracking, video recording and time-lapse photography, object detection and security surveillance.

You don't just buy these things off the shelf. You want to deeply understand what it takes to create such a thing, and you've been quite happy with your results so far and learned a lot.

But for many simple applications ... the power draw! Yes, it's just 5 Watts you say for using a Raspberry Pi. Not a big deal in terms of cost. But you'll always need a power adapter and a free socket.

You've heard of these guys using microcontrollers that run on batteries or even solar, for days, weeks, even months.

That's exciting, but there's also a catch. These people write code in C-like languages, they build firmware to make their projects run. And it's all bare metal! That seems very different. That'll be a steep learning curve to take ... Or is it?

Well, there's MicroPython to the rescue. Let me take you with me on a journey to make a simple microcontroller-based application to read a Power Meter and send the readings over WiFi for more in-depth processing somewhere else.

PyData: Embedded Systems & Robotics

11:35

Code & Community: The Synergy of Community Building and Task Automation

The Python community is built on a culture of support, inclusion, and collaboration. Sustaining this welcoming environment requires intentional community-building efforts, which often involve repetitive or time-consuming tasks. These tasks, however, can be automated without compromising their value—freeing up time for meaningful human engagement.

This talk showcases my project aimed at supporting underrepresented groups in tech, specifically through building Python communities on Mastodon and Bluesky. A key part of this initiative is the "Awesome PyLadies" repository, a curated collection of PyLadies blogs and YouTube channels that celebrates their work. To enhance visibility, I created a PyLadies bot for social media. This bot automates regular posts and reposts tagged content, significantly extending their reach and fostering an engaged community.

In this session, I’ll cover:
- The role of automation in community building
- The technical architecture behind the bot
- A hands-on demo on integrating Google’s Gemini into community tools
- Upcoming features and opportunities for collaboration

By combining Python, automation, and modern AI capabilities, we can create thriving, inclusive communities that scale impact while staying true to the human-centered ethos of open source.

PyData: Natural Language Processing & Audio (incl. Generative AI NLP)

Enhancing RAG with Fast GraphRAG and InstructLab: A Scalable, Interpretable, and Efficient Framework

Retrieval Augmented Generation (RAG) has become a cornerstone in enriching GenAI outputs with external data, yet traditional frameworks struggle with challenges like data noise, domain specialization, and scalability. In this talk, Tuhin will dive into the open-source frameworks Fast GraphRAG and InstructLab, which address these limitations by combining knowledge graphs with the classical PageRank algorithm and fine-tuning, delivering a precision-focused, scalable, and interpretable solution. By leveraging the structured context of knowledge graphs, Fast GraphRAG enhances data adaptability, handles dynamic datasets efficiently, and provides traceable, explainable outputs, while InstructLab adds domain depth to the LLM through fine-tuning. Designed for real-world applications, this combination bridges the gap between raw data and actionable insights, redefining intelligent retrieval for developers, researchers, and enterprises. This talk will showcase Fast GraphRAG’s transformative features coupled with domain-specific fine-tuning leveraging InstructLab and demonstrate its potential to elevate RAG’s capabilities in handling the evolving demands of large language models (LLMs) for developers, researchers, and businesses.

PyData: Generative AI

GitMLOps – How we are managing 100+ ML pipelines in AWS SageMaker

Scaling machine learning pipelines is no small feat - especially when you’re managing over 100 of them on AWS SageMaker. In this talk, I’ll take you behind the scenes of how our team at idealo built a Git-based MLOps framework that powers millions of real-time recommendations every minute.

I’ll share the challenges we faced, the solutions we implemented, and the lessons we learned while streamlining model versioning, deployment, and monitoring. This session is packed with actionable takeaways for ML engineers, data scientists, and DevOps professionals looking to simplify their MLOps workflows and operate efficiently at scale.

Whether you’re running a handful of pipelines or preparing to scale up, this talk will equip you with the tools and strategies to tackle MLOps with confidence.

PyCon: MLOps & DevOps

Guardians of the Code: Safeguarding Machine Learning Models in a Climate Tech World

LLMs, machine learning, and AI are everywhere, yet their security is often overlooked, leaving your systems vulnerable to serious attacks. What happens when someone tampers with your model’s input, poisons your training data, or steals your model?

In this talk, I’ll explore these risks through the lens of the OWASP Machine Learning Security Top 10 using relatable, real-world examples from the climate tech world. I’ll explain how these attacks happen, their impact, and why they matter to you as a Python developer, data scientist, or data engineer.

You’ll learn practical ways to defend your models and pipelines, ensuring they’re robust against adversarial forces. Bridging theory and practice, you'll leave equipped with insights and strategies to secure your machine learning systems, whether you’re training models or deploying them in production. By the end, you’ll have a solid understanding of the risks, a toolkit of best practices, and maybe even a new perspective on how important security is everywhere.

PyCon: MLOps & DevOps

Hands-On LLM Security: Attacks and Countermeasures You Need to Know!

Clemens Hübner, Florian Teutsch

Dive into the vulnerabilities of LLMs and learn how to prevent them.
From prompt injection to data poisoning, we’ll demonstrate real-world attack scenarios and reveal essential countermeasures to safeguard your applications.

PyCon: Security

Rustifying Python: A Practical Guide to Achieving High Performance While Maintaining Observability

In this session, I’ll share our journey of migrating key parts of a Python application to Rust, resulting in over 200% performance improvement.
Rather than focusing on quick Rust-to-Python integration with PyO3, this talk dives into the complexities of implementing such a migration in an enterprise environment, where reliability, scalability, and observability are crucial.
You’ll learn from our mistakes, how we identified suitable areas for Rust integration, and how we extended our observability tools to cover Rust components.
This session offers practical insights for improving performance and reliability in Python applications using Rust.

PyCon: Programming & Software Engineering

Topological data analysis: How to quantify "holes" in your data and why?

Ondrej Draganov

Do you need to compare sets of points in a plane? Identify a potential cyclic event in high-dimensional time series data? Find the second or the third highest peak of a noisily sampled function? Topological data analysis (TDA) is not a universal hammer, but it might just be the 16 mm wrench for your 16 mm hex head bolt. There is no shortage of Python libraries implementing TDA methods for various settings, but navigating the options can be challenging without prior familiarity with the topic. In my talk I will demonstrate the utility of the tool with several simple examples, list various libraries used by the TDA community, and dive a bit deeper into the methods to explain what the libraries implement and how to interpret and work with the outputs.

PyData: PyData & Scientific Libraries Stack

Zeiss Plenary (Spectrum)

12:20

12:20

60min

Lunch Break

Zeiss Plenary (Spectrum)

12:20

60min

Lunch Break

Titanium3

12:20

60min

Lunch Break

Helium3

12:20

60min

Lunch Break

Platinum3

12:20

60min

Lunch Break

Europium2

12:20

60min

Lunch Break

Hassium

12:20

60min

Lunch Break

Palladium

12:20

45min

Lunch Break

Ferrum

12:20

45min

Lunch Break

Dynamicum

13:05

Reinforcement Learning for Finance

Dr. Yves J. Hilpisch

Reinforcement Learning and related algorithms, such as Deep Q-Learning (DQL), have led to major breakthroughs in different fields. DQL, for example, is at the core of the AIs developed by DeepMind that achieved superhuman levels in such complex games as Chess, Shogi, and Go ("AlphaGo", "AlphaZero"). Reinforcement Learning can also be beneficially applied to typical problems in finance, such as algorithmic trading, dynamic hedging of options, or dynamic asset allocation. The workshop addresses the problem of limited data availability in finance and solutions to it, such as synthetic data generation through GANs. It also shows how to apply the DQL algorithm to typical financial problems. The workshop is based on my new O'Reilly book "Reinforcement Learning for Finance -- A Python-based Introduction".

PyData: Machine Learning & Deep Learning & Statistics

What's inside the box? Building a deep learning framework from scratch.

Explore the inner workings of deep learning frameworks like TensorFlow and PyTorch by building your own in this workshop. We will start with the fundamental automatic differentiation mechanics and proceed to implementing more complex components like layers, modules and optimizers. This workshop is mainly designed for experienced data scientists who want to expand their intuition about lower-level framework internals.

PyData: Machine Learning & Deep Learning & Statistics

13:20

Electify - Retrieval-Augmented Generation for Voter Information in the 2024 European Election

Christian Liedl

In general elections, voters often face the challenge of navigating complex political landscapes and extensive party manifestos. To address this, we developed Electify, an interactive application that utilizes Retrieval-Augmented Generation (RAG) to provide concise summaries of political party positions based on individual user queries. During its first roll-out for the European Election 2024, Electify attracted more than 6,000 active users. This talk will explore its development and deployment. It will focus on its technical architecture, the integration of data from party manifestos and parliamentary speeches, and the challenges of ensuring political neutrality and providing accurate replies. Additionally, we will discuss user feedback and ethical considerations, focusing on how generative AI can enhance voter information systems.

PyData: Natural Language Processing & Audio (incl. Generative AI NLP)

Extending Python with Rust, Mojo, CUDA and C and building packages

Wolf Vollprecht, Ruben Arts

We all love Python - but we especially love it for its unique ability as a glue language.

In this talk we will show a number of ways of extending Python: using Rust, C and Cython, C++, CUDA and Mojo! We will use the pixi package manager and the open source conda-forge distribution to demonstrate how to easily build custom Python extensions with these languages.

The main challenge with custom extensions is distributing them. The new pixi build feature makes it easy to build a Python extension into a conda package as well as a wheel file for PyPI.

Pixi will manage not only Python, but also the compilers and other system-level dependencies.

PyData: PyData & Scientific Libraries Stack

From stockouts to happy customers: Proven solutions for time series forecasting in retail

Time series forecasting in the retail industry is uniquely challenging: Datasets often include stockouts that censor actual demand, promotional events cause irregular demand spikes, new product launches face cold-start issues, and diverse demand patterns within an imbalanced product portfolio create modeling challenges.
In this talk, we’ll explore proven, real-world strategies and examples to address these problems. Learn how to successfully handle censored demand caused by stockouts, effectively incorporate promotional effects, and tackle the variability of diverse products using clustering and ensembling strategies. Whether you’re a seasoned data scientist or a Python developer exploring forecasting, the goal of this session is to introduce you to the key challenges in retail forecasting and equip you with actionable insights to successfully overcome them in real-life scenarios.

PyData: Machine Learning & Deep Learning & Statistics

Zeiss Plenary (Spectrum)

Is your LLM any good at writing? Benchmarking on creative writing and editing tasks

Azamat Omuraliev

Many LLM benchmarks focus on reasoning and coding tasks. These are exciting tasks! But the majority of LLM usage is still in writing and editing related tasks, and there's a surprising lack of benchmarks on these.

In this talk you'll learn what it took to create a writing benchmark, and which model performs best!

PyData: Natural Language Processing & Audio (incl. Generative AI NLP)

Responsible AI with fmeval - an open source library to evaluate LLMs

The term "Responsible AI" has seen a threefold increase in search interest compared to 2020 across the globe. As developers, we encounter questions like "How can we build large language model-enabled applications that are responsible and accountable to their users?" more often than before. The discussion is further compounded by concerns surrounding uncertainty, bias, explainability, and other ethical considerations.

In this session, the speaker will guide you through fmeval, an open-source library designed to evaluate Large Language Models (LLMs) across a range of tasks. The library provides notebooks that you can integrate into your daily development process, enabling you to identify, measure, and mitigate potential responsible AI issues throughout your system development lifecycle.

PyData: PyData & Scientific Libraries Stack

Vector Streaming: The Memory Efficient Indexing for Vector Databases

Sonam Pankaj, Akshay Ballal

Vector databases are everywhere, powering LLMs. But indexing embeddings in bulk, especially multivector embeddings like ColPali and ColBERT, is memory-intensive. Vector streaming solves this problem by parallelizing the tasks of parsing, chunking, and embedding generation, and by indexing the results continuously, chunk by chunk, instead of in bulk. This not only increases the speed but also makes the whole task more memory-efficient.

The library supports many vector databases, such as Pinecone, Weaviate, and Elastic.

What we talk about when we talk about AI skills.

Paula Gonzalez Avalos

Defining what constitutes AI skills has always been ambiguous. As AI adoption accelerates across industries and the European AI Act mandates companies to ensure AI literacy among their staff, organizations face even more challenges in defining and developing AI competencies. In this talk, we'll present a comprehensive framework developed by the appliedAI Institute's experts that categorizes AI skills across technical, regulatory, strategic, and innovation domains. We'll also share initial data on current AI skills levels and upskilling needs and provide practical strategies for organizations to assess, develop, and acquire the AI capabilities required for their specific needs.

General: Education, Career & Life

14:00

Forecast of Hourly Train Counts on Rail Routes Affected by Construction Work

Sebastian Folz, Dr Maren Westermann

Construction work in national railroad networks often disrupts train traffic, making it vital to estimate hourly train numbers for effective re-routing. Traditionally managed by humans, this process has been automated due to staff shortages and demographic changes. DB Systel GmbH, Deutsche Bahn's IT provider, leveraged machine learning and artificial intelligence to estimate train traffic during construction. Using Python and frameworks like Pandas, scikit-learn, NumPy, PyTorch and Polars, their solution demonstrated significant benefits in performance and efficiency.

PyData: Machine Learning & Deep Learning & Statistics

Zeiss Plenary (Spectrum)

Offline Disaster Relief Coordination with OpenStreetMap and FastAPI

In natural disaster scenarios, reliable communication is crucial. This talk presents a solution for disaster relief coordination using OpenStreetMap vector maps hosted on a local device in the emergency vehicle with FastAPI, ensuring functionality without an internet connection. By integrating a database of post codes and street names, and leveraging a LoRaWAN gateway to receive positional data and water levels, this system ensures access to critical information even in blackout situations.

General: Infrastructure - Hardware & Cloud

Optimizing Energy Tariffing System with Formal Concept Analysis and Dash

Dr. Irina Smirnova-Pinchukova

As a data scientist, I value the power of insightful visualizations to unlock unique interpretations of complex data. In my talk, I will introduce an elegant mathematical framework called Formal Concept Analysis (FCA), developed in the 1980s in Darmstadt.

FCA transforms binary data into concepts that can be visualized as a hierarchical graph, offering a fresh perspective on multidimensional data analysis. Leveraging this theory and its open-source Python libraries, I am developing an interactive Dash-based tool featuring interactive tables and graphs to explore data insights.

To illustrate its potential, I will showcase an optimization of the entire tariffing system of an energy provider company, highlighting how FCA can bring structure and clarity to even such tangled datasets.

PyData: Visualisation & Jupyter

Pipeline-level differentiable programming for the real world

Alessandro Angioi

Automatic Differentiation (AD) is not only the backbone of modern deep learning but also a transformative tool across various domains such as control systems, materials science, weather prediction, 3D rendering, data-driven scientific discovery, and so on. Thanks to a mature ML framework ecosystem, powered by libraries like PyTorch and JAX, AD performs remarkably well at a component level; however, integrating these components into differentiable pipelines still remains a significant challenge. In this talk, we will provide an accessible introduction to (pipeline-level) AD, demonstrate some cool applications you can build with it, and see how to build differentiable pipelines that hold up in the real world.

PyData: Research Software Engineering

Practical Python/Rust: Building and Maintaining Dual-Language Libraries

Building performant Python often means reaching for C extensions. This talk explores an alternative: leveraging Rust to create blazing-fast Python modules that also benefit the Rust ecosystem. I will share practical strategies from building semantic-text-splitter, a library for fast and accurate text segmentation used in both Python and Rust, demonstrating how to bridge the gap between these two languages and unlock new possibilities for performance and cross-language collaboration.

Using Causal thinking to make Media Mix Modeling

Carlos Trujillo

In today's data-driven landscape, understanding causal relationships is essential for effective marketing strategies. This talk will explore the link between Bayesian causal thinking and media mix modeling, utilizing Directed Acyclic Graphs (DAGs), Structural Causal Models (SCMs), and the Data Generation Process (DGP).

We will examine how DAGs represent causal assumptions, how SCMs define relationships in media mix models, and how to implement these models within a Bayesian framework. By using media mix models as causal inference tools, we can estimate counterfactuals and causal effects, offering insights into the effectiveness of media investments.

PyData: PyData & Scientific Libraries Stack

You don’t think about your Streamlit app optimization until you try to deploy it to the cloud

Darya Petrashka

Building Streamlit apps is easy for Data Scientists - but when it’s time to deploy them to the cloud, challenges like slow model loading, scalability, and security can become major hurdles. This talk bridges two perspectives: the Data Scientist who builds the app and the MLOps engineer who deploys it. We'll dive into optimizing model loading from Hugging Face Hub, implementing features like autoscaling and authentication, and securing your app against potential threats. By the end of this talk, you’ll be ready to design Streamlit apps that are functional and deployment-ready for the cloud.

PyCon: MLOps & DevOps

14:40

3 Ways to Speed up Your Regression Modeling in Python

Alexander Fischer

Linear Regression is the workhorse of statistics and data science. Some data scientists even go as far as to argue that "linear regression is all you need".

In this talk, we will introduce three ways to run regression models faster by using smarter algorithms, implemented in the scikit-learn & fastreg (sparse solvers), pyfixest (Frisch-Waugh-Lovell), and duckreg (regression compression via DuckDB) libraries.

PyData: Machine Learning & Deep Learning & Statistics

Building a HybridRAG Document Question-Answering System

Darya Petrashka

Retrieval Augmented Generation (RAG) is a powerful technique for searching across unstructured documents, but it often falls short when the task demands an understanding of intricate relationships between entities. GraphRAG addresses this by leveraging knowledge graphs to capture these relationships, but it struggles with scalability and handling diverse unstructured formats. In this talk, we’ll explore how HybridRAG combines the strengths of both approaches - RAG for scalable unstructured data retrieval and GraphRAG for semantic richness - to deliver accurate and contextually relevant answers. We’ll dive into its application, challenges, and the significant improvements it offers for question-answering systems across various domains.

PyData: Natural Language Processing & Audio (incl. Generative AI NLP)

Demystifying Design Patterns: A Practical Guide for Developers

Do you ever worry about your code becoming spaghetti-like and difficult to maintain?
Master the art of crafting clean, maintainable, and adaptable software by harnessing the power of design patterns. This presentation will empower you with a clear, structured understanding of these reusable solutions to address common programming challenges.

We'll delve into design patterns’ key categories: Behavioral, Structural, and Creational, as well as explore their functionality and how they can be applied in your daily development workflow. For each category, we'll also explore a practical design pattern in detail and showcase real-world applications of these patterns, along with small-scale code examples that illustrate their practical implementation.

You'll gain valuable insight into how these patterns can translate into real-world development scenarios, such as facilitating communication between objects (Behavioral), separating interfaces from implementation for flexibility (Structural), and enabling dynamic algorithm selection at runtime (Creational).

PyCon: Programming & Software Engineering

Zeiss Plenary (Spectrum)

From Rules to Reality: Python's Role in Shaping Roundnet

Roundnet is a dynamic and fast-growing sport that combines quick reactions, athleticism, and a strong community. However, like many emerging sports, it faces challenges in balancing competition, optimizing rules, and increasing accessibility for both players and spectators. This is where Python and data analysis come into play.

In this talk, I'll share insights from my role as Data Lead on the International Roundnet rule committee, where we use Python-powered data analysis to make informed decisions about the future of the sport. We'll explore how analyzing gameplay patterns and testing rule changes with simulation can lead to fairer, more exciting games and attract a broader audience.

PyData: Data Handling & Engineering

Intuitive A/B Test Evaluations for Coders

A/B testing is a critical tool for making data-driven decisions, yet its statistical underpinnings—p-values, confidence intervals, and hypothesis testing—are often challenging for those without a background in statistics. Coders frequently encounter these concepts but lack a straightforward way to compute and interpret them using their existing skill set.
This talk presents a practical approach to A/B test evaluations tailored for coders. By utilizing Python’s random number generator and basic loops, it introduces bootstrapping as an accessible method for calculating p-values and confidence intervals directly from data. The goal is to simplify statistical concepts and provide coders with an intuitive understanding of how to evaluate test results without relying on complex formulas or statistical jargon.
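A minimal sketch of what such a bootstrap evaluation could look like, using only Python's random number generator and plain loops. This is an illustration under assumptions, not the speaker's code: the function name, the conversion counts, and the choice of a pooled-resampling null are all hypothetical.

```python
import random
from statistics import fmean

def bootstrap_ab(control, variant, n_resamples=2000, seed=0):
    """Bootstrap the difference in means (variant - control).

    Returns the observed difference, a 95% percentile confidence interval,
    and a two-sided p-value for the null hypothesis of no difference.
    """
    rng = random.Random(seed)
    observed = fmean(variant) - fmean(control)

    # Confidence interval: resample each group with replacement.
    diffs = []
    for _ in range(n_resamples):
        c = rng.choices(control, k=len(control))
        v = rng.choices(variant, k=len(variant))
        diffs.append(fmean(v) - fmean(c))
    diffs.sort()
    ci = (diffs[int(0.025 * n_resamples)], diffs[int(0.975 * n_resamples) - 1])

    # p-value: simulate "no difference" by drawing both groups from the pooled data.
    pooled = control + variant
    extreme = 0
    for _ in range(n_resamples):
        c = rng.choices(pooled, k=len(control))
        v = rng.choices(pooled, k=len(variant))
        if abs(fmean(v) - fmean(c)) >= abs(observed):
            extreme += 1
    return observed, ci, extreme / n_resamples

# Hypothetical data: 1 = converted, 0 = did not convert.
control = [1] * 50 + [0] * 450   # 10% conversion
variant = [1] * 80 + [0] * 420   # 16% conversion

observed, ci, p_value = bootstrap_ab(control, variant)
print(f"difference: {observed:.3f}, 95% CI: ({ci[0]:.3f}, {ci[1]:.3f}), p-value: {p_value:.4f}")
```

The confidence interval comes from resampling each group separately, while the p-value comes from resampling under the assumption that both groups share one distribution; other bootstrap variants exist and may suit a given test better.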

PyData: Machine Learning & Deep Learning & Statistics

Langfuse, OpenLIT, and Phoenix: Observability for the GenAI Era

Emanuele Fabbiani

Large Language Models (LLMs) are transforming digital products, but their non-deterministic behaviour challenges predictability and testing, making observability essential for quality and scalability.

This talk presents observability for LLM-based applications, spotlighting three tools: Langfuse, OpenLIT, and Phoenix. We'll share best practices on what to monitor in LLM features and how to monitor it, and explore each tool's strengths and limitations.

Langfuse excels in tracing and quality monitoring but lacks OpenTelemetry support and customization. OpenLIT, while less mature, integrates well with existing observability stacks using OpenTelemetry. Phoenix stands out in debugging and experimentation but struggles with real-time tracing.

The comparison will be enhanced by live coding examples.

Attendees will walk away with an improved understanding of observability for GenAI applications and will know which tool fits their use case.

PyCon: Python Language & Ecosystem

Switching from Data Scientist to Manager

Theodore Meynard

In this presentation, I will discuss my transition from a Data Scientist to a management role, covering key managerial responsibilities, preparation tips, and the pros and cons of the switch. The talk is particularly relevant for engineers who have recently moved into management or are considering the change, as well as those interested in understanding the challenges managers face. The session will include brief presentations followed by interactive discussions with the audience.

General: Education, Career & Life

The Forecast Whisperer: Secrets of Model Tuning Revealed

Illia Babounikau

Forecasting can often feel like interpreting vague signals—unclear yet full of potential. In this talk, we’ll cover advanced techniques for tuning forecasting models in professional settings, moving beyond the basics to explore methods that enhance both accuracy and interpretability.

You’ll learn:

How to set clear business goals for ML model tuning and align technical work with business needs, including balancing forecast granularity and accuracy and selecting a statistically correct metric.

Practical data preparation methods, including business-driven data cleaning and detecting data problems with statistical and business-driven approaches.

Advanced feature selection techniques such as recursive feature elimination and SHAP values, alongside hyperparameter tuning strategies including Bayesian optimization and ensemble methods.

How generative AI can support model tuning by automating feature generation, hyperparameter search, and enhancing model explainability through SHAP and LIME techniques.

Real-world case studies, including how Blue Yonder’s data science team optimized demand forecasting models for retail and supply chain applications.

We'll also discuss common mistakes like overfitting and data leakage, best practices for reliable validation, and the importance of domain knowledge in successful forecasting. Whether you're a seasoned data scientist or exploring time series forecasting, you'll gain advanced insights and techniques you can apply immediately.

PyData: Machine Learning & Deep Learning & Statistics

What do a tree and the human brain have in common-a not so serious introduction to digital pathology

While trees and human brains don't share many properties as domains, analyzing the height of a tree and analyzing cancer in a human brain do.
This talk provides a not-so-serious introduction to the domain of computer vision for pathological use cases.
Besides a general introduction to (digital) pathology and the technical similarities between satellite images (GeoTIFFs) and pathological images (Whole-Slide Images), we will take a look at computer vision for medical tasks using Python.
Whether you have never done image processing in Python, are an expert (ready to share some tricks with me), or are just curious to see pictures of a human brain, this talk is for you.
Warning: this talk contains quite abstract pink-ish pictures of human tissue (and trees^^). If you are unsure this is something you are comfortable with (have a friend), do a quick search for "HE-stained whole-slide image".

PyData: Computer Vision (incl. Generative AI CV)

15:20

15:20

25min

Closing Session

Zeiss Plenary (Spectrum)