<?xml version='1.0' encoding='utf-8' ?>
<iCalendar xmlns:pentabarf='http://pentabarf.org' xmlns:xCal='urn:ietf:params:xml:ns:xcal'>
    <vcalendar>
        <version>2.0</version>
        <prodid>-//Pentabarf//Schedule//EN</prodid>
        <x-wr-caldesc></x-wr-caldesc>
        <x-wr-calname></x-wr-calname>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>BQPEUG@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-BQPEUG</pentabarf:event-slug>
            <pentabarf:title>Opening Session</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260414T100000</dtstart>
            <dtend>20260414T103000</dtend>
            <duration>003000</duration>
            <summary>Opening Session</summary>
            <description>Opening Session</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Plenary</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/BQPEUG/</url>
            <location>Merck Plenary (Spectrum) [1st Floor]</location>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>BFL7MQ@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-BFL7MQ</pentabarf:event-slug>
            <pentabarf:title>From Scratch to Scale: Turning LLM Code into Architecture Insights</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260414T103000</dtstart>
            <dtend>20260414T111500</dtend>
            <duration>004500</duration>
            <summary>From Scratch to Scale: Turning LLM Code into Architecture Insights</summary>
            <description>Python has been at the center of my work in machine learning and AI for more than a decade. It is where I start from scratch, experiment with ideas, and build systems that help me understand how large language models really work.

In this keynote, I will look at what it means to build and study LLMs in Python today. Starting from small, from-scratch implementations, I will show how Python and PyTorch help us understand modern model architectures, compare new designs against reference code, and learn details that papers often leave out. I will then connect those implementation lessons to current LLM trends, especially the push to reduce inference costs and KV-cache pressure as reasoning models and agentic workflows need longer contexts. At the end, I will also share a practical roadmap of libraries, open projects, and learning resources for going from first principles to real-world LLM development.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Keynote</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/BFL7MQ/</url>
            <location>Merck Plenary (Spectrum) [1st Floor]</location>
            
            <attendee>Sebastian Raschka</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>88TTRY@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-88TTRY</pentabarf:event-slug>
            <pentabarf:title>Sentinel Values in Python: Semantics, Double Dispatch, and the Limits of Typing</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260414T114500</dtstart>
            <dtend>20260414T121500</dtend>
            <duration>003000</duration>
            <summary>Sentinel Values in Python: Semantics, Double Dispatch, and the Limits of Typing</summary>
            <description>Sentinel values are a fundamental but under-documented part of Python’s design. They are used to represent absence, unsupported operations, incomplete state, and to coordinate control flow between objects. Yet, they are often treated as ad-hoc implementation details.

This talk starts by clarifying what sentinel values are and why None is frequently semantically overloaded and incorrect for modelling “missing” or “unset” values. We then examine built-in sentinels such as `NotImplemented`, `Ellipsis`, and `dataclasses.MISSING`, with a detailed look at how `NotImplemented` enables double dispatch in equality and ordering operations.
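
As a minimal illustration of the double-dispatch mechanism (a sketch, not code from the talk): returning `NotImplemented` from `__eq__` tells Python to try the reflected operation on the right-hand operand instead of giving up.

```python
class Meters:
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        # Signal "I do not know how to compare" instead of returning False;
        # Python will then try the reflected __eq__ on `other`.
        if not isinstance(other, Meters):
            return NotImplemented
        return self.value == other.value


class Feet:
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if isinstance(other, Meters):
            # Exact float comparison works for this demo because
            # multiplying by 1 is exact; real code would use a tolerance.
            return self.value * 0.3048 == other.value
        if isinstance(other, Feet):
            return self.value == other.value
        return NotImplemented


# Meters.__eq__ returns NotImplemented, so Python dispatches to Feet.__eq__:
print(Meters(0.3048) == Feet(1))  # True
```

Neither class needs to know about the other in advance; the sentinel coordinates the hand-off between them.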

The second half of the talk focuses on typing, where sentinel values expose fundamental tensions between Python’s dynamic semantics and static type systems. We will discuss:

* why Optional[T] does not mean “unset”
* why Literal appears attractive for sentinels but rarely works in practice
* what limited type narrowing is possible today and under which assumptions
* why a fully reliable, user-defined sentinel with correct narrowing is currently not achievable in a portable way

To ground this in practice, we will look at real-world patterns used in production code, including Pydantic’s experimental missing concept, and explain the trade-offs these designs make.

Finally, we will examine [PEP 661](https://peps.python.org/pep-0661/), the proposal to standardize sentinel values and their typing semantics. We will explain what it would solve, why it was deferred, and what that deferral means for library and API authors today.

The talk concludes with concrete, honest guidelines: when sentinel values are the right tool, how to design APIs around them, and how to communicate absence clearly in typed Python code without pretending the type system can do more than it currently can.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/88TTRY/</url>
            <location>Merck Plenary (Spectrum) [1st Floor]</location>
            
            <attendee>Florian Wilhelm</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>KLN78E@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-KLN78E</pentabarf:event-slug>
            <pentabarf:title>The foundation model revolution for tabular data</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260414T122500</dtstart>
            <dtend>20260414T125500</dtend>
            <duration>003000</duration>
            <summary>The foundation model revolution for tabular data</summary>
            <description>Tabular data, spreadsheets organized in rows and columns, are ubiquitous across healthcare, business and finance. The fundamental prediction task of filling in missing values of a label column based on the rest of the columns is essential for thousands of use cases of high societal and commercial value. While gradient-boosted decision trees have dominated tabular data for the past 20 years, we demonstrate that this is rapidly changing: the foundation model revolution has arrived for tabular data. We will show the methods behind this and their extensions to causality, interpretability and robustness, and demo various agentic extensions.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Sponsored Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/KLN78E/</url>
            <location>Merck Plenary (Spectrum) [1st Floor]</location>
            
            <attendee>Frank Hutter</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>9CYWET@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-9CYWET</pentabarf:event-slug>
            <pentabarf:title>Stop Waiting, Start Shipping: Real-World Strategy for Open-Source LLMs</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260414T143000</dtstart>
            <dtend>20260414T154500</dtend>
            <duration>011500</duration>
            <summary>Stop Waiting, Start Shipping: Real-World Strategy for Open-Source LLMs</summary>
            <description>Alexander Hendorf and Sebastian Raschka sit down for a fireside chat on the current state of open-source LLMs.

With Chinese models like DeepSeek and Qwen competing directly with Llama and Mistral, the choice of capable open-source models has never been wider — so why are so many teams still waiting for the next generation instead of building with what is already here?

Questions we want to discuss:

- What role do Chinese and American OSS models play in the current competitive landscape?
- Where do open-source models still fall short of proprietary ones, and where has the gap closed?
- What biases should practitioners be aware of, and how should they handle them?
- Are AI agents a fundamental shift or are we seeing diminishing returns?
- What deployment strategies actually work — especially for European teams that rely on talent and domain expertise rather than hyperscaler compute?

Half the session is reserved for audience questions.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Panel</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/9CYWET/</url>
            <location>Merck Plenary (Spectrum) [1st Floor]</location>
            
            <attendee>Sebastian Raschka</attendee>
            
            <attendee>Alexander CS Hendorf</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>WAJQR7@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-WAJQR7</pentabarf:event-slug>
            <pentabarf:title>Panel: Evolution, Revolution, or Illusion? The Future of Python and Coding in the Age of AI</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260414T163000</dtstart>
            <dtend>20260414T173000</dtend>
            <duration>010000</duration>
            <summary>Panel: Evolution, Revolution, or Illusion? The Future of Python and Coding in the Age of AI</summary>
            <description>Software engineering is at a crossroads. With AI systems now capable of generating, debugging, and even reasoning about code, the very definition of programming is being challenged.
Does it still make sense to invest years learning Python, or any programming language, if machines can translate natural language specifications into working software? Are we witnessing the evolution of coding into a higher-level craft, the revolution of the software industry, or merely an illusion fueled by hype from those who benefit most?
This panel, moderated by Sebastian Neubauer, will confront these questions head-on. We will debate whether programming languages remain essential, whether software engineers are at risk of obsolescence, or whether the demand for engineers may actually explode in ways we cannot yet imagine. We will also explore the risks of over-reliance on AI, including potential security vulnerabilities, fragile or unexplainable systems, and the loss of deep understanding of the software we build.
Come prepared for uncomfortable questions, bold predictions, and no easy answers. This is a session designed to challenge assumptions, spark debate, and imagine the possible futures of Python and software engineering in an AI-assisted world.

Note: Join our interactive workshop on the future of Python and AI-assisted coding on Wednesday. Everyone is welcome to share ideas, debate risks, and help shape what Python and software engineering could look like in the age of AI.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Panel</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/WAJQR7/</url>
            <location>Merck Plenary (Spectrum) [1st Floor]</location>
            
            <attendee>Sebastian Neubauer</attendee>
            
            <attendee>Markus Klein</attendee>
            
            <attendee>Asya Melnik</attendee>
            
            <attendee>Serhii Sokolenko</attendee>
            
            <attendee>Ines Montani</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>NXEVSE@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-NXEVSE</pentabarf:event-slug>
            <pentabarf:title>Lightning Talks 1</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260414T175000</dtstart>
            <dtend>20260414T190000</dtend>
            <duration>011000</duration>
            <summary>Lightning Talks 1</summary>
            <description>Lightning Talks 1</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Lightning Talks</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/NXEVSE/</url>
            <location>Merck Plenary (Spectrum) [1st Floor]</location>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>JJDCW3@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-JJDCW3</pentabarf:event-slug>
            <pentabarf:title>Python Hates Being PID 1: Writing Container-Aware Code for Kubernetes</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260414T114500</dtstart>
            <dtend>20260414T121500</dtend>
            <duration>003000</duration>
            <summary>Python Hates Being PID 1: Writing Container-Aware Code for Kubernetes</summary>
            <description>**The Problem** : The large-scale adoption of Kubernetes means more Python developers are now writing code that runs as a containerized workload on Kubernetes. However, most of us still write applications with a standard Linux server in mind. In a containerized environment, these assumptions are either untrue or dangerous. Python apps not hardened for a containerized environment lead to production failures that are notoriously hard to debug:
- Unexplained Latency: API requests that stall for hundreds of milliseconds due to Linux CFS Quota throttling, even when monitoring shows low CPU usage.
- Silent OOM Kills: Containers that vanish instantly without a traceback because they hit a Cgroup limit that the Python Garbage Collector cannot see.
- Zombie Processes: Subprocesses that were never truly killed and are now exhausting the process table because Python ignores its duties as PID 1.

**The Solution** : This talk will briefly get you up to speed with containerization before taking a technical deep dive into the interactions between Kubernetes, the CPython interpreter and the Linux container runtime. We will move beyond basic Dockerfile best practices and focus on hardening the application code itself to survive in a hostile Kubernetes environment.

**Pre-requisites** : This talk is aimed at intermediate to senior Python developers and data engineers with basic familiarity with Docker. No advanced Kubernetes or Linux kernel knowledge is required; we will run through the foundational topics in brief.

**Outline (30 Minutes)**
1. Who am I? (2 mins)
2. The Lie of the Container (3 mins)
    - Understanding how the container runtime isolates your process and the resources it needs.
3. The PID 1 Problem (4 mins)
    - How the Linux kernel treats PID 1 processes and why the standard Python interpreter fails these duties.
    - Present well established solutions to the problem (init: true, tini, etc) and common pitfalls.
4. The CPU Quota &amp; Memory Limit (8 mins)
    - How container CPU limits in Kubernetes translate to Linux CFS (Completely Fair Scheduler) quotas. 
    - Visualizing how the enforcement of CFS quotas interacts with the Python GIL to cause latency spikes.
    - Python’s memory management and the dreaded OOM kill.
5. Hardening your Python Code (8 mins)
    - How to use the Cgroup file system or psutil to achieve true resource awareness.
    - Strategies for avoiding CPU throttling and keeping numeric libraries (Pandas/NumPy) from attempting to use too many cores.
    - Why `gc.collect()` is often insufficient and how to release memory before the OOM killer strikes.
6. Conclusion &amp; Checklist (5 mins)
    - A &quot;Production-Ready&quot; checklist for Python on K8s.
    - Q&amp;A.
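
As a taste of item 5 (a hedged sketch, assuming a cgroup v2 mount at the standard path; the function name is illustrative): the limit the OOM killer enforces is readable from the cgroup file system, even though the interpreter itself never consults it.

```python
def cgroup_memory_limit_bytes(path="/sys/fs/cgroup/memory.max"):
    # cgroup v2 exposes the container memory limit at this path
    # (cgroup v1 uses memory/memory.limit_in_bytes instead). The
    # Python garbage collector never reads it, which is why OOM kills
    # arrive without a traceback.
    try:
        with open(path) as f:
            raw = f.read().strip()
    except (FileNotFoundError, PermissionError):
        return None  # not in a cgroup v2 container, or no access
    if raw == "max":
        return None  # no limit configured
    return int(raw)
```

`psutil` offers a portable alternative, but reading the file directly makes clear where the number actually comes from.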

**After this talk you will** :
- Understand the lifecycle of a containerized Python app and handle shutdowns gracefully.
- Fine-tune a containerized Python app for stability and avoid CPU throttling and OOM kills.
- Look beyond the standard system calls to write truly resource-aware Python apps.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/JJDCW3/</url>
            <location>Titanium [2nd Floor]</location>
            
            <attendee>Kavish Nareshchandra Dahekar</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>BQYTVM@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-BQYTVM</pentabarf:event-slug>
            <pentabarf:title>Beyond Stateless: Why Your Web Service Architecture is Fighting Against Performance</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260414T122500</dtstart>
            <dtend>20260414T131000</dtend>
            <duration>004500</duration>
            <summary>Beyond Stateless: Why Your Web Service Architecture is Fighting Against Performance</summary>
            <description>## The Problem Many Face

Every developer of a successful web service knows this progression: You start with a simple FastAPI or Django app. It works great locally. Then you deploy it, traffic grows, and suddenly you&#x27;re working primarily on infrastructure. Load balancers, cache layers, database replicas, message queues: before you know it, your simple microservice-based business logic has become a complex distributed-system mesh, complete with careful cache-invalidation logic.

But what if this complexity isn&#x27;t inevitable? What if it&#x27;s actually the result of a historical mistake that became &quot;best practice&quot;?

## Challenging the Stateless Dogma

This talk challenges a fundamental assumption of modern web architecture: that stateless services are superior for scalability. I&#x27;ll demonstrate that this belief, born from the constraints of early web servers, is now actively harmful to both performance and developer productivity. The truth is: separating logic from state (the core of stateless architecture) creates most of the complexity we fight daily. Every database query, every cache lookup, every message queue: they&#x27;re all workarounds for the fact that we threw away our object&#x27;s state after each request.

## Key Takeaways

- Stateless isn&#x27;t a virtue, it&#x27;s a workaround: modern systems can and should maintain state efficiently across requests.
- Your objects can be the cache: when objects persist in distributed memory, explicit caching becomes redundant.
- Scale by writing normal Python code: the same object-oriented patterns work from prototype to web-scale.
- Performance through simplicity: eliminating layers of infrastructure translation improves both latency and throughput.
- Focus on business logic, not plumbing: let the framework handle distribution, persistence, and failover.

## Who Should Attend

Python developers who:
- are building or maintaining web services,
- have experienced the pain of cache invalidation,
- want to scale without changing their programming model,
- are curious about alternatives to microservices.

## A Paradigm Shift

Just as we moved from manual memory management to garbage collection, it&#x27;s time to move on from manual state management. Your Python objects should live as long as they&#x27;re needed, not just for the duration of a request. This isn&#x27;t theoretical. Systems using this approach power gaming platforms with millions of concurrent users, financial systems requiring microsecond latency, and IoT platforms managing billions of devices. The technology exists. We just need to unlearn the &quot;stateless is good&quot; mantra.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (long)</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/BQYTVM/</url>
            <location>Titanium [2nd Floor]</location>
            
            <attendee>Heiner Wolf</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>7H9DF8@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-7H9DF8</pentabarf:event-slug>
            <pentabarf:title>How to mix conda and pip without causing “environmental” damage.</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260414T143000</dtstart>
            <dtend>20260414T150000</dtend>
            <duration>003000</duration>
            <summary>How to mix conda and pip without causing “environmental” damage.</summary>
            <description>Users frequently run `pip` inside their `conda` environments, sometimes successfully, sometimes with unintentional consequences. Confusing errors and broken environments often lead users to ask: when is it safe to use `pip` in a `conda` environment, and when is it not?

In this presentation I will answer this question. 

I will begin by discussing the differences between `pip` and `conda` (a question conda maintainers get asked a lot!), starting with the specific use-cases of both tools. 
This will include an “enlightenment” moment: `pip` and `conda` solve slightly different problems; one is a Python package installer, the other a language-agnostic package and environment manager.

I will then explain the differences between `.conda` packages, tarballs, and Python wheels, revealing how these format differences make interoperability difficult and mixing tools unreliable.
Users end up mixing `pip` and `conda` because sometimes the packaging ecosystem leaves them no other choice. Users often report: &quot;I tried installing a package with `conda`, but it didn&#x27;t work, so I ran `pip install` instead and it worked.&quot; This mixing, sadly, has consequences, which I refer to as “environmental damage”.
I will highlight this damage in my talk. 

`pip` and `conda` are two separate ecosystems, but over time many community efforts (the most recent being `conda-pypi`) have tried to improve interoperability. I will explain how the latest updates in `conda`, along with the features in `conda-pypi`, have made it possible to `conda install` Python wheels from PyPI directly into `conda` environments, bringing us a step closer to better interoperability.

I will conclude the presentation with best-practice recommendations for using `pip` and `conda` together. 
By the end of this presentation, users will have learned when to use `pip`, when to use `conda`, why they are different and how to combine them safely.
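
As a hedged sketch of the usual best-practice ordering (package names are placeholders, not recommendations from the talk):

```shell
# Create the environment and install as much as possible from conda first.
conda create -n myproject python=3.12 numpy pandas
conda activate myproject

# Reach for pip only for packages conda channels cannot provide, and
# avoid running `conda install` again afterwards: conda does not track
# pip-installed files and may clobber them.
pip install some-pypi-only-package
```

The key point is the ordering: conda first, pip last, and no further conda operations in that environment once pip has touched it.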


Here is a link to the conda-pypi repository on GitHub: https://github.com/conda-incubator/conda-pypi 

Time outline of the presentation:
3 mins: self-introduction and introduction to the topic (what to expect)
5 mins: difference between pip and conda and their use cases
10 mins: different package formats, problems with mixing pip and conda
5 mins: wheels support feature in conda-pypi and updates in conda
5 mins: how it helps users and best practices
2 mins: closing remarks</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/7H9DF8/</url>
            <location>Titanium [2nd Floor]</location>
            
            <attendee>Mahe Iram Khan</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>YKQ33N@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-YKQ33N</pentabarf:event-slug>
            <pentabarf:title>Destructive Testing: 10 Practical Ways to Expose Hidden Application Risks</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260414T151000</dtstart>
            <dtend>20260414T155500</dtend>
            <duration>004500</duration>
            <summary>Destructive Testing: 10 Practical Ways to Expose Hidden Application Risks</summary>
            <description>Quality assurance is not about confirming that software works — it is about discovering how it fails. This talk explores ten concrete ways to break an application on purpose, based on real-world testing patterns and common failure modes seen in modern software systems.

The focus is on practical thinking, not theory. While Python is used as the primary example language for test automation and experimentation, the concepts apply to any technology stack. The session is relevant for QAs, test engineers, and developers who want to build more resilient systems and improve cross-discipline collaboration.

Goals of the Talk
* Improve destructive testing and exploratory thinking for QAs
* Help developers understand common blind spots in application design
* Demonstrate how Python can be used effectively to probe system weaknesses
* Encourage a shared quality mindset across roles</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (long)</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/YKQ33N/</url>
            <location>Titanium [2nd Floor]</location>
            
            <attendee>Pascal Puchtler</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>HKFCBM@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-HKFCBM</pentabarf:event-slug>
            <pentabarf:title>Pair &amp; Share: How formal Mentoring pushed REWE Analytics to a new level</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260414T163000</dtstart>
            <dtend>20260414T170000</dtend>
            <duration>003000</duration>
            <summary>Pair &amp; Share: How formal Mentoring pushed REWE Analytics to a new level</summary>
            <description>Did you ever wonder how to bring your analytics department to the next level? Do you want to help colleagues to network, learn or pass on their knowledge? And did you ever want to start your own mentoring program in a large corporation? Think no more, as I will describe in detail how we set up a mentoring program, Pair &amp; Share, in REWE Group’s analytics department, with its 150 data scientists, data engineers, analysts and other data people. As one of Europe’s largest retail corporations, REWE Group owns and manages prominent supermarket chains such as REWE and PENNY, among many other subsidiaries. However, before Pair &amp; Share there was no formal process for personal, technical or methodological growth within REWE Analytics. Although there were plenty of possibilities, further training and education were self-organized and fragmented. To increase growth among our colleagues and to build and strengthen inter-team exchange, we introduced the formal mentoring program Pair &amp; Share.

This talk will cover a brief overview of REWE Group and our analytics department, who we are and what we do. This is followed by a description of how we realized that, although personal growth and training were adequate, we could do better with Pair &amp; Share. Afterwards, I will explain how we planned the details of the mentoring program and defined parameters such as the matching process, the time frame and how to recruit participants. As the first iteration of mentoring comes to an end in March 2026, I will share my experiences of the first six months of mentoring. This will include the kind of roadblocks we faced, how participants shaped their own mentoring experience and what pleasant surprises we encountered. As we measured participant satisfaction with regular pulse checks as well as many feedback sessions, I will conclude the talk with an overview of what went well and how we plan to do better with the next iteration of mentoring.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/HKFCBM/</url>
            <location>Titanium [2nd Floor]</location>
            
            <attendee>Axel Buddendiek</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>U9KQU9@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-U9KQU9</pentabarf:event-slug>
            <pentabarf:title>Building Trust in Your Data Pipelines with Observability</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260414T171000</dtstart>
            <dtend>20260414T174000</dtend>
            <duration>003000</duration>
            <summary>Building Trust in Your Data Pipelines with Observability</summary>
            <description>This talk explores how observability can be applied to data pipelines to improve reliability, data quality, and confidence in complex data systems.

The talk begins with an introduction to observability in the context of data engineering. It explains the three core pillars (metrics, alarms, and logs) and discusses why observability is particularly important for data pipelines, where failures are often silent and correctness issues may only surface through stakeholder complaints.

The first section focuses on metrics. It demonstrates how straightforward it can be to instrument data pipelines with basic metrics using Python. The talk then discusses which metrics are worth monitoring, adapting established concepts such as the four golden signals to data engineering use cases. A concrete example based on a near–real-time event processing pipeline illustrates how fine-grained metrics can reveal systematic failures for specific event types.
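As a flavor of what such instrumentation can look like, here is a stdlib-only sketch (the event schema, metric names, and failure condition are illustrative, not the speaker's actual setup; a production pipeline would typically export these counters to a monitoring backend):

```python
from collections import Counter

# Hypothetical per-event-type metrics for a toy processing step.
metrics = Counter()

def process_event(event):
    """Process one event, recording per-event-type success/failure counts."""
    metrics[("received", event["type"])] += 1
    try:
        if event["payload"] is None:  # simulated validation failure
            raise ValueError("empty payload")
        metrics[("processed", event["type"])] += 1
    except ValueError:
        # Fine-grained failure counts can reveal systematic
        # problems that affect only specific event types.
        metrics[("failed", event["type"])] += 1

events = [
    {"type": "click", "payload": {"x": 1}},
    {"type": "click", "payload": None},
    {"type": "view", "payload": {"page": "home"}},
]
for e in events:
    process_event(e)

print(dict(metrics))
```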

The second section focuses on alerting. It addresses the challenge that engineers rarely have time to continuously inspect dashboards and therefore rely on alarms to surface important issues. The talk outlines what makes a good alarm, emphasizing that alarms should be actionable, reliable, and provide sufficient context for investigation. A scenario with excessive and noisy alarms is used to illustrate alarm fatigue, and a strategy for getting out of such a situation is described.

The final section covers log messages and their importance for reasoning about how a pipeline ended up in a specific state. It discusses why logs are often difficult to work with in data pipelines, as they may contain a mixture of critical errors, informational messages, and low-level framework output. The talk introduces structured logging as a way to add context and make logs easier to search, filter, and aggregate. Examples include monitoring the distribution of log levels to uncover hidden issues and using centralized logging to identify dependencies between pipelines that are otherwise hard to detect.
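A minimal structured-logging sketch using only the standard library (the field names and the "pipeline" context key are illustrative; real setups often use a library such as structlog instead):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object, easy to search and aggregate."""
    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Context attached via the `extra=` argument, if present:
            "pipeline": getattr(record, "pipeline", None),
        }
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("etl")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("load finished", extra={"pipeline": "orders_daily"})
```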

The talk concludes by emphasizing how the three pillars of observability build trust in a data pipeline.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/U9KQU9/</url>
            <location>Titanium [2nd Floor]</location>
            
            <attendee>Stefan Dienst</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>PFXR9G@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-PFXR9G</pentabarf:event-slug>
            <pentabarf:title>Fight your garbage data: implementation of a pythonic data quality monitoring framework in PySpark</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260414T122500</dtstart>
            <dtend>20260414T131000</dtend>
            <duration>004500</duration>
            <summary>Fight your garbage data: implementation of a pythonic data quality monitoring framework in PySpark</summary>
<description>In this talk we share our experience from a project implemented in Q3 2025. We start with the motivation for the project, the stakeholders involved, and their needs. We will then define the criteria for a successful data quality monitoring solution and share findings from our evaluation of existing frameworks. We will also discuss why popular frameworks like Great Expectations or SODA did not meet our requirements. 

Next, we will demonstrate our implementation based on DQX—a lightweight, open-source Python library designed for traceable, row-level data quality checks before and after data is persisted. DQX, developed and maintained by Databricks Labs, allows developers to concentrate on the core implementation while providing business users with YAML files for maintaining business rules. Furthermore, DQX’s seamless integration with PySpark enables efficient and cost-effective quality monitoring within our IoT data lake. 

Finally, we move beyond the code to the organisational reality. We will discuss how we embedded the Data Quality Monitor into the organisation and share our opinions on the hard questions: Who is responsible for maintaining rules? Who monitors the results? 

**Talk outline** 

* Motivation for the project 

     * Initial situation and objectives   

* Framework evaluation 

     * Evaluation criteria for a successful data quality monitoring 

     * Comparison of available frameworks 

* Our implementation with DQX 

     * How to use built-in data quality checks 

     * How to add custom data quality checks 

     * Automated rule generation with DQX Profiler 

     * Output and visualisation options 

     * Python project structure 

* Embedding in organisation 

     * Rule maintenance 

     * How to communicate data quality issues 

* Summary  

 

**Key takeaways** 

* Understanding of the most important criteria when choosing a framework for data quality monitoring, from the perspective of a data engineer and an architect 

* Understanding of the DQX framework 

* Ideas for integrating data quality monitoring into organisations.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (long)</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/PFXR9G/</url>
            <location>Helium [3rd Floor]</location>
            
            <attendee>Rostislaw Krassow</attendee>
            
            <attendee>Joshua Finger</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>WSNBD9@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-WSNBD9</pentabarf:event-slug>
            <pentabarf:title>Hype, Hope, or Headache? Making Sense of GenAI, LLMs, and AI Agents with Anecdotal Evidence</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260414T143000</dtstart>
            <dtend>20260414T150000</dtend>
            <duration>003000</duration>
            <summary>Hype, Hope, or Headache? Making Sense of GenAI, LLMs, and AI Agents with Anecdotal Evidence</summary>
<description>After nearly 20 years in data science I’ve seen many “revolutions” come and go: neural networks, SVMs, Bayesian statistics, random forests, XGBoost and deep learning. Each came with bold promises, and each eventually settled into a realistic place in production systems (read: became boring). Generative AI, however, feels fundamentally different.

In this talk, I’ll share my view of *why* the current GenAI hype stands apart from previous cycles: technically, culturally, and organizationally. Even with some understanding of how these things work, I am still blown away by the stream of stunning new capabilities. This is not a “GenAI is bad” rant. Instead, it’s a critical attempt to understand the shift we’re seeing, and the risks that come with it if we don’t adjust our thinking.

Using industrial examples such as supply chains (just because I work in this field), but also personal experience, I’ll show where LLM-based approaches still have serious limitations today, and where GenAI can realistically add value. We’ll disentangle different categories of risk, from technical fragility, evaluation problems, and sheer costs to organizational overconfidence and misuse.

A big part of the talk dives into the rapidly emerging field of AI Agents. We’ll explore what AI agents actually are, where they make sense today, and where the current hype is just snake oil, particularly when pitched to senior decision-makers who may underestimate complexity, costs, and failure modes.

The goal of this talk is not to slow innovation, but to enable better decisions. If we want GenAI to be a success in real-world systems, we need to understand both the change it represents and the limits it still has. 

An anti-bullshit take on the possibilities ahead, with honesty, anecdotes, and (for those who know me, of course) a bit of humor.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/WSNBD9/</url>
            <location>Helium [3rd Floor]</location>
            
            <attendee>Sebastian Neubauer</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>HBFL78@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-HBFL78</pentabarf:event-slug>
            <pentabarf:title>Demystifying Parallel Programming in Python: from CPU to quantum processors, including GPU and TPU</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260414T151000</dtstart>
            <dtend>20260414T155500</dtend>
            <duration>004500</duration>
            <summary>Demystifying Parallel Programming in Python: from CPU to quantum processors, including GPU and TPU</summary>
            <description># Demystifying Parallel Programming in Python

## Understanding the Hardware Basics

* A gentle introduction to modern processors: What are CPUs, GPUs, TPUs, and quantum processors?
* Essential terminology explained: cores, hyper-threading, cache memory, multithreading, multiprocessing, multitasking, SIMD, NUMA, and more—no prior knowledge required!

## Parallel Programming Techniques for Beginners
A practical overview of Python’s parallel programming tools, organized by approach:

* Just-In-Time (JIT) compilation: Speed up your code without changing your workflow
* Multithreading: Do more at once, and remove the GIL with Python 3.13+
* Multiprocessing: Use all your CPU cores
* Distributed computing: Scale your code across multiple machines
* Quantum programming: A first look at the future of computing
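As a taste of the multithreading item above, here is a stdlib-only sketch (illustrative, not part of the talk material): even with the GIL, threads overlap IO waits, because a thread that is sleeping or blocked on IO releases the lock.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(i):
    """Simulated IO-bound task: while sleeping, this thread releases the GIL."""
    time.sleep(0.05)
    return i * i

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    # The four 0.05 s waits overlap instead of running back to back.
    results = list(pool.map(fetch, range(4)))
elapsed = time.perf_counter() - start

print(results, round(elapsed, 2))
```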

## Hands-On Examples

* JIT compilation made easy: PyPy, Numba, and JAX
* The GIL and Python 3.13: What’s changing and why it matters
* Distributed computing for everyone: Celery and Dask on HPC clusters
* GPU computing for beginners: CuPy, cuDF, and Numba
* Your first quantum “Hello World”: A taste of the quantum revolution

## Conclusion

By the end of this talk, you’ll have a clear map of Python’s parallel programming landscape.
No experience needed—just bring your curiosity and let’s explore together!</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (long)</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/HBFL78/</url>
            <location>Helium [3rd Floor]</location>
            
            <attendee>Gaël Pegliasco</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>PK8XNB@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-PK8XNB</pentabarf:event-slug>
            <pentabarf:title>Come for the Code, Stay for the People.</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260414T163000</dtstart>
            <dtend>20260414T170000</dtend>
            <duration>003000</duration>
            <summary>Come for the Code, Stay for the People.</summary>
            <description>I have a confession to make: after seventeen years in the Python community and countless technical talks attended and organized, this is the first time I&#x27;m putting myself out there to talk about community itself. It feels vulnerable. It feels necessary.

**How it started**

My journey began the way many do—volunteering. Stuffing badge holders, directing people to rooms, answering the same question about Wi-Fi passwords a hundred times. It wasn&#x27;t glamorous, but it was transformative. Volunteering was my first taste of what it means to contribute to something larger than myself, and it opened doors I didn&#x27;t even know existed.

Back then, I came for the code. I had no idea I&#x27;d stay for the people.

**More than code**

Through seventeen years, the Python community taught me things I carry with me everywhere. Things that have nothing to do with syntax or libraries.

The value of patience and kindness when someone asks a &quot;basic&quot; question—because we were all beginners once. The importance of explicit inclusion, because &quot;everyone is welcome&quot; means nothing without deliberate action. The power of mentorship, both giving and receiving. The understanding that community health requires active maintenance, not passive hope.

This is what &quot;stay for the people&quot; actually means.

**Who this talk is for**

Having attended and organized Python conferences for years, I&#x27;ve noticed something consistent: there are always newcomers. People experiencing their first Python event, unsure of what to expect, wondering if they belong. This talk is for them.

But it&#x27;s also for anyone thinking about community engagement—whether in Developer Relations, open source maintainership, or simply as someone who cares about the spaces they inhabit.

**Looking forward**

I don&#x27;t have all the answers. I want to end with questions rather than conclusions. How do we engage with a generation that communicates differently? How do we preserve depth in an age of fragmented attention? What can newcomers teach us about building community in ways we haven&#x27;t imagined?

My hope is that this talk sparks conversations that continue long after I leave the stage. And honestly? I hope to revisit this topic in ten years and see how wrong—or right—we were.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/PK8XNB/</url>
            <location>Helium [3rd Floor]</location>
            
            <attendee>Valerio Maggio</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>EE39VN@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-EE39VN</pentabarf:event-slug>
            <pentabarf:title>Reaching the next level of abstraction: meta classes and what they enable</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260414T114500</dtstart>
            <dtend>20260414T121500</dtend>
            <duration>003000</duration>
            <summary>Reaching the next level of abstraction: meta classes and what they enable</summary>
<description>Python is accessible and easy, but what makes it especially fun and powerful are its deep meta programming capabilities. One salient example is meta classes, which allow us to deeply hook into the class creation process. But they seem quite complex at first glance, which may have deterred you so far from exploring them. In my talk, I want to alleviate your uncertainty and give you concrete examples of how meta classes work and what they enable you to do. We will look at using them to customize class creation, ensure data integrity by adding custom validators, or define custom syntactic sugar that reduces boilerplate.
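A minimal sketch of the constraint-enforcement idea (my own toy example, not taken from the talk): a meta class that rejects classes whose public methods lack docstrings.

```python
class RequireDocstrings(type):
    """Meta class that rejects classes whose public methods lack docstrings."""
    def __new__(mcls, name, bases, namespace):
        for attr, value in namespace.items():
            if callable(value) and not attr.startswith("_"):
                if not getattr(value, "__doc__", None):
                    raise TypeError(f"{name}.{attr} needs a docstring")
        return super().__new__(mcls, name, bases, namespace)

class Good(metaclass=RequireDocstrings):
    def run(self):
        """Do the work."""
        return 42

try:
    class Bad(metaclass=RequireDocstrings):
        def run(self):
            return 0
except TypeError as exc:
    print(exc)  # class creation itself fails, not a later method call
```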

Outline:
* Programming and meta programming
* Everything is an object
* Higher-order functions
* Meta class basics: customizing class creation and enforcing constraints
* Advanced example: custom syntactic sugar
* With great power comes great responsibility</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/EE39VN/</url>
            <location>Platinum [2nd Floor]</location>
            
            <attendee>Valentin Zieglmeier</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>WQGXJ3@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-WQGXJ3</pentabarf:event-slug>
            <pentabarf:title>Exploring Germany&#x27;s Urban Geography with Census and OpenStreetMap Data</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260414T122500</dtstart>
            <dtend>20260414T131000</dtend>
            <duration>004500</duration>
            <summary>Exploring Germany&#x27;s Urban Geography with Census and OpenStreetMap Data</summary>
<description>By the end of this talk, audience members will be empowered with the tools they need to help identify and shed light on important problems affecting their cities. To achieve this, I show how to combine data on urban structure from OpenStreetMap and demographic data from the German Census in PostgreSQL. Once the data is gathered, I then show how to do the actual analysis and present the findings with Python.

The presentation will be broken up into the following sections:

**Laying the foundation**

The first step is creating an organized database that will serve as the data source for the rest of the study. I show how to use &quot;PgOSM Flex&quot; for this plus a tool that I wrote in Python to make it easy to import German Census data into PostgreSQL.

**Asking meaningful questions**

With all the data in place, it&#x27;s time to formulate a research question to drive our analysis. Formulating a meaningful research question can keep our analysis on track and much better organized. To get there, we explore the data we have available and consider the types of questions we can actually answer.

**Analyze and present**

Now that we have a clear question in mind, we&#x27;ll construct the queries we need to generate the data necessary for our analysis. Once exported from PostgreSQL, we perform the analysis and generate the final reports using popular scientific libraries in Python.

**Final thoughts**

To conclude the talk, I share how this analysis could be extended by including even more datasets. I also discuss the limitations of these types of studies while offering practical advice on how you can make a positive impact with your research.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (long)</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/WQGXJ3/</url>
            <location>Platinum [2nd Floor]</location>
            
            <attendee>Travis Hathaway</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>KBGXKC@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-KBGXKC</pentabarf:event-slug>
            <pentabarf:title>Making my Apache Spark™ talk more interesting using AI</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260414T143000</dtstart>
            <dtend>20260414T150000</dtend>
            <duration>003000</duration>
            <summary>Making my Apache Spark™ talk more interesting using AI</summary>
            <description>In this talk, we&#x27;ll walk through a basic Apache Spark data pipeline which reads in an image dataset, processes it, and detects raccoons. That said, sponsored talks are always boring: let&#x27;s see what we can do to spice things up using AI! We&#x27;ll use Snowflake&#x27;s Cortex Code CLI coding agent together to improve the talk live, taking suggestions from the audience as we go!

Attendees to the talk can expect to learn the following:
- What Apache Spark is, what it excels at, and how to set up a basic cluster
- How to use HuggingFace ViT (vision transformer) to run a basic computer vision setup
- A little bit about Snowflake&#x27;s new coding agent, Cortex Code CLI (the part where we advertise at you, but I promise it will be fun)
- Building a basic Streamlit app
- .. and whatever other fun we get up to together!

Join for a session full of fun experimentation with interesting tools – and learn a bit about data pipelines too! This session is suitable for beginners and intermediates!</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Sponsored Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/KBGXKC/</url>
            <location>Platinum [2nd Floor]</location>
            
            <attendee>Celeste Horgan</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>UUHYUS@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-UUHYUS</pentabarf:event-slug>
            <pentabarf:title>AsyncIO vs Threads: who survives in the No-GIL Era?</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260414T151000</dtstart>
            <dtend>20260414T154000</dtend>
            <duration>003000</duration>
            <summary>AsyncIO vs Threads: who survives in the No-GIL Era?</summary>
            <description>Concurrency in Python is full of stereotypes: &quot;threads are useless because of the GIL&quot;, &quot;async is always faster&quot;, &quot;just make everything async&quot;. This session replaces opinions with mechanics and measurements, and updates the story for Python 3.14&#x27;s free-threaded (no-GIL) build.

What we&#x27;ll cover

1) How things actually work under the hood
- A Python thread is an OS thread (pthread_create/clone). The OS scheduler runs it like any other thread - the GIL only matters when Python bytecode executes.
- asyncio is also scheduling: one OS thread, many Tasks, cooperative switching at await, and readiness notifications via epoll/select.
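A tiny sketch of that cooperative scheduling (my own illustration, not the speaker's benchmark): one OS thread runs three Tasks, and the loop switches between them at each await, so the waits overlap.

```python
import asyncio

async def worker(name, delay):
    # Control returns to the event loop at every await point.
    await asyncio.sleep(delay)
    return name

async def main():
    # One OS thread, three Tasks: total time is about max(delays),
    # not their sum, because the loop switches while each Task waits.
    return await asyncio.gather(
        worker("a", 0.03), worker("b", 0.02), worker("c", 0.01)
    )

results = asyncio.run(main())
print(results)  # gather preserves argument order
```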

2) Why IO-heavy workloads often look &quot;equally fast&quot; in threads and asyncio
- both models hide IO latency by switching while waiting;
- the real difference shows up in scalability and cost: per-thread memory/stack + OS limits vs lightweight Tasks.

3) When &quot;async&quot; is secretly a thread pool
- aiofiles delegates file operations to run_in_executor();
- Motor (async MongoDB driver) runs the synchronous PyMongo core in a ThreadPoolExecutor;
- frameworks like Django must bridge sync and async worlds (sync_to_async), adding overhead and sharp edges.
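The thread-pool pattern behind such "async" wrappers can be sketched as follows (the blocking_read function is a hypothetical stand-in for any synchronous library call):

```python
import asyncio
import time

def blocking_read():
    """Stand-in for a blocking call (file IO, a sync database driver, ...)."""
    time.sleep(0.01)
    return "data"

async def main():
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor: the "async"
    # call is really a worker thread doing the blocking work.
    return await loop.run_in_executor(None, blocking_read)

result = asyncio.run(main())
print(result)
```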

4) Benchmarks that mirror real services
- 100 / 1,000 / 10,000 concurrent IO waits: why &quot;10k threads&quot; fails but &quot;10k tasks&quot; is fine;
- memory and CPU overhead comparison (what you pay for concurrency);
- a microservice-style endpoint (FastAPI-like) in sync/threaded vs async mode.

5) What changes with free-threading (no-GIL)
- a high-level view of what CPython changes to make it possible;
- rerunning the same benchmark with and without the GIL;
- when an interpreter upgrade can deliver &quot;async-rewrite-level&quot; gains for mixed CPU+IO workloads.

Takeaways
- a practical checklist for choosing threading vs asyncio vs multiprocessing;
- performance vs resource-usage intuition you can apply to real services;
- guidance on how to read &quot;async&quot; claims in library docs.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/UUHYUS/</url>
            <location>Platinum [2nd Floor]</location>
            
            <attendee>Igor Anokhin</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>JPTTMK@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-JPTTMK</pentabarf:event-slug>
            <pentabarf:title>The Art of the Optimal: A Pythonic Approach to Complex Decision-Making</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260414T163000</dtstart>
            <dtend>20260414T170000</dtend>
            <duration>003000</duration>
            <summary>The Art of the Optimal: A Pythonic Approach to Complex Decision-Making</summary>
<description>As Python developers, we frequently tackle complex decision-making problems by writing custom scripts and heuristic algorithms. While a standard greedy algorithm might provide a quick, intuitive fix, it rarely finds the best possible solution—often leaving significant efficiency, performance, and cost savings on the table.

In this talk, we will explore the untapped power of mathematical optimization. We will start with a classic operations challenge: the Paintshop Problem. You will see firsthand how a standard rule-based Python heuristic compares to a mathematical optimization model, and how rigorously defining constraints and objectives can guarantee a globally optimal solution.
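To make the contrast concrete, here is a toy paintshop-style instance of my own (not the talk's model): a rule-based heuristic versus exhaustive search, which plays the role of the optimization model and is guaranteed to find the global optimum on this tiny input.

```python
from itertools import permutations

# Toy "paintshop": schedule jobs to minimize color changes in the booth.
# Hypothetical instance, for illustration only.
jobs = ["red", "blue", "red", "blue", "red", "blue"]

def changeovers(order):
    """Count color switches between consecutive jobs."""
    return sum(1 for a, b in zip(order, order[1:]) if a != b)

# Rule-based heuristic: simply process jobs in arrival order.
heuristic_cost = changeovers(jobs)

# Exhaustive search stands in for the optimization model here:
# on 6 jobs it checks all 720 orderings and finds the true optimum
# (grouping each color together needs only one changeover).
best_cost = min(changeovers(p) for p in permutations(jobs))

print(heuristic_cost, best_cost)
```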

But optimization isn&#x27;t just for traditional logistics! We will also bridge the gap to Machine Learning. We will demonstrate how optimization techniques can be utilized as a powerful verification step for ML models, such as calculating the minimum pixel changes required to trick a neural network into a misclassification.

While we can only scratch the surface of these vast topics, you will walk away with a fresh perspective on problem-solving. Whether you are automating business operations or building robust ML pipelines, you will learn when to graduate from basic heuristics and start leveraging the true &quot;art of the optimal.&quot;</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Sponsored Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/JPTTMK/</url>
            <location>Platinum [2nd Floor]</location>
            
            <attendee>Justine Broihan</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>8YGQZC@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-8YGQZC</pentabarf:event-slug>
            <pentabarf:title>Type Errors for Better Agent-Assisted Development</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260414T171000</dtstart>
            <dtend>20260414T174000</dtend>
            <duration>003000</duration>
            <summary>Type Errors for Better Agent-Assisted Development</summary>
            <description>As AI coding agents take on larger Python tasks, a practical question emerges: what&#x27;s the best way to catch the bugs they introduce? Tests are thorough but slow. Linting is fast but shallow. Type checking occupies an interesting middle ground: deep enough to catch semantic errors, fast enough to run on every edit, and concrete enough to tell the agent exactly what to fix.

In this talk, I explore connecting Pyrefly, a Python type checker built at Meta, to Claude Code. I&#x27;ll walk through integration options and discuss practical considerations like token costs and setup complexity. Whether you&#x27;re building tools for AI agents or using them in your daily work, you&#x27;ll leave with a clearer picture of where type checking fits in the agentic development loop.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Sponsored Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/8YGQZC/</url>
            <location>Platinum [2nd Floor]</location>
            
            <attendee>Kyle Into</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>AWFFUS@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-AWFFUS</pentabarf:event-slug>
            <pentabarf:title>Kickstart Coding at Scale: How Project Template Automation Unlocks Developer Productivity</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260414T114500</dtstart>
            <dtend>20260414T121500</dtend>
            <duration>003000</duration>
            <summary>Kickstart Coding at Scale: How Project Template Automation Unlocks Developer Productivity</summary>
            <description>**The Problem**
As organizations scale, their repository count grows — and with it, the diversity of project setups. Different CI configurations, inconsistent linting rules, varying packaging approaches: every new project reinvents the wheel. Developers spend valuable time on boilerplate instead of writing code. Another approach — a centralized, &quot;magic&quot; build pipeline — trades one problem for another: it&#x27;s opaque, brittle, and leaves no room for project-specific needs. We illustrate this with a concrete example: pre-commit configuration.

**A Paved Road with Copier**
The Python tool Copier goes beyond one-time scaffolding — it’s a lifecycle management tool. When the template evolves, copier update merges improvements into existing projects, respecting local customizations. This is what sets it apart from cookiecutter and similar tools. We built an internal project template that generates CI workflows, pre-commit configuration, conda packaging, documentation scaffolding, and more — all customizable through simple yes/no questions during setup. Crucially, projects can deviate from the template whenever needed, without breaking the update mechanism. This section includes a live demo.
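The yes/no questions mentioned above live in the template's copier.yml; a minimal hypothetical sketch (question names and defaults are invented, not our actual template):

```yaml
# copier.yml - hypothetical questions for an internal project template
use_ci:
  type: bool
  help: Generate GitHub Actions CI workflows?
  default: true

use_conda_packaging:
  type: bool
  help: Add conda packaging configuration?
  default: false

_subdirectory: template   # project files are rendered from this folder
```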

**Automated Migration at Scale**
A template is only useful if projects stay up to date. We built a GitHub bot that runs monthly across all repositories in our organization, executes copier update, and opens Pull Requests with the changes. Merge conflicts are minimized by encouraging teams not to diverge too far from the template. For the conflicts that do arise, mergiraf helps with resolution — but maintainers may still need to step in.

**Tracking Progress with a Dashboard**
To answer &quot;how many projects are up-to-date?&quot;, we built a Streamlit dashboard that shows the template version for each repository, with search filters and charts. This gives the team visibility into adoption progress and helps identify repositories that are falling behind.

**Lessons Learned**
We share practical lessons from rolling this out across a large organization — what worked, what&#x27;s still challenging, and where we see current limitations.

**Takeaways**
Attendees will learn how to:
- Use Copier to create and continuously update project templates that standardize without locking developers in.
- Automate template updates across repositories via a GitHub bot and automated Pull Requests.
- Use a dashboard to track which projects are up-to-date and which are lagging.
- Reduce &quot;boilerplate fatigue&quot; so teams can focus on shipping code.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Sponsored Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/AWFFUS/</url>
            <location>Europium [3rd Floor]</location>
            
            <attendee>Yannik Tausch</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>NF7MKB@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-NF7MKB</pentabarf:event-slug>
            <pentabarf:title>Programming Quantum Networks in Python</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260414T122500</dtstart>
            <dtend>20260414T131000</dtend>
            <duration>004500</duration>
            <summary>Programming Quantum Networks in Python</summary>
            <description>Quantum networks connect quantum devices including quantum computers, enabling applications not possible in classical networks, such as secure quantum computing in the cloud and quantum key distribution. These networks are now moving from theory to reality, and as part of the Quantum Internet Alliance, we are actively building a prototype quantum network in Europe, driven by applications developed in Python.  

Even though quantum systems are governed by the rules of quantum mechanics, you don&#x27;t need to be an expert in quantum physics to start programming them!  

Developing applications for quantum networks reveals new challenges. For example, unlike in classical networks where data is copied and retransmitted, quantum information cannot be copied. Once lost, it is irretrievable. This motivates a new networking primitive for transferring data, the quantum teleportation protocol.  

In this talk, we will walk through the quantum teleportation protocol step-by-step using the NetQASM SDK and the SquidASM simulator, Python tools developed by our research group for quantum network programming and simulation. We&#x27;ll conclude by sharing resources so that you can begin experimenting with quantum network programming yourself. No prior quantum experience required.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Invited Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/NF7MKB/</url>
            <location>Europium [3rd Floor]</location>
            
            <attendee>Samuel Oslovich</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>333HDN@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-333HDN</pentabarf:event-slug>
            <pentabarf:title>From Hard Problems to Proven Solutions: Solving Decision Problems with Gurobi</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260414T143000</dtstart>
            <dtend>20260414T150000</dtend>
            <duration>003000</duration>
            <summary>From Hard Problems to Proven Solutions: Solving Decision Problems with Gurobi</summary>
            <description>Many real-world applications require making the best possible decisions under complex constraints — whether in scheduling, resource allocation, routing, or planning. These problems quickly become difficult as the number of interacting choices grows.

This session introduces mathematical optimization as a practical tool for solving such problems. Using Gurobi, we demonstrate how to formulate decision problems and compute solutions that satisfy all constraints and come with clear guarantees about their quality.

You’ll see how to express optimization models using familiar data structures such as NumPy arrays, SciPy sparse matrices, and pandas DataFrames.
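As a flavor of the modeling pattern, here is a deliberately tiny allocation model. Since Gurobi needs a license, this sketch uses scipy.optimize.linprog as a stand-in solver; the two-product scenario and all numbers are invented.

```python
import numpy as np
from scipy.optimize import linprog

# Maximize profit 3x + 5y for two products under three capacity limits.
c = np.array([-3.0, -5.0])            # linprog minimizes, so negate the profits
A_ub = np.array([[1.0, 0.0],          # plant 1: at most 4 units of product x
                 [0.0, 2.0],          # plant 2: 2y at most 12
                 [3.0, 2.0]])         # plant 3: 3x + 2y at most 18
b_ub = np.array([4.0, 12.0, 18.0])

res = linprog(c, A_ub=A_ub, b_ub=b_ub)        # default bounds keep x, y nonnegative
print("plan:", res.x, "profit:", -res.fun)    # optimal plan (2, 6), profit 36
```

The solver returns not just a feasible plan but a provably optimal one, which is the guarantee the session is about.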

By the end of the session, you’ll have an understanding of how to approach modeling and solving complex decision problems — and how optimization can be used to support reliable, data-driven decisions.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Sponsored Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/333HDN/</url>
            <location>Europium [3rd Floor]</location>
            
            <attendee>Silke Horn</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>QVLTKD@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-QVLTKD</pentabarf:event-slug>
            <pentabarf:title>Python in Climate Tech: Vehicle-to-Grid</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260414T163000</dtstart>
            <dtend>20260414T170000</dtend>
            <duration>003000</duration>
            <summary>Python in Climate Tech: Vehicle-to-Grid</summary>
            <description>At The Mobility House Energy, our mission is to enable a zero-emission future by connecting the worlds of mobility and energy. By intelligently integrating electric vehicle batteries into the power grid, we unlock flexibility that supports renewable energy expansion, enhances grid stability, and makes electric mobility more accessible and affordable.

In this talk, we share how Python became a key enabler on our journey to delivering Vehicle-to-Grid solutions at scale. From early simulations and prototyping to operating production-grade energy systems, Python supports us across the entire development lifecycle. It allows us to rapidly validate ideas, process and analyze complex energy and mobility data, and deploy robust services that are battle-tested in the real-world and on energy markets.

We will also explore how adopting Python in production reshaped our collaboration model. Data scientists and software engineers now work closer together, sharing tools, codebases, and responsibilities. At the same time, we will openly discuss the technical and organizational challenges we encountered—from performance bottlenecks to system integration—and the practical solutions that helped us overcome them.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Sponsored Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/QVLTKD/</url>
            <location>Europium [3rd Floor]</location>
            
            <attendee>Christopher Sedlaczek-Bock</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>GVHZW9@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-GVHZW9</pentabarf:event-slug>
            <pentabarf:title>Solving Marketplace Cold Start at Scale with Ranking</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260414T114500</dtstart>
            <dtend>20260414T121500</dtend>
            <duration>003000</duration>
            <summary>Solving Marketplace Cold Start at Scale with Ranking</summary>
            <description>Cold start cripples two‑sided marketplaces: new items lack behavioral signals and social proof, ranking models under‑expose them, which delays the very signals needed to rank them well. This talk shares our journey to break down this loop at GetYourGuide, a marketplace for travel experiences. We evolved our exploration/activation framework over the past three years with three complementary interventions: guaranteed exposure at strategic positions, a real‑time reranker to allocate that exposure efficiently under tight latency budgets, and guardrail boosting for unactivated items when primary assessment slots are empty. 

The talk is a pragmatic case study: we’ll show how experiment-led exploration shaped the system, share what worked and what did not, and discuss how we managed trade-offs between short-term revenue and long-term marketplace health. Attendees will leave with a blueprint for safely accelerating early traction in their own marketplaces, combining learning-to-rank with exposure guarantees without sacrificing overall business health.
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/GVHZW9/</url>
            <location>Palladium [2nd Floor]</location>
            
            <attendee>Theodore Meynard</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>DVCKHF@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-DVCKHF</pentabarf:event-slug>
            <pentabarf:title>Personalized Restaurant Recommendations at Scale combining Transformer with Gradient-Boosted Ranking</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260414T122500</dtstart>
            <dtend>20260414T131000</dtend>
            <duration>004500</duration>
            <summary>Personalized Restaurant Recommendations at Scale combining Transformer with Gradient-Boosted Ranking</summary>
            <description>Personalized restaurant ranking is a core machine learning problem in food delivery platforms, requiring models to balance relevance, exploration, latency, and robustness across highly heterogeneous markets. In this talk, we present UVR (Universal Venue Ranker), Wolt’s production ranking model for restaurant recommendations, currently deployed in more than 30 countries.

UVR unifies the capabilities of three previously separate models—Neural Collaborative Filtering (NCF), a second-pass ranker, and a first-time-user (FTU) model—into a single, sequence-aware ranking approach. Beyond improving recommendation quality, this consolidation significantly reduced model complexity, operational overhead, and long-term maintenance cost.

The model follows a two-stage architecture implemented using widely adopted Python-based machine learning technologies, including PyTorch, CatBoost, and Flyte. The first stage is an encoder-style transformer trained with a classification loss on a next-purchase prediction task. It learns a compact user state representation from historical restaurant purchase sequences enriched with spatiotemporal information, such as purchase time and user location. This stage outputs a personalized venue relevance score.

The second stage is a CatBoostRanker, trained with a learning-to-rank loss on grouped venue requests. It combines the transformer-derived score with a rich set of additional features, including user-specific attributes, venue metadata, user–venue interaction features, and delivery-related signals. This separation of objectives—classification for representation learning and ranking for final scoring—proved critical for both model performance and training stability.
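The grouped-data idea behind the second stage can be shown with a toy layout. All feature names and values below are invented, and the per-group sort is only a stand-in for the trained ranker.

```python
import numpy as np

# Each row is one candidate venue inside a grouped request; the first-stage
# transformer relevance score enters as just another feature, and group_id
# marks which rows compete against each other in the learning-to-rank loss.
group_id = np.array([0, 0, 0, 1, 1])                     # two venue requests
transformer_score = np.array([0.9, 0.2, 0.5, 0.4, 0.7])  # first-stage output
delivery_eta_min = np.array([25, 15, 40, 20, 30])        # one extra feature
X = np.column_stack([transformer_score, delivery_eta_min])

def rank_within_groups(scores, groups):
    """Order candidates per request by descending score, a stand-in for the
    trained ranker (CatBoostRanker would be fit on X with this group_id)."""
    return {g: np.argsort(-scores[groups == g]) for g in np.unique(groups)}

print(rank_within_groups(transformer_score, group_id))
```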

We will walk through the end-to-end training and evaluation pipeline, covering feature construction, offline validation using ranking metrics, and a multi-country online A/B testing setup. UVR delivered substantial improvements in global conversion rate and new venue trial rate, a key driver of long-term user retention. We will discuss how offline improvements translated into online gains.

A dedicated section of the talk focuses on production and serving architecture, including low-latency inference and orchestration of training and deployment workflows using Flyte. We also share hard-earned lessons from training a multi-stage ranking model, such as preventing data leakage between models trained with different objectives and on different data, as well as handling cold start.

Finally, we outline our roadmap toward extending UVR into a cross-domain ranking model for both restaurants and stores, enabling knowledge transfer across domains while preserving strong personalization guarantees.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (long)</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/DVCKHF/</url>
            <location>Palladium [2nd Floor]</location>
            
            <attendee>Marcel Kurovski</attendee>
            
            <attendee>Steffen Klempau</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>LYCBNT@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-LYCBNT</pentabarf:event-slug>
            <pentabarf:title>What Breaks When Automatic Speech Recognition Systems Go Multilingual</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260414T151000</dtstart>
            <dtend>20260414T155500</dtend>
            <duration>004500</duration>
            <summary>What Breaks When Automatic Speech Recognition Systems Go Multilingual</summary>
            <description>In a multilingual Automatic Speech Recognition (ASR) dataset containing over 440,000 audio samples, preprocessing methods that were effective for one language often failed silently for others. This resulted in shifts in acoustic features, misleading validation outcomes, and prolonged jobs that failed due to assumptions that held true only in monolingual contexts. This presentation examines the issues that arise when extending ASR systems to multilingual data, using a real-world deepfake detection system that includes Hindi, Korean, Mandarin, and German. It addresses the engineering challenges encountered while developing and operating a Python-based pipeline at scale.

The session will discuss practical issues in large-scale audio processing, including the creation of memory-efficient data loaders, the design of workflows that support resumable preprocessing and feature extraction, and strategies for managing long-running jobs to avoid redundant computations. Additionally, it will cover validation strategies for multilingual ASR systems, emphasizing that language imbalance and shared pipelines can lead to cross-lingual leakage, which skews evaluation results if not explicitly addressed.
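The resumable-preprocessing idea can be sketched in a few lines of stdlib Python. All names below are illustrative, not our pipeline code: finished sample ids are checkpointed to disk after each item, so a restarted job skips completed work instead of recomputing it.

```python
import json
import tempfile
from pathlib import Path

CHECKPOINT = Path(tempfile.gettempdir()) / "asr_done_ids.json"
CHECKPOINT.unlink(missing_ok=True)            # start the demo from scratch

def load_done() -> set:
    return set(json.loads(CHECKPOINT.read_text())) if CHECKPOINT.exists() else set()

def preprocess(sample_id: str) -> str:
    return f"features:{sample_id}"            # stand-in for audio feature extraction

def run(sample_ids) -> set:
    done = load_done()
    for sid in sample_ids:
        if sid in done:                       # resume: already processed, skip
            continue
        preprocess(sid)
        done.add(sid)
        CHECKPOINT.write_text(json.dumps(sorted(done)))  # checkpoint each step
    return done

run(["hi_001", "ko_001", "zh_001"])
print(sorted(load_done()))                    # survives a process restart
```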

Key takeaways include:
1. Multilingual ASR pipelines reveal language-specific issues that are not present in monolingual systems.
2. Scalable audio processing requires memory-efficient and resumable Python workflows.
3. Cross-lingual evaluation necessitates explicit control over language imbalance and leakage.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (long)</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/LYCBNT/</url>
            <location>Palladium [2nd Floor]</location>
            
            <attendee>Rashmi Nagpal</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>99UMEL@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-99UMEL</pentabarf:event-slug>
            <pentabarf:title>When Space Weather Breaks Your GPS: Building an Explainable Early Warning System</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260414T163000</dtstart>
            <dtend>20260414T170000</dtend>
            <duration>003000</duration>
            <summary>When Space Weather Breaks Your GPS: Building an Explainable Early Warning System</summary>
            <description>**Space Weather** doesn’t just produce beautiful auroras: it can silently disrupt navigation systems, radio links, and satellite-based technologies we rely on every day.

Travelling Ionospheric Disturbances (TIDs) are wave-like structures in the ionosphere that affect GNSS accuracy and HF communications. From an ML perspective, forecasting TIDs is a challenging rare-event prediction problem involving imbalanced data and heterogeneous physical inputs.

In this talk, I will present an operational machine learning approach developed within the T-FORS project to forecast TID occurrence over Europe. The model is built using **CatBoost** and integrates data from space- and ground-based observations.

The talk focuses on **model design and evaluation choices**. In particular, I will show how **SHAP** can be used to debug model behaviour, validate feature relevance, and build trust in predictions in a high-risk operational context.

Along the way, I’ll share practical engineering lessons on:
- handling class imbalance,
- incorporating domain knowledge into ML pipelines,
- producing **uncertainty-aware outputs** via **Conformal Prediction**, and
- running **interpretable models in real-time forecasting systems**.
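Of these, conformal prediction is easy to show in miniature. The sketch below is generic split conformal on synthetic numbers, not the T-FORS model: residuals on a held-out calibration set give a quantile q, and each new forecast becomes an interval with roughly 90% coverage under exchangeability.

```python
import numpy as np

rng = np.random.default_rng(0)
y_cal = rng.normal(size=500)                          # calibration targets
pred_cal = y_cal + rng.normal(scale=0.3, size=500)    # an imperfect model
scores = np.abs(y_cal - pred_cal)                     # nonconformity scores
n = len(scores)
q = np.quantile(scores, np.ceil(0.9 * (n + 1)) / n)   # finite-sample-corrected quantile

new_pred = 1.2                                        # a fresh model forecast
print(f"90% interval: [{new_pred - q:.2f}, {new_pred + q:.2f}]")
```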

The talk is aimed at data scientists and ML practitioners interested in applied forecasting, interpretable models, uncertainty quantification and ML at the boundary between data and physics.

---

**Talk outline**
- 0-4: What is Space Weather and why should we care
- 4-7: Framing TID forecasting as an ML problem
- 7-10: Model design with CatBoost
- 10-13: Explainability with SHAP
- 13-18: Uncertainty quantification with Conformal Prediction
- 18-22: Cost-sensitive learning and real-time operations
- 22-25: Lessons learned
- 25-30: Q&amp;A</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/99UMEL/</url>
            <location>Palladium [2nd Floor]</location>
            
            <attendee>Vincenzo Ventriglia</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>3UHPZB@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-3UHPZB</pentabarf:event-slug>
            <pentabarf:title>It Works on My Machine: Why LLM Apps Fail Users (Not Tests)</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260414T171000</dtstart>
            <dtend>20260414T174000</dtend>
            <duration>003000</duration>
            <summary>It Works on My Machine: Why LLM Apps Fail Users (Not Tests)</summary>
            <description>You&#x27;ve deployed an LLM application. Your tests show that it&#x27;s working. The metrics look good. Then a user says **it&#x27;s broken.**

This happens more often than you would expect.

In this talk, we&#x27;ll share our experience of building and maintaining LLM applications, and discuss what we&#x27;ve learned about the discrepancy between evaluation results and user experience.

We will explore three dimensions of evaluation through the lens of user experience:

## Expectations: What does &#x27;working&#x27; actually mean to your users?

Sometimes the gap between tests and reality comes down to expectations. Questions that seem obviously hard to users turn out to be easy for the LLM—and vice versa. Understanding this mismatch is the first step to building systems that users actually trust.

## Functional: Does the system do what it&#x27;s supposed to do?

When you&#x27;re working with LLMs, individual components might pass tests while the whole system fails. With prompts, model parameters, evaluation criteria, metadata, and ever-growing datasets all interacting, the complexity compounds quickly.

## Operational: Does it remain reliable in real-world conditions?

In this section, we&#x27;ll share practical lessons from operating LLM applications in production: how we use observability tools like Opik to monitor model behavior, how telemetry helps us understand actual usage patterns, and how dedicated validation endpoints allow us to detect issues in on-premises deployments before users do.

We&#x27;ll discuss real-life scenarios we&#x27;ve encountered, such as when users expected different results from those delivered by our system, when external changes affected the system silently, and when performance drifted in ways that our metrics didn&#x27;t detect.

This isn&#x27;t a talk about frameworks or tools (even though we&#x27;ll mention a few). It&#x27;s about the human element of evaluation: **ensuring that the system we built serves the people using it.**

Whether you&#x27;re just starting out with LLM applications or running them at scale, you&#x27;ll probably recognize these scenarios. We&#x27;ll share the strategies and patterns that we&#x27;ve developed, not as prescriptive rules, but as a starting point for your own approach.

## Outline

1. Why users report the LLM application is broken while it passes every test
2. Three dimensions of the problem
    * Expectations
    * Functional
    * Operational
3. Real-life scenarios
4. Our current strategies and patterns
5. Evaluation = understanding if the system serves users, not proving it&#x27;s good</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/3UHPZB/</url>
            <location>Palladium [2nd Floor]</location>
            
            <attendee>Thomas Prexl</attendee>
            
            <attendee>Frank Rust</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>RRLTBU@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-RRLTBU</pentabarf:event-slug>
            <pentabarf:title>From Prompt to Production: How to use AI Code Assistants for Python Data Systems</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260414T114500</dtstart>
            <dtend>20260414T131500</dtend>
            <duration>013000</duration>
            <summary>From Prompt to Production: How to use AI Code Assistants for Python Data Systems</summary>
            <description>This **90-minute hands-on tutorial** shows how to **design, build, and deploy Python data pipelines and data agents** using AI coding assistants in a **supervised engineering workflow**.

### Outline

- **The state of AI code generation** for data engineering  
- Designing **collaborative Human/LLM development loops**  
- Building a **data pipeline with structured AI assistance**  
- Creating a **simple data agent**  
- Deploying and operating **Python workloads in production** using **Tower.dev** 
- Using **logs, observability, and runtime feedback** to guide AI-driven refactoring  
- **Best practices, risks, and guardrails**

Participants will leave with **practical patterns for integrating AI into real-world data engineering workflows**, from **prototype to production**.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Tutorial</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/RRLTBU/</url>
            <location>Ferrum [2nd Floor]</location>
            
            <attendee>Serhii Sokolenko</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>KKCYJN@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-KKCYJN</pentabarf:event-slug>
            <pentabarf:title>Your First Open Source Contribution in Python: From Fork to Pull Request</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260414T143000</dtstart>
            <dtend>20260414T160000</dtend>
            <duration>013000</duration>
            <summary>Your First Open Source Contribution in Python: From Fork to Pull Request</summary>
            <description>Open source is a core pillar of the Python ecosystem, yet many developers struggle to make their first contribution. The barriers are often not technical ability, but uncertainty around workflows, expectations, and collaboration practices.

This 90-minute hands-on tutorial guides participants through their first real contribution to an open source Python project, focusing on clarity, safety, and reproducibility. Rather than working on toy examples, attendees will contribute to ScanAPI, an actively maintained open source Python library used for automated API integration testing and live documentation.

The tutorial is designed to demystify the contribution process while remaining technically grounded and respectful of real-world open source practices.

What participants will learn:

1. Understanding an Open Source Python Project
- How to quickly navigate an unfamiliar Python repository
- Reading project structure, tests, and documentation
- Understanding contribution guidelines and expectations

2. Open Source Workflow in Practice
- Forking and cloning a repository
- Creating a local development environment
- Working with branches and commits

3. Making a First Contribution
- Working on a well-scoped, beginner-friendly issue
- Writing or updating Python code, tests, or documentation
- Running tests locally and validating changes

4. Opening a Pull Request
- Writing a clear and respectful pull request description
- Understanding automated checks (CI)
- Responding to maintainers’ feedback

5. Contributing Sustainably
- How to continue contributing after the workshop
- Common mistakes to avoid
- How open source communities scale through good engineering and collaboration

All tutorial tasks are carefully scoped and prepared in advance to ensure a smooth experience within the 90-minute timeframe. Participants will leave with a forked repository, a commit, and a pull request opened or ready, as well as the confidence to contribute to other Python open source projects.

Why ScanAPI?

ScanAPI is a production-grade Python library distributed via PyPI and maintained in the open. It has been recognized by GitHub as part of initiatives focused on securing the open source supply chain, making it an excellent real-world example of sustainable Python open source development. The project is supported by the Cumbuca Dev open source community, which focuses on building inclusive, contributor-friendly environments through strong engineering practices.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Tutorial</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/KKCYJN/</url>
            <location>Ferrum [2nd Floor]</location>
            
            <attendee>Camila Maia</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>ZYUJH3@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-ZYUJH3</pentabarf:event-slug>
            <pentabarf:title>How to Search Through 800 Billion Records in Real Time</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260414T163000</dtstart>
            <dtend>20260414T170000</dtend>
            <duration>003000</duration>
            <summary>How to Search Through 800 Billion Records in Real Time</summary>
            <description>Large-scale distributed systems rarely produce clean data streams. In practice, hundreds of services continuously emit overlapping updates, retries, corrections, and partial state. Turning that constant stream of noisy events into a reliable, searchable dataset in real time, while processing hundreds of billions of records per day, requires careful architectural choices. 

This talk shares practical lessons from building a Kafka-based ETL pipeline that transforms massive volumes of events into a coherent dataset suitable for real-time search. After a brief overview of the system architecture, we focus on several key techniques: reducing redundant processing through key deduplication and short-lived buffers, defining when messages can be safely acknowledged without risking data loss, and keeping long-running ETL services healthy under heavy Kafka workloads.
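The deduplication-buffer idea can be sketched in plain Python (an illustrative class, not the production code): overlapping updates for the same record key collapse inside a small time window, so downstream indexing only sees the latest version of each record.

```python
import time
from collections import OrderedDict

class DedupBuffer:
    def __init__(self, window_s=1.0):
        self.window_s = window_s
        self.pending = OrderedDict()       # key -> (latest value, first arrival)

    def add(self, key, value, now=None):
        now = time.monotonic() if now is None else now
        first_seen = self.pending.get(key, (None, now))[1]
        self.pending[key] = (value, first_seen)   # overwrite value, keep arrival

    def flush(self, now=None):
        """Emit entries that have aged past the window; keep the rest buffered.
        Kafka offsets for a record are only safe to acknowledge after its
        deduplicated write has been emitted downstream."""
        now = time.monotonic() if now is None else now
        ready = {k: v for k, (v, t0) in self.pending.items() if now - t0 >= self.window_s}
        for k in ready:
            del self.pending[k]
        return ready

buf = DedupBuffer(window_s=1.0)
buf.add("order-1", {"status": "created"}, now=0.0)
buf.add("order-1", {"status": "paid"}, now=0.5)   # a correction overwrites it
print(buf.flush(now=1.5))                          # emits only the latest version
```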

The session emphasizes concrete engineering trade-offs and operational realities rather than theory. Attendees will leave with practical patterns for building more reliable and efficient streaming pipelines.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/ZYUJH3/</url>
            <location>Ferrum [2nd Floor]</location>
            
            <attendee>Mirano Tuk</attendee>
            
            <attendee>Filip Bacic</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>BAXEXY@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-BAXEXY</pentabarf:event-slug>
            <pentabarf:title>Agent-Based Hyperparameter Optimization for Gradient Boosted Trees</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260414T171000</dtstart>
            <dtend>20260414T174000</dtend>
            <duration>003000</duration>
            <summary>Agent-Based Hyperparameter Optimization for Gradient Boosted Trees</summary>
            <description>### Why This Problem Matters in Practice
                                                                    
  Hyperparameter tuning consumes a disproportionate amount of experimentation time, yet most tuning failures stem from recurring structural issues — not random chance. Experienced practitioners can spot these patterns, but automated optimizers only see scalar objective values.
                                                                                                                                                               
###  What Is New or Different                                                                                                                                     
                                                                                                                                                               
  This work reframes hyperparameter optimization as an iterative reasoning process rather than a pure search problem. Intermediate diagnostic artifacts (parameter importance, generalization gaps, plateau signals) become first-class inputs that guide subsequent decisions. Encoding this reasoning via agents enables systematic reuse of expert heuristics that are otherwise applied informally.                                                                         
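A toy version of that loop, with a plain rule standing in for the agent and every name and threshold invented:

```python
# Instead of a bare scalar objective, the next-trial decision receives
# structured diagnostic signals that an agent (or here, a plain rule)
# can reason over.
def diagnose(train_score, valid_score, recent_best):
    gap = train_score - valid_score                 # generalization gap
    # plateau: the last three best scores moved by under 1e-3
    plateau = len(recent_best) >= 3 and 1e-3 > max(recent_best[-3:]) - min(recent_best[-3:])
    return {"generalization_gap": gap, "plateau": plateau}

def next_action(diag):
    """Rule-based stand-in for the agent decision step."""
    if diag["generalization_gap"] > 0.05:
        return "increase regularization"            # e.g. stronger L2 in LightGBM
    if diag["plateau"]:
        return "widen the search space"
    return "continue local search"

print(next_action(diagnose(0.98, 0.85, [0.85, 0.85, 0.85])))
```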
                  

### Scope and Limitations

The case study uses LightGBM as the demonstration case, but the architecture is generic and can be applied to any ML model. The talk explicitly discusses scenarios where agent-based optimization adds limited value or introduces unnecessary complexity.

### Audience Takeaways

Attendees will gain:
- A blueprint for putting an LLM in any decision loop with guardrails
- If you do ML: a new way to think about HPO
- If you don&#x27;t: a reusable pattern for agent-driven automation</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/BAXEXY/</url>
            <location>Ferrum [2nd Floor]</location>
            
            <attendee>Huijo Kim</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>AF9DNH@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-AF9DNH</pentabarf:event-slug>
            <pentabarf:title>SQL is Dead, Long Live SQL: Engineering reliable analytics agent from scratch</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260414T114500</dtstart>
            <dtend>20260414T131500</dtend>
            <duration>013000</duration>
            <summary>SQL is Dead, Long Live SQL: Engineering reliable analytics agent from scratch</summary>
            <description>This session is a &quot;reality check&quot; for AI analytics. We combine theory with engineering to answer one question: Where are the limits of Text-to-SQL? Participants will experience the frustration of a hallucinating LLM and the satisfaction of fixing it with a realistic, minimalist local setup.

Learning objectives:
1. Map the limits: Identify exactly where LLMs break (e.g., complex joins, specific business logic, non-standard schemas).
2. Bridge the gap: Learn how a semantic layer translates fuzzy English into deterministic SQL.
3. Modern architecture: An overview and hands-on work with the DuckDB Model Context Protocol (MCP), giving agents standard, safe tools for analytics.
4. The verdict: Understand why SQL is becoming the &quot;Assembly Language&quot; of the AI era, why you still need to be fluent in it, and what is still missing before we can simply &quot;chat with our data&quot;.
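
As a self-contained illustration of the semantic-layer idea in point 2 (names are hypothetical, and sqlite3 stands in here for the DuckDB setup used in the tutorial):

```python
import sqlite3

# Toy semantic layer: fuzzy metric names map to vetted SQL templates, so the
# agent selects a metric instead of free-writing SQL against the schema.
SEMANTIC_LAYER = {
    "revenue by country": "SELECT country, SUM(amount) AS revenue "
                          "FROM orders GROUP BY country ORDER BY country",
    "order count": "SELECT COUNT(*) AS n FROM orders",
}

def answer(question, conn):
    sql = SEMANTIC_LAYER.get(question.lower())
    if sql is None:
        # Deterministic failure instead of a hallucinated query.
        raise ValueError(f"metric not in semantic layer: {question!r}")
    return conn.execute(sql).fetchall()

# In-memory sample data for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (country TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("DE", 10.0), ("DE", 5.0), ("FR", 7.0)])
```

The fuzzy-to-deterministic translation is the key property: unknown questions fail loudly rather than producing plausible but wrong SQL.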

Prerequisites:
- Laptop with Python 3.10+.
- Beginner SQL knowledge (joins, aggregations).
- No prior AI/LLM experience required.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Tutorial</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/AF9DNH/</url>
            <location>Dynamicum [Ground Floor]</location>
            
            <attendee>Mehdi Ouazza</attendee>
            
            <attendee>Dumky de Wilde</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>9ZKYRD@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-9ZKYRD</pentabarf:event-slug>
            <pentabarf:title>A minimalist introduction to Ansible</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260414T143000</dtstart>
            <dtend>20260414T160000</dtend>
            <duration>013000</duration>
            <summary>A minimalist introduction to Ansible</summary>
            <description>[Ansible](https://docs.ansible.com/) is a popular Python package for declarative configuration of servers that comes with batteries included (for example, an encrypted vault for secrets and the Jinja template engine). Like a Swiss Army knife, Ansible is capable of solving many problems, but it comes with many features that novices will not know how to use. This tutorial is hands-on and will guide attendees through the core features of Ansible. Attendees must have Podman or Docker installed on the machine they will use during the tutorial.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Tutorial</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/9ZKYRD/</url>
            <location>Dynamicum [Ground Floor]</location>
            
            <attendee>Raniere Silva</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>3JLSEF@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-3JLSEF</pentabarf:event-slug>
            <pentabarf:title>Catch the LLM if you Can: Watermarking LLMs</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260414T163000</dtstart>
            <dtend>20260414T170000</dtend>
            <duration>003000</duration>
            <summary>Catch the LLM if you Can: Watermarking LLMs</summary>
            <description>During the talk we will cover:
1. Why Watermarking Matters
     - What can go wrong when AI-generated content becomes indistinguishable from human writing
     - Why provenance and transparency are becoming essential to trust and safety
2. How LLM Watermarking Works
     - What is a watermark, and what isn&#x27;t
     - The core idea behind statistical watermarking
3. Two Key Algorithms, implemented using established Python frameworks
     - EXP watermark: modifying logits with pseudo-random perturbations
     - KGW green-list watermark: partitioning tokens into “green” and “red” lists to bias sampling
     - A Python implementation of the KGW method, compared with the EXP method
4. How you can use MarkLLM (an open-source toolkit)
     - How to use the toolkit for experiments in your own workflows
5. Real-World Challenges and Limitations
     - How robust current algorithms are, and how easily they can be evaded
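
The green-list idea can be sketched in a few lines of plain Python (a deliberately simplified illustration, not the MarkLLM implementation; production schemes hash the previous token together with a secret key):

```python
import hashlib
import random

def green_list(prev_token_id, vocab_size, gamma=0.5):
    # Seed a PRNG from the previous token so the same partition can be
    # recomputed later at detection time.
    seed = int(hashlib.sha256(str(prev_token_id).encode()).hexdigest(), 16) % (2 ** 32)
    rng = random.Random(seed)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    # A gamma fraction of the vocabulary becomes the "green" list.
    return set(ids[: int(gamma * vocab_size)])

def bias_logits(logits, prev_token_id, delta=2.0, gamma=0.5):
    # Nudge sampling toward green tokens; a detector later counts how often
    # generated tokens fall in the green list and runs a statistical test.
    greens = green_list(prev_token_id, len(logits), gamma)
    return [x + delta if i in greens else x for i, x in enumerate(logits)]
```

Because the partition is reproducible from the context alone, detection needs no access to the model, only to the seeding scheme.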

Key Takeaways:
     - Watermarking is a promising tool for provenance.
     - Understanding these methods helps build more transparent and trustworthy AI systems.

This talk is for people who:
   - Care about ethics and privacy in AI and want to understand what watermarking can (and cannot) solve.
   - Build applications using LLMs and want mechanisms for verifying generated text.
   - Are ML researchers or hobbyists interested in how watermarking algorithms function at a technical level.
   - Work in AI safety, trust &amp; transparency, or responsible AI and need practical tools for content provenance.

Note: No prior experience with LLM architecture is required; basic familiarity with probability is recommended, and no advanced math is needed.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/3JLSEF/</url>
            <location>Dynamicum [Ground Floor]</location>
            
            <attendee>Subhosri Basu</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>7JXYKH@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-7JXYKH</pentabarf:event-slug>
            <pentabarf:title>Offline Fallback for a Mobile LoRaWAN Gateway</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260414T171000</dtstart>
            <dtend>20260414T174000</dtend>
            <duration>003000</duration>
            <summary>Offline Fallback for a Mobile LoRaWAN Gateway</summary>
            <description>LoRaWAN (Long Range Wide Area Network) is widely used for IoT sensor deployments due to its long range and low power consumption. Operating at 868MHz across Europe, it&#x27;s ideal for remote monitoring applications—from water level sensors to asset tracking and personnel location systems. However, traditional LoRaWAN deployments rely on cloud-based network servers, making them vulnerable to internet outages.

**The Challenge**

While networks like The Things Stack provide good geographic coverage, gaps remain—particularly in remote areas where emergency response units operate. A mobile gateway can close these gaps, but standard configurations still require internet connectivity. You could deploy a completely local network with your own network server, but this sacrifices the existing infrastructure&#x27;s coverage. 

**The Solution**

This talk presents a hybrid architecture that combines cloud-based operation with local resilience. The system primarily operates through The Things Stack Sandbox, leveraging its network coverage. Simultaneously, a Raspberry Pi-based mobile gateway decodes all messages from your devices locally in parallel. During normal operation, you benefit from cloud features. When internet connectivity fails, your sensor data remains accessible locally on the gateway.

**Technical Implementation**

The solution consists of:

1. **Raspberry Pi Gateway**: Configured as a mobile LoRaWAN gateway for The Things Stack Sandbox, suitable for vehicle deployment
2. **Session Key Management**: Python service retrieving session keys for your devices via The Things Stack API
3. **Local Message Processing**: Real-time decryption and decoding of LoRaWAN messages without internet dependency
4. **Data Storage**: SQLite-based local storage for reliable data persistence
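
The local-storage step might look roughly like this (schema and names are hypothetical rather than taken from the project):

```python
import json
import sqlite3
import time

def open_store(path=":memory:"):
    # One row per decoded uplink, persisted locally so readings stay
    # available when the internet link is down.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS uplinks ("
        "received_at REAL, dev_eui TEXT, payload TEXT)"
    )
    return conn

def store_uplink(conn, dev_eui, decoded):
    # Store the decoded payload as JSON; commit immediately for durability.
    conn.execute(
        "INSERT INTO uplinks VALUES (?, ?, ?)",
        (time.time(), dev_eui, json.dumps(decoded)),
    )
    conn.commit()

def latest(conn, dev_eui):
    # Most recent reading for a device; rowid breaks timestamp ties.
    row = conn.execute(
        "SELECT payload FROM uplinks WHERE dev_eui = ? "
        "ORDER BY received_at DESC, rowid DESC LIMIT 1",
        (dev_eui,),
    ).fetchone()
    return json.loads(row[0]) if row else None
```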

**Python and JavaScript Integration**

The core implementation uses Python for gateway orchestration, API integration, and data management. For LoRaWAN encryption/decryption and payload decoding, the system leverages existing JavaScript libraries—specifically `lora-packet` and community-maintained device decoders. This talk demonstrates practical patterns for Python/JavaScript interoperability.

**Real-World Context**

Drawing from volunteer emergency response experience, this solution addresses operational requirements where sensor data must remain available regardless of infrastructure status. The system ensures continuity of critical information during incidents.

**What You&#x27;ll Learn**

- Designing resilient edge computing architectures for IoT
- Integrating Python with JavaScript libraries
- LoRaWAN security fundamentals (session keys, encryption)
- Building offline-first systems with SQLite
- API integration with The Things Stack

**Open Source**

Complete implementation available on GitHub, providing a reproducible setup valuable for volunteer organizations, research projects, and scenarios requiring IoT infrastructure that remains operational during connectivity disruptions.

**Target Audience**

Python developers interested in IoT and edge computing. No prior LoRaWAN experience required.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/7JXYKH/</url>
            <location>Dynamicum [Ground Floor]</location>
            
            <attendee>Jannis Lübbe</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>CMDHUN@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-CMDHUN</pentabarf:event-slug>
            <pentabarf:title>&quot;Honey, I vibe coded some crypto&quot; - Security in the age of LLMS</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T090500</dtstart>
            <dtend>20260415T095000</dtend>
            <duration>004500</duration>
            <summary>&quot;Honey, I vibe coded some crypto&quot; - Security in the age of LLMS</summary>
            <description>What only a few years ago started out as smart tab completion turned into a way of working in which a growing number of programmers don&#x27;t even bother to open up an IDE anymore. Let&#x27;s take a moment to contemplate the changing nature of software engineering as a profession, and to explore chances to avoid looming disaster.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Keynote</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/CMDHUN/</url>
            <location>Merck Plenary (Spectrum) [1st Floor]</location>
            
            <attendee>Gabriela Bogk</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>BLC7FS@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-BLC7FS</pentabarf:event-slug>
            <pentabarf:title>Demystifying Containers with Python: Building a Minimal Engine from Scratch</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T101500</dtstart>
            <dtend>20260415T104500</dtend>
            <duration>003000</duration>
            <summary>Demystifying Containers with Python: Building a Minimal Engine from Scratch</summary>
            <description>In modern software development, containers have become a standard tool for deploying code. However, they are frequently misunderstood and described as &quot;lightweight virtual machines.&quot; For many developers - especially those transitioning from academia, like myself - the layer between their Python code and the operating system kernel is often overlooked. This talk is based on the idea that the best way to understand a concept is to implement it in its simplest form. By bypassing the complexity of modern container orchestrators, we can focus on the fundamental system calls that make isolation possible.

During the session, we will demonstrate the core mechanics of containerization by building a minimal engine in Python. We will begin by preparing a root filesystem to show what a container image actually is at its most basic level. We will implement isolation using the `os.chroot()` function to trap a process in a specific directory, and we will talk about Linux namespaces, which isolate what a process can see, and `cgroups`, which limit how much of the hardware resources a process can use.
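
As a minimal, hypothetical sketch of these mechanics (not code from the talk; note that `contain` requires root privileges to actually run):

```python
import os
import tempfile

def prepare_rootfs(base):
    # A container "image" at its most basic: a directory tree that the
    # contained process will later see as its root filesystem.
    for d in ("bin", "proc", "tmp"):
        os.makedirs(os.path.join(base, d), exist_ok=True)
    return base

def contain(rootfs, argv):
    # chroot() traps the child in rootfs, so the "container" is just a
    # process with a restricted view of the host filesystem.
    pid = os.fork()
    if pid == 0:
        os.chroot(rootfs)
        os.chdir("/")
        os.execv(argv[0], argv)
    _, status = os.waitpid(pid, 0)
    return status

rootfs = prepare_rootfs(tempfile.mkdtemp())
```

Real engines add namespaces and cgroups on top, but the core trick is exactly this small.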

The main takeaways of this talk include a clear technical distinction between virtual machines and containers and the realization that a container is essentially a process with a restricted view of the host system. You will gain practical knowledge of the `os` module for system-level tasks and the confidence to explore low-level computer science concepts by implementing them in Python. By the end of this session, you will have a practical understanding of the basic principles that make containerization possible.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/BLC7FS/</url>
            <location>Merck Plenary (Spectrum) [1st Floor]</location>
            
            <attendee>Alexander Zaytsev</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>GBKUNF@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-GBKUNF</pentabarf:event-slug>
            <pentabarf:title>How to create effective data visualizations</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T105500</dtstart>
            <dtend>20260415T112500</dtend>
            <duration>003000</duration>
            <summary>How to create effective data visualizations</summary>
            <description>In this talk, you will learn about:

- **Fundamental principles** of data visualization
    - The Grammar of Graphics
    - Visual hierarchy
    - Data storytelling
- **Best practices** regarding:
    - Which colors to use
    - Visual comparability 
    - Pros/cons of several chart types
    - Context and audience: Adding text and annotations
- The **data visualization landscape in Python**
    - What libraries exist: matplotlib, plotly, altair, etc., including add-ons and lesser-known ones
    - What are their differences and strengths?
    - Which library is suited for which use case?


Equipped with the knowledge presented in this talk, you will understand why certain charts are more aesthetically pleasing and more effective at conveying information than others. Apply the shown principles, take into account best practices and choose the right tools in Python to create more beautiful and impactful data visualizations.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/GBKUNF/</url>
            <location>Merck Plenary (Spectrum) [1st Floor]</location>
            
            <attendee>Dominik Haitz</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>PMMEAG@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-PMMEAG</pentabarf:event-slug>
            <pentabarf:title>Production ML across 2015-2035: A Journey to the Past and the Future</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T113500</dtstart>
            <dtend>20260415T120500</dtend>
            <duration>003000</duration>
            <summary>Production ML across 2015-2035: A Journey to the Past and the Future</summary>
            <description># Outline

1) Motivations;
2) MLOps Foundations;
3.1) The Past - 2015 - Genesis;
3.2) The Past - 2018 - Messy Innovation;
3.3) The Past - 2023 - LLMOps;
4) The Future - 2025-2035 Outlook;
5) Reflections.

# Description

The lifecycle of a machine learning model only begins once it’s in production. In this talk we take a practical journey through the last decade of production ML, tracing the early beginnings of MLOps back to the research and projects that helped drive the movement forward. We cover how the ecosystem went through explosive growth during COVID, with a broad range of tools and vendors tackling similar problems in very different ways. We then turn to the most recent trend, LLMOps, which has shifted the stack from training-centric to inference-centric as pre-trained models have become broadly available: the locus of engineering moves to the application layer (i.e., inference time), introducing new artifacts such as prompts, vector databases, and tool metadata, and accelerating another wave of ecosystem heterogeneity.

With those lessons in place, we look forward to 2035 through a set of pragmatic milestones for consolidation and standardization: how monitoring and observability become more ubiquitous, how MLOps and LLMOps stacks align, how time-to-production compresses, and how operations gradually evolves toward more autonomous patterns (progressive rollouts, agent-assisted RCA, and early self-healing behaviors). 

Finally, we close with actionable guidance grounded in production reality: how to right-size platform complexity to organizational scale, where to invest early to reduce future operational debt, and how to increase the scale of ML delivery while actively reducing system complexity. Attendees should leave with a coherent mental model of the MLOps landscape, a sharper understanding of why production ML remains hard, and a concrete set of engineering priorities for building reliable ML systems through the next decade.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/PMMEAG/</url>
            <location>Merck Plenary (Spectrum) [1st Floor]</location>
            
            <attendee>Alejandro Saucedo</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>LPUC9T@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-LPUC9T</pentabarf:event-slug>
            <pentabarf:title>The Multimodal Era of Machine Learning (and How Python Made It Possible)</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T132500</dtstart>
            <dtend>20260415T141000</dtend>
            <duration>004500</duration>
            <summary>The Multimodal Era of Machine Learning (and How Python Made It Possible)</summary>
            <description>Multimodal learning—systems that combine vision, language, audio, and other sensory inputs—has moved from a niche research topic to a central paradigm in modern machine learning. Today’s most influential models no longer operate on a single modality but instead learn rich representations by combining language with images, video, and sound. This shift has fundamentally changed how we build, train, and evaluate current machine learning systems. Python has played a decisive role in this transformation. Acting as a unifying layer across modalities, Python enabled researchers and practitioners to seamlessly combine computer vision, natural language processing, and speech within a single ecosystem. Python-based frameworks lowered the barriers between research communities and accelerated the rise of large-scale, weakly supervised, and foundation models. However, this success has also introduced new challenges. The ease of experimentation masks growing issues around scalability, reproducibility, and evaluation. Multimodal systems increasingly depend on complex Python-based stacks whose abstractions can obscure underlying assumptions and costs.
This keynote will reflect on the current state of multimodal learning, examine how Python shaped its trajectory, and critically discuss the technical and conceptual challenges that lie ahead, aiming to provide a perspective on where machine learning in general, and multimodal learning in particular, is succeeding, where it is struggling, and what role the Python community can play in shaping its next phase.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Keynote</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/LPUC9T/</url>
            <location>Merck Plenary (Spectrum) [1st Floor]</location>
            
            <attendee>Hilde Kühne</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>QBJRBJ@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-QBJRBJ</pentabarf:event-slug>
            <pentabarf:title>PyLadies Fireside Chat</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T142000</dtstart>
            <dtend>20260415T152000</dtend>
            <duration>010000</duration>
            <summary>PyLadies Fireside Chat</summary>
            <description>Join us for this fireside chat, where Tereza Iofciu sits down with Dawn Gibson Wages, community and DevRel lead at Anaconda with a passion for local-first AI and Python environments, and Jessica Greene, Senior ML Engineer at Ecosia and PyLadies community lead, for a candid conversation about building with Python in the age of AI. Careers, craft, community, and the questions the hype tends to skip.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Panel</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/QBJRBJ/</url>
            <location>Merck Plenary (Spectrum) [1st Floor]</location>
            
            <attendee>Tereza Iofciu</attendee>
            
            <attendee>Dawn Wages</attendee>
            
            <attendee>Jessica Greene (she/her)</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>CNNUZC@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-CNNUZC</pentabarf:event-slug>
            <pentabarf:title>Start-Ups &amp; Investors</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T161500</dtstart>
            <dtend>20260415T172500</dtend>
            <duration>011000</duration>
            <summary>Start-Ups &amp; Investors</summary>
            <description>The Python and AI community is full of people who build tools, train models, and solve hard problems — but the leap from project to product often feels like a different world entirely. This panel closes that gap.

Four women from very different backgrounds — a former SAP SVP turned startup investor, a TU Darmstadt researcher turned Forbes 30 Under 30 founder, a venture capital managing partner, and an AI startup ecosystem builder — share what founding and funding a company actually looks like. 

No polished success stories, no pitching. Just real talk about first steps, financing, team building, and the support systems that exist but few people know about.

Why this panel at PyCon DE &amp; PyData? Because the people in this room are exactly who Germany&#x27;s AI and open-source startup ecosystem needs. You understand the technology. You work with data. You build things that work. 

What&#x27;s often missing isn&#x27;t the idea or the skill — it&#x27;s the confidence, the network, and the knowledge of how to start. This panel provides all three.

We&#x27;ll cover five themes: the spark that starts a founding journey, the reality behind startup clichés, what technical founders need beyond code, how to find the right networks and funding, and concrete first steps anyone can take.

The panel is especially aimed at developers considering turning a side project into a startup, researchers exploring the path from paper to product, and professionals in industry wondering whether the leap from a corporate career is right for them. We also want to actively encourage more women to see themselves as founders — which is why representation on this stage matters.

Whether you leave with a concrete next step, a new contact, or simply a more realistic picture of what founding looks like — this session is designed to make the startup world feel less like an exclusive club and more like a path that&#x27;s open to you.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Panel</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/CNNUZC/</url>
            <location>Merck Plenary (Spectrum) [1st Floor]</location>
            
            <attendee>Ina Schlie</attendee>
            
            <attendee>Carlina Bennison</attendee>
            
            <attendee>Sara Jourdan</attendee>
            
            <attendee>Jovana Walter</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>UFGB7T@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-UFGB7T</pentabarf:event-slug>
            <pentabarf:title>Lightning Talks 2</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T181000</dtstart>
            <dtend>20260415T192500</dtend>
            <duration>011500</duration>
            <summary>Lightning Talks 2</summary>
            <description>Lightning Talks 2</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Lightning Talks</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/UFGB7T/</url>
            <location>Merck Plenary (Spectrum) [1st Floor]</location>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>V8DNCL@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-V8DNCL</pentabarf:event-slug>
            <pentabarf:title>Wetterdienst: Fast, Unified Access to Open Weather Data with Polars</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T101500</dtstart>
            <dtend>20260415T104500</dtend>
            <duration>003000</duration>
            <summary>Wetterdienst: Fast, Unified Access to Open Weather Data with Polars</summary>
            <description>## Problem
Accessing weather data means wrestling with inconsistent APIs, formats, and units—slowing down data engineering and making pipelines hard to reproduce.

## Solution
Wetterdienst is a Python library providing a unified, Polars-first interface to multiple open weather services (DWD, ECCC, EA, NOAA/NWS, Geosphere Austria, IMGW, Eaufrance, WSV, and more). It standardizes request patterns, returns tidy long-format data in SI units, and handles caching, timezones, and retries—so teams can focus on analysis instead of plumbing.

## Core concepts:
- **Polars-first** — All data operations use Polars (v1.15+); pandas supported for some I/O
- **Declarative request pattern** — Provider → stations → values; tidy/long output by default
- **Sensible defaults** — UTC timestamps, SI units, humanized parameter names
- **Reliability** — Disk-based caching via diskcache, stamina-based retries, timezone handling
- **Provider architecture** — Consistent interfaces across DWD, ECCC, EA, NOAA/NWS, Geosphere, IMGW, Eaufrance, WSV, and more
- **Multiple interfaces** — Python API, CLI, and REST

## Outline
- Introduction
- Journey — How Wetterdienst came to life
- Wetterdienst — Architecture, concepts, and request patterns
- Value — What wetterdienst offers you, me and everyone else
- Demo — Live: station discovery, timeseries retrieval, station metadata, climate stripes and more via app

## Target Audience
Data engineers, scientists, and platform teams who need reliable weather data for analytics, ML, and operations.

## Prerequisites
Basic Python and DataFrame experience (Polars or pandas); familiarity with ETL/ML pipelines helpful.

## Key Takeaways
- A unified, Polars-first workflow to access and normalize open weather data
- Practical patterns for station discovery, timeseries retrieval, unit conversion, and caching
- How to integrate Wetterdienst via Python, CLI, and REST, and export to common formats and databases

## Links
📦 Repo https://github.com/earthobservations/wetterdienst
📖 Docs https://wetterdienst.readthedocs.io/
🌐 App https://wetterdienst.eobs.org/
💡 Examples https://github.com/earthobservations/wetterdienst/tree/main/examples</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/V8DNCL/</url>
            <location>Titanium [2nd Floor]</location>
            
            <attendee>Benjamin</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>PARU7X@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-PARU7X</pentabarf:event-slug>
            <pentabarf:title>From Struggling to Mastery: A Practical Guide to Data Pipeline Operations</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T105500</dtstart>
            <dtend>20260415T112500</dtend>
            <duration>003000</duration>
            <summary>From Struggling to Mastery: A Practical Guide to Data Pipeline Operations</summary>
            <description>The Problem: The &quot;it works on my machine&quot; trap. As data teams grow, ad-hoc processes that worked for a single engineer crumble under the weight of production requirements. Teams often know they need to improve, but they lack a unified definition of success. Without clear standards, it is impossible to measure progress.

This talk presents a comprehensive Operational Excellence Maturity Pyramid, designed to guide data teams from chaos to stability. We will explore a 5-level classification system (Struggling, Basic, Decent, Strong, and Mastery) applied across three foundational pillars of data engineering.

1. Orchestration Maturity: We will move beyond simple cron jobs and local scripts.

- Struggling: Manual scheduling, no dependency management, lack of idempotency.

- Mastery: Dynamic DAGs, event-driven triggers, automated backfills, modular infrastructure-as-code, self-healing pipelines, and more.


2. Data Quality Maturity: Data trust is hard to gain and easy to lose. We will define how to shift from reactive to proactive quality management.

- Struggling: No testing program; quality issues are discovered by stakeholders downstream.

- Mastery: Comprehensive coverage (Write-Audit-Publish patterns), automated anomaly detection, and &quot;circuit breakers&quot; that stop bad data before it hits the warehouse.

3. Data SLOs (Service Level Objectives) Maturity: You cannot improve what you do not measure.

- Struggling: Undefined targets; &quot;best effort&quot; delivery.

- Mastery: Fully measurable SLIs (Service Level Indicators), defined Error Budgets, and automated alerting on burn rates.
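
The error-budget and burn-rate arithmetic behind that Mastery level can be sketched in a few lines of Python; all numbers below are invented for illustration:

```python
# Hypothetical SLO: 99.5% data-freshness compliance over a 30-day window.
slo_target = 0.995
window_hours = 30 * 24

# Total error budget: hours the pipeline may miss its freshness target.
error_budget_hours = (1 - slo_target) * window_hours  # 3.6 hours

# Suppose 1.8 budget-hours were burned in the first 3 days (72 hours).
burned_hours = 1.8
elapsed_hours = 72

# Burn rate compares actual consumption with a steady, even burn.
# A rate above 1.0 means the budget runs out before the window ends.
steady_burn = error_budget_hours * (elapsed_hours / window_hours)
burn_rate = burned_hours / steady_burn  # 5.0, i.e. alert-worthy
```

Alerting on burn rate rather than raw failure counts is what makes the SLO actionable: the same incident is urgent early in the window and tolerable near its end.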

What You Will Learn: This session is not just theoretical; it is a practical guide for data engineers, platform leads, and managers. By the end of this talk, you will be able to:

- Audit your current stack: Use the provided scorecard to classify your team&#x27;s maturity level in each pillar.

- Identify gaps: Understand exactly why you are stuck at the &quot;Basic&quot; or &quot;Decent&quot; levels.

- Plan your roadmap: Walk away with actionable steps to advance to the next level, turning your data operations into a competitive advantage rather than a maintenance burden.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/PARU7X/</url>
            <location>Titanium [2nd Floor]</location>
            
            <attendee>Akif Cakir</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>9MUDUY@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-9MUDUY</pentabarf:event-slug>
            <pentabarf:title>Empowering Data Scientists with Zero Platform Friction: Deploying Streamlit &amp; Friends in 3 Minutes</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T113500</dtstart>
            <dtend>20260415T122000</dtend>
            <duration>004500</duration>
            <summary>Empowering Data Scientists with Zero Platform Friction: Deploying Streamlit &amp; Friends in 3 Minutes</summary>
            <description>This session is for anyone who has built a Streamlit (or Dash, R Shiny, FastAPI, React) prototype and then hit the wall when it needed to be shared with real users: access to live data, SSO, permissioning, deployment, and operational guardrails.

We will present the workflow and the architecture from both sides: as a data scientist shipping an app, and as a platform admin operating the service safely at scale.

## What we will demo
We will demo the end-to-end workflow from zero to a running app using our internal app service. The platform includes a web console for self-service provisioning and configuration, and the deployment runtime that manages the application state.

- Using the web console to create and configure a new app from a framework template (Streamlit, Dash, R Shiny, FastAPI, React).
- How a Git repository is created and the first version is deployed behind the scenes, including a working starter app with example pages.

## Key design decisions (the parts that are usually hard)
- Identity propagation: the app receives the signed-in user identity from SSO and uses it for downstream authorization.
- Authorization at the data layer: dataset permissions are scoped to the use-case resource, making sure tokens cannot be exploited.
- Safe multi-tenancy: per-app isolation plus resource limits to prevent noisy-neighbor problems.
- Repeatable delivery: templates plus CI/CD conventions so a new app starts from a working, deployable baseline.
- Day-2 operations: guardrails like quotas, rate limiting, and idle shutdown to keep the platform reliable and cheap.

## Running at scale
- Production usage: 750+ active apps and 8k+ unique end users (2025).
- Infrastructure run rate under 10k USD per month (excluding engineering time).

## Who should attend
- Data scientists and analysts who want to ship apps beyond a demo.
- Data platform and DevOps engineers building self-service tooling for governed environments.
- Teams standardizing how internal data &amp; AI products are delivered to business users.

## Takeaways
- For data scientists: what a good internal app hosting platform should provide, and which requirements you should ask your platform team for (governed on-behalf of data access, templates, CI/CD, guardrails).
- For platform teams: a blueprint you can adapt beyond AWS, including the architecture and tradeoffs necessary to operate fine-grained authorization and a multi-tenant runtime at scale.

**If you do not have such an app platform in your company yet, use this talk as a checklist to start the conversation with your IT or platform teams. :-)**</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (long)</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/9MUDUY/</url>
            <location>Titanium [2nd Floor]</location>
            
            <attendee>Bernhard Schäfer</attendee>
            
            <attendee>Nicolas Renkamp</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>ZESFRG@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-ZESFRG</pentabarf:event-slug>
            <pentabarf:title>Learnings Building DevOps as a Software Engineer</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T142000</dtstart>
            <dtend>20260415T145000</dtend>
            <duration>003000</duration>
            <summary>Learnings Building DevOps as a Software Engineer</summary>
            <description>What do you do when you join a company as a software engineer, and there’s zero DevOps in place—but product delivery can’t wait? In this talk, I’ll share firsthand insights from building core DevOps infrastructure from the ground up, while simultaneously delivering the first software products under tight deadlines.
I’ll outline the key priorities and quick wins that enabled rapid, reliable releases—such as setting up basic CI/CD pipelines, introducing automated tests, and using containerization for reproducible deployments. Rather than aiming for “perfect” infrastructure from day one, I’ll show how to build DevOps foundations incrementally and pragmatically, integrating automation step by step as part of everyday development work.
Through practical examples, I’ll discuss how to achieve reliability without losing agility, how to avoid common pitfalls in “build as you go” DevOps, and how to balance product delivery with infrastructure improvements. Attendees will leave with actionable tips on how to bootstrap DevOps quickly, so teams can ship software confidently—even when starting from scratch.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/ZESFRG/</url>
            <location>Titanium [2nd Floor]</location>
            
            <attendee>Gaweng Tan</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>ZLRFR9@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-ZLRFR9</pentabarf:event-slug>
            <pentabarf:title>Architecture Under Constraints: Designing Systems That Still Evolve</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T150000</dtstart>
            <dtend>20260415T154500</dtend>
            <duration>004500</duration>
            <summary>Architecture Under Constraints: Designing Systems That Still Evolve</summary>
            <description>Modern systems rarely exist in ideal conditions. They grow over years, integrate with legacy services, operate under regulatory or security constraints, and are shaped by organizational boundaries just as much as by code. Yet architectural guidance often assumes greenfield projects and unlimited freedom.

This talk focuses on architectural decision-making under real-world constraints, using such systems as the primary lens. Rather than discussing specific frameworks or patterns, it presents a practical way of thinking about architecture when trade-offs are unavoidable and decisions must hold up over time.

Drawing from experience in regulated production environments, we will explore how to distinguish true constraints from accidental ones, how to think in terms of long-lived capabilities rather than short-lived components, and how to preserve optionality even when systems appear “locked in.” Examples will touch on Python-heavy platforms such as backend services, internal tools, data pipelines, and automation systems.

The session also addresses the human side of architecture: how Staff+ engineers and technical leaders communicate trade-offs, document decisions in a way that survives team changes, and align engineering, product, and compliance perspectives without over-engineering.

This talk is aimed at experienced engineers, tech leads, and engineering leaders who want to design systems that can evolve - even when constraints dominate the problem space.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (long)</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/ZLRFR9/</url>
            <location>Titanium [2nd Floor]</location>
            
            <attendee>Eduard Thamm</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>EXXWMV@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-EXXWMV</pentabarf:event-slug>
            <pentabarf:title>Black Hole Stars: An Astronomical Mystery (Mostly) Solved with NumPyro and JAX</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T161500</dtstart>
            <dtend>20260415T164500</dtend>
            <duration>003000</duration>
            <summary>Black Hole Stars: An Astronomical Mystery (Mostly) Solved with NumPyro and JAX</summary>
            <description>The James Webb Space Telescope (JWST) has transformed extragalactic astronomy. In particular, it has uncovered a puzzling population of compact, red objects in the distant universe known as &quot;Little Red Dots&quot; (LRDs). These sources exhibit distinctive features not seen in typical galaxies, and their nature therefore remains a subject of intense debate. Are they supermassive black holes hiding behind screens of dust? Massive dead galaxies appearing too early in the universe? Or something entirely new? To find out, we need to perform computationally heavy statistical analysis on the astronomical data. However, traditional tools have been either too slow or have relied on simplifying assumptions about the complex JWST data that can lead to inaccurate results.

In this talk, I will introduce the modern Python stack that now makes this possible: JAX and NumPyro. JAX allows you to write standard Python code that runs on GPUs and automatically computes derivatives, while NumPyro leverages that power for incredibly fast statistical modeling. We will start with the basics, using simple examples to demonstrate how JAX can speed up existing workflows and how NumPyro makes Bayesian inference accessible. 
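
One such basic example: jax.grad turns a plain Python loss function into its derivative, and jax.jit compiles it. The Gaussian negative log-likelihood below is an invented toy, not code from the talk:

```python
import jax
import jax.numpy as jnp

# Toy model: Gaussian negative log-likelihood (up to constants).
def neg_log_likelihood(mu, data):
    return jnp.sum(0.5 * (data - mu) ** 2)

data = jnp.array([1.0, 2.0, 3.0])

# jax.grad differentiates with respect to the first argument (mu),
# and jax.jit compiles the resulting function with XLA.
grad_fn = jax.jit(jax.grad(neg_log_likelihood))

# The gradient vanishes at the sample mean, matching the analytic result.
gradient_at_mean = grad_fn(2.0, data)
```

The same pattern scales from this toy to full spectral models: write ordinary array code once, then get gradients and GPU compilation for free.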

Then, we will look at the &quot;Little Red Dot&quot; mystery as a case study. I will show how we built a custom inference engine (`unite`) to process thousands of JWST observations. By leveraging JAX&#x27;s speed and NumPyro&#x27;s flexibility, we were able to efficiently and accurately test complex physical models against the data, uncovering evidence that these unique objects may in fact be supermassive black holes embedded in dense gas clouds: essentially, stars powered not by fusion but by black holes.

This talk is for anyone interested in high-performance Python, and especially for (data) scientists interested in modern methods for designing scalable inference pipelines. You will leave with a solid introduction to JAX and NumPyro and an appreciation for how these tools are already helping solve the Universe&#x27;s greatest mysteries.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/EXXWMV/</url>
            <location>Titanium [2nd Floor]</location>
            
            <attendee>Raphael Hviding</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>LVRLSU@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-LVRLSU</pentabarf:event-slug>
            <pentabarf:title>&quot;You are an intelligent business analyst&quot;: how I learned to talk to business</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T165500</dtstart>
            <dtend>20260415T172500</dtend>
            <duration>003000</duration>
            <summary>&quot;You are an intelligent business analyst&quot;: how I learned to talk to business</summary>
            <description>I never planned to become a business analyst. In fact, I avoided it. I imagined endless meetings, unclear requirements, and conversations that had nothing to do with “real” technical work. I wanted to stay hands-on as a developer and data scientist.

But reality proved something important: you can&#x27;t escape the business side if you want to build meaningful solutions. And once I learned how to talk to business stakeholders, everything changed: my impact, my influence, and the outcomes of the projects I worked on.

In this talk, we’ll explore the practical business skills every developer needs but is rarely taught:
• How to identify key stakeholders and understand what they really want
• How to navigate communication in international, cross-functional teams
• How to uncover business pain points before they become blockers
• How to fix broken communication loops
• How to become the go-to technical partner the business trusts

By the end, you won’t just see yourself as a strong technical contributor: you’ll see how to position yourself as an essential part of the broader business ecosystem, shaping better decisions and delivering solutions that truly matter.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/LVRLSU/</url>
            <location>Titanium [2nd Floor]</location>
            
            <attendee>Darya Petrashka</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>QB7VLW@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-QB7VLW</pentabarf:event-slug>
            <pentabarf:title>Ty mypy: The New Generation of Python Type Checking</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T173500</dtstart>
            <dtend>20260415T180500</dtend>
            <duration>003000</duration>
            <summary>Ty mypy: The New Generation of Python Type Checking</summary>
            <description>Static typing in Python has matured significantly over the past decade, with mypy becoming the de facto standard for many teams. At the same time, developers continue to struggle with slow feedback loops, noisy errors, and friction in CI and local workflows. ty, a new type checker from Astral.sh, aims to address these issues with a fundamentally different set of design priorities—and it has now reached a post-alpha, production-ready stage.

This talk takes a practical, experience-based look at ty from the perspective of a Python developer using it on real code. We’ll start by briefly reviewing the current state of Python type checking and the problems that motivated ty’s design. From there, we’ll dive into ty’s feature set, performance characteristics, and developer experience, focusing on what actually changes when type checking becomes fast and ergonomic enough to feel “always on.”

A central part of the talk will be a direct comparison with mypy: where ty already excels, where it behaves differently, and where mypy remains the better choice today. Rather than framing this as a replacement story, we’ll explore the trade-offs between the two tools and what kinds of teams benefit most from each.

By the end of the session, attendees will have a clear mental model of how ty works, how mature it is today, and whether it’s a good fit for their own projects. More broadly, we’ll look at what ty signals about the future direction of Python’s typing ecosystem.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/QB7VLW/</url>
            <location>Titanium [2nd Floor]</location>
            
            <attendee>Stefan Kraus</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>93SXWY@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-93SXWY</pentabarf:event-slug>
            <pentabarf:title>Ship Data with Confidence: Declarative Validation for PySpark &amp; Pandas</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T101500</dtstart>
            <dtend>20260415T104500</dtend>
            <duration>003000</duration>
            <summary>Ship Data with Confidence: Declarative Validation for PySpark &amp; Pandas</summary>
            <description>This session introduces a practical, open-source solution to a critical challenge facing data engineers and scientists: how to proactively guarantee data quality. In today&#x27;s fast-paced development cycles, data pipelines are increasingly complex and reliant on numerous upstream sources, elevating the risk of data quality issues that have the potential to cause production failures. While monitoring and alerting systems are essential for flagging these failures, they are fundamentally reactive; their value is entirely dependent on the quality and coverage of the underlying validation logic that engineers must build and maintain. The true goal is to shift from reactive clean-up to proactive prevention. This talk demonstrates a more effective approach: stopping bad data from ever reaching production by embedding clear, declarative validation directly into your data pipelines. This provides immediate visibility into errors, allowing you to catch and fix data quality issues at the earliest possible stage of development.

[dataframe-expectations](https://github.com/getyourguide/dataframe-expectations) is an attempt to address this problem through a lightweight, open-source Python library designed for declarative data validation in both PySpark and Pandas. This session will explore the key design choices behind its implementation and architecture, including its lightweight nature, which ensures the library doesn&#x27;t become a bottleneck by impacting CI/CD run times or bloating container image sizes, making it ideal for data pipelines, unit tests and end-to-end tests alike. Through examples, we will walk through its fluent, chainable API and showcase its extensive list of reusable, parameterized expectations. We will then dive into advanced features, including powerful decorator-based validation that seamlessly integrates quality checks into your existing code, and a flexible tag-based filtering system that allows you to dynamically decide which expectations to run at runtime.
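
As a flavour of what decorator-based, declarative validation looks like in general, here is a minimal pandas sketch. This is NOT the dataframe-expectations API (its real fluent interface is documented in the repo); the `expect` decorator, checks, and column names are invented for illustration:

```python
import functools

import pandas as pd

# Hypothetical declarative-validation decorator: each expectation wraps a
# function that returns a DataFrame and fails fast on offending rows.
def expect(check, message):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            df = func(*args, **kwargs)
            bad = df[~check(df)]
            if not bad.empty:
                raise ValueError(f"{message}: {len(bad)} offending rows")
            return df
        return wrapper
    return decorator

@expect(lambda df: df["price"].ge(0), "price must be non-negative")
@expect(lambda df: df["booking_id"].notna(), "booking_id must not be null")
def load_bookings():
    # Stand-in for a real extraction step in a pipeline.
    return pd.DataFrame({"booking_id": [1, 2, 3], "price": [10.0, 0.0, 25.5]})

bookings = load_bookings()  # passes both expectations
```

Attaching expectations at function boundaries keeps validation next to the code that produces the data, so bad rows fail the pipeline step instead of surfacing downstream.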

Attendees will leave with a clear, actionable strategy for integrating declarative data quality checks into their pipelines, understanding how a simple, extensible tool can dramatically increase the reliability of their data products and, ultimately, their development velocity.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/93SXWY/</url>
            <location>Helium [3rd Floor]</location>
            
            <attendee>Ryan Sequeira</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>B8KVNJ@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-B8KVNJ</pentabarf:event-slug>
            <pentabarf:title>Accuracy Is Overrated: Ship Stable Forecasts (Without Lying to Yourself)</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T105500</dtstart>
            <dtend>20260415T112500</dtend>
            <duration>003000</duration>
            <summary>Accuracy Is Overrated: Ship Stable Forecasts (Without Lying to Yourself)</summary>
            <description>Forecasting talks love a clean ending: “and then we improved WMAPE by 3.7%.”
Nice. Now put that model into production without suffering from instability.

Because here is what users actually see: the forecast changes every week. The “one-year view” jumps 15 to 20 percent because you retrained on three extra Mondays. Planning teams redo decisions. Operations loses trust. Your model becomes an expensive random-number generator with excellent dashboards.

This talk is about forecast stability: how much your future forecast moves when you add a small amount of new data, retrain, and run the same pipeline again. Not error versus actuals. Forecast versus forecast.

You will see a simple but uncomfortable experiment: 

- Taking a demand-style time series dataset with seasonality, promotions, and noise (Kaggle competition style).
- Training a model and producing a one-year-ahead forecast.
- Adding a few recent weeks of data, retraining, and forecasting again.
- Measuring how much the overlapping horizon changed.
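
That last measurement, forecast versus forecast over the overlapping horizon, fits in a small helper. This is a sketch with invented numbers, not code from the talk:

```python
def revision_instability(old_forecast, new_forecast):
    """Mean absolute percentage change between two forecast vintages
    over their overlapping horizon (lower means more stable)."""
    overlap = min(len(old_forecast), len(new_forecast))
    changes = [
        abs(new - old) / abs(old)
        for old, new in zip(old_forecast[:overlap], new_forecast[:overlap])
    ]
    return sum(changes) / overlap

# Two vintages of the same one-year horizon, retrained a few weeks apart.
vintage_1 = [100.0, 110.0, 120.0, 130.0]
vintage_2 = [102.0, 99.0, 126.0, 130.0]

instability = revision_instability(vintage_1, vintage_2)  # 0.0425
```

Tracked per retrain, this single number turns "the forecast keeps jumping around" from a stakeholder complaint into a monitored property of the system.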

We repeat this across model families people actually use:

- Statistical baselines like ETS and ARIMA
- Prophet
- Feature-based ML with lag features such as XGBoost
- AutoML and ensembles with AutoGluon TimeSeries
- Neural and global models where relevant
- And yes, what happens when you add an API model like TimeGPT into the mix (no hype, just behaviour under updates)

You will see something totally &quot;unexpected&quot;: a model can be “accurate” and still be operationally useless because its forecast revisions are chaotic. And you will see the opposite too: models with slightly worse headline accuracy that people actually trust, because next year does not get rewritten every week.

This is not a philosophical debate. It is a measurable property of forecasting systems that most teams never track.

So what do we do about it?
We focus on techniques that improve stability without turning forecasts into fossils:

1) Reconciliation
Hierarchical and temporal reconciliation as a stabiliser, not just a coherence tool. If SKU-level forecasts panic while higher-level signals stay calm, reconciliation can prevent nonsense from propagating into decisions.

2) Ensembling and origin ensembling
Combining models is not only about accuracy. Averaging forecasts across models and across forecast origins dampens noise and makes forecast updates behave like signals instead of mood swings.

Who this talk is for:

Forecasting practitioners, data scientists working on demand forecasting, and anyone who has ever heard: “Your model looks good, but I don’t trust it.”

What you’ll take away:

- A methodology to measure forecast stability using forecast-to-forecast change.
- A mental model for when forecast revisions are useful and when they are just noise.
- Practical patterns you can implement immediately in Python to make forecasts calmer without hiding real change.

If you optimise only accuracy metrics, you are grading homework.
If you care about stability, you are building a forecasting product.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/B8KVNJ/</url>
            <location>Helium [3rd Floor]</location>
            
            <attendee>Illia Babounikau</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>Q9DU8N@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-Q9DU8N</pentabarf:event-slug>
            <pentabarf:title>Causal Inference through the lens of probabilistic programming</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T113500</dtstart>
            <dtend>20260415T122000</dtend>
            <duration>004500</duration>
            <summary>Causal Inference through the lens of probabilistic programming</summary>
            <description>Why should you use a Probabilistic Programming Language (PPL) for Causal Inference? Because causal problems are inherently about uncertainty and structure—two things PPLs handle natively.

In this session, we will demonstrate how to translate causal diagrams (DAGs) directly into code, using PyMC and NumPyro to estimate causal effects with rigorous uncertainty quantification. We will cover three distinct levels of complexity, drawing on real-world examples and recent research:

1. The &quot;Simple&quot; Case: Enhancing A/B Tests. Even in randomized experiments, PPLs provide massive value. We will show how to:

    - Use Prior Predictive Checks to prevent &quot;silly&quot; estimates (Twyman&#x27;s Law) by incorporating domain knowledge into priors (e.g., preventing the model from predicting a 1000% lift). We also describe how to perform a *power* analysis in a Bayesian framework.

    - Implement Bayesian CUPED to reduce variance and increase statistical power without collecting more data. We can combine these variance-reduction methods with smarter priors as described above.

2. The Observational Challenge: Confounding &amp; Structure. When we can&#x27;t randomize, we must adjust. We will explore (through concrete examples):

    - Backdoor Adjustment: Show how PPLs implement the &quot;do-operator&quot; to estimate Average Treatment Effects (ATE) in the presence of observed confounders.

    - Multilevel Causal Models: Demonstrate how to use multilevel models to account for time-invariant unobserved confounders. We discuss the pros and cons compared with similar methods, such as fixed effects. 

3. The Frontier: Deep Latent Variable Models. What if confounders are unobserved? We will introduce advanced methods combining Deep Learning with Probabilistic Programming:

    - An introduction to the Causal Effect Variational Autoencoder (CEVAE).
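
The backdoor adjustment in part 2 can be illustrated without any PPL machinery using a small NumPy simulation. The data-generating process below is invented, and the talk itself uses PyMC and NumPyro for the full Bayesian treatment:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Confounder Z drives both treatment assignment T and outcome Y.
z = rng.binomial(1, 0.5, n)
t = rng.binomial(1, np.where(z == 1, 0.8, 0.2))
y = 2.0 * t + 3.0 * z + rng.normal(0.0, 1.0, n)  # true ATE is 2.0

# Naive difference in means is biased upward because Z is ignored.
naive = y[t == 1].mean() - y[t == 0].mean()  # roughly 3.8

def stratum_mean(treated, stratum):
    sel = np.logical_and(t == treated, z == stratum)
    return y[sel].mean()

# Backdoor adjustment: average Z-stratified effects weighted by P(Z).
ate = sum(
    (stratum_mean(1, v) - stratum_mean(0, v)) * (z == v).mean()
    for v in (0, 1)
)  # roughly 2.0
```

The PPL version encodes the same DAG as a generative model and applies the do-operator, with full posterior uncertainty instead of point estimates.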

By the end of this talk, you will understand how to view causal inference not as a collection of isolated statistical tricks, but as a coherent modeling process powered by probabilistic programming.

### References

* **A/B Testing &amp; Priors:** [Prior Predictive Checks for Metric Lift](https://juanitorduz.github.io/prior_predictive_ab_testing/) &amp; [Power Analysis](https://juanitorduz.github.io/power_sample_size_exclude_null/)
* **Variance Reduction:** [Bayesian CUPED](https://juanitorduz.github.io/bayesian_cuped/)
* **Observational Data:** [Introduction to Causal Inference with PyMC](https://juanitorduz.github.io/intro_causal_inference_ppl_pymc/)
* **Hierarchical Models:** [Multilevel Causal Inference](https://juanitorduz.github.io/ci_multilevel/)
* **CEVAE Paper:** Louizos, C., Shalit, U., Mooij, J., Sontag, D., Zemel, R., &amp; Welling, M. (2017). [Causal Effect Inference with Deep Latent-Variable Models](https://arxiv.org/abs/1705.08821).
* **Code Reference:** Adapting concepts from *CausalML* (Robert Osazuwa Ness), specifically [Chapter 11: Bayesian Causal Graphical Inference](https://github.com/altdeep/causalML/blob/master/book/chapter%2011/Chapter_11_Bayesian_Causal_Graphical_Inference.ipynb).</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (long)</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/Q9DU8N/</url>
            <location>Helium [3rd Floor]</location>
            
            <attendee>Dr. Juan Orduz</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>AZ7GD3@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-AZ7GD3</pentabarf:event-slug>
            <pentabarf:title>Small Language Models for Tool Calling Are Better Than You Think</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T142000</dtstart>
            <dtend></dtend>
            <duration>003000</duration>
            <summary>Small Language Models for Tool Calling Are Better Than You Think</summary>
            <description>Large language models have been widely used in tool-calling workflows thanks to their strong performance in generating appropriate function calls. However, due to their size and cost, they are inaccessible to small-scale builders, and server-side computing makes data privacy challenging. Small language models (SLMs) are a promising, affordable alternative that can run on local hardware, ensuring higher privacy.

Unfortunately, SLMs struggle with this task - they pass wrong arguments when calling functions with many parameters, and make mistakes when the conversation spans multiple turns. On the other hand, for production applications with specific API sets, we often don&#x27;t need general-purpose LLMs - we need reliable, specialized models.

This talk demonstrates how to increase the accuracy of SLMs (under 8B parameters) for custom tool calling tasks. We will share how leveraging knowledge distillation helps get the most out of SLMs in low-data settings - they can even outperform LLMs! We will present the whole pipeline, from data generation through fine-tuning to local deployment.

**What you&#x27;ll learn:**

1. **Tool calling:** Different tool calling settings (single and multi-turn)
2. **Distillation**: Using large models as teachers to train specialized, compact models that maintain reliability with lower computational cost.
3. **Tool calling data generation:** Challenges in generating diverse tool calling data.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/AZ7GD3/</url>
            <location>Helium [3rd Floor]</location>
            
            <attendee>Gabi Kadlecova</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>M33SNJ@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-M33SNJ</pentabarf:event-slug>
            <pentabarf:title>Building Non-Biased Synthetic Datasets: What Actually Works (and What Fails)</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T150000</dtstart>
            <dtend></dtend>
            <duration>004500</duration>
            <summary>Building Non-Biased Synthetic Datasets: What Actually Works (and What Fails)</summary>
            <description>This talk focuses on the engineering side of synthetic dataset creation, treating data as a first-class artifact rather than a byproduct of modeling. It presents a concrete, reusable pipeline for building synthetic datasets that are reproducible, bias-aware, and suitable for evaluation.

1. Why Synthetic Data Is Not Automatically “Safe”
We begin by examining common assumptions about synthetic data. While synthetic datasets avoid privacy issues, they often introduce hidden bias, distribution collapse, or label leakage. This section highlights real-world failure modes and explains why many synthetic datasets perform well in benchmarks but fail in practice.

2. What Are the Main Properties of Synthetic Data?
		1. Simulated
		2. Anonymized
		3. Not Copied
		4. Compliant
		5. Based on the Statistical Properties of Real Data

3. Defining the Task Before Generating Any Data
A dataset pipeline must start with a clear task definition. We discuss how ambiguous task definitions lead to incoherent data and misleading results, and how to formally specify label semantics, constraints, and negative space before generation begins.

4. Template-Based vs. Free-Form Generation
This section compares controlled template-based generation with unconstrained LLM prompting. We show why decomposing generation into templates, placeholders, and curated value lists dramatically improves consistency, debuggability, and bias control.

5. Bias Control by Construction
Rather than detecting bias after the fact, we show how to prevent it during generation. Topics include balanced entity lists, randomized substitution, avoiding demographic collapse, and preventing unintended correlations between labels and surface patterns.
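
As a toy sketch of the template-plus-balanced-lists idea described above (the templates, names, and labels here are invented for illustration, not taken from the talk): generating from the full cross-product of templates and a curated entity list guarantees every entity appears equally often, which prevents unintended correlations between labels and surface patterns.

```python
import itertools
import random
from collections import Counter

# Templates with placeholders and a curated, balanced entity list.
templates = ["{name} paid the invoice.", "{name} disputed the charge."]
names = ["Alice", "Bob", "Chen", "Fatima"]

random.seed(0)  # deterministic sampling -> reproducible datasets
# Full cross-product: each name appears once per template, by construction.
rows = [t.format(name=n) for t, n in itertools.product(templates, names)]
random.shuffle(rows)

# Sanity check (a simple quality gate): each entity occurs exactly
# len(templates) times, so no name is over-represented.
counts = Counter(next(n for n in names if n in r) for r in rows)
print(counts)
```

The same structure extends to real pipelines: swap the value lists for versioned, curated files and keep the seed under version control so the dataset is auditable.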

6. Pipeline Architecture and Tooling
We walk through a practical Python-based pipeline, covering modular generation stages, deterministic sampling, versioning, and reproducibility. Emphasis is placed on making dataset generation repeatable and auditable, just like code.

7. Filtering, Validation, and Quality Gates
Synthetic data must be filtered aggressively. This section covers structural validation, label consistency checks, distributional sanity checks, and lightweight heuristics that catch most generation errors before model training.

8. Measuring Dataset Difficulty and Coverage
We discuss simple, task-agnostic ways to estimate dataset diversity and difficulty, ensuring that synthetic data does not collapse into trivially easy examples or overly clean language.

9. What Did Not Work (and Why)
This section summarizes failed approaches, including direct JSON generation, inline annotation, and large one-shot prompts. Understanding these failures helps avoid repeating common mistakes.

10. When Synthetic Data Is the Right Tool and When It Is Not
We close with guidance on appropriate use cases for synthetic datasets, their limitations, and how they should complement, not replace, real data and human evaluation.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (long)</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/M33SNJ/</url>
            <location>Helium [3rd Floor]</location>
            
            <attendee>Shiva Banasaz Nouri</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>39MHWT@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-39MHWT</pentabarf:event-slug>
            <pentabarf:title>From Ticket to Draft: How Munich Automates Citizen Inquiries with AI</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T161500</dtstart>
            <dtend></dtend>
            <duration>003000</duration>
            <summary>From Ticket to Draft: How Munich Automates Citizen Inquiries with AI</summary>
<description>## Session Outline (30 Minutes)

### I. Context &amp; The Pre-Study | **5 min**
* **The Shift:** Transitioning from legacy email communication to **Zammad** within Munich&#x27;s city administration.
* **Proving the Case:** Utilizing LLMs to analyze historical ticket data to calculate automation potential and project significant time savings before development began.

### II. Architecture: Integration &amp; Pipeline | **6 min**
* **Event-Driven Design:** Connecting to Zammad via the city-internal **Kafka** message bus.
* **Real-time Processing:** How new tickets are captured and routed to the AI component seamlessly.

### III. The Two-Stage Process | **12 min**
* **Step 1: Classification &amp; Extraction:** Analyzing thematic context through rule-based logic and LLM-powered information extraction.
* **Step 2: Response Generation:** A **RAG (Retrieval-Augmented Generation)** approach leveraging a knowledge base maintained by subject matter experts.
* **Human-in-the-Loop:** Integrating response drafts into the agent UI for review vs. automated **&quot;dark processing&quot;** for high-confidence categories.

### IV. Scaling &amp; Lessons Learned | **4 min**
* **Multi-Tenant Capability:** Designing for configurability and deployment across various city departments.
* **Key Benefits:** Efficiency gains, response consistency, and establishing a &quot;Single Voice of the City.&quot;

### V. Q&amp;A | **3 min**
* Open discussion on technical tooling, model selection, and legal/privacy frameworks.

---

## Key Takeaways for Attendees

* **Validating Automation:** Techniques for using LLMs to audit historical data and justify development through projected time savings.
* **Practical AI Integration:** How to integrate AI services into existing enterprise infrastructures like Zammad and Kafka.
* **Modular Workflow:** The importance of separating classification from generation for higher system reliability.
* **Operational Insights:** Lessons from scaling AI solutions across diverse governmental branches.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/39MHWT/</url>
            <location>Helium [3rd Floor]</location>
            
            <attendee>Leon Lukas</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>CVPVPK@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-CVPVPK</pentabarf:event-slug>
            <pentabarf:title>Beyond Vibe-Coding: A Practitioner&#x27;s Guide to Spec-Driven Development in AI Engineering</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T165500</dtstart>
            <dtend></dtend>
            <duration>003000</duration>
            <summary>Beyond Vibe-Coding: A Practitioner&#x27;s Guide to Spec-Driven Development in AI Engineering</summary>
<description>AI Engineering is fundamentally about system building. It is the transition from demos to production-grade Python systems that must be scalable, reliable, and testable. In my experience, one way to achieve this consistently with AI-generated code is to stop coding and start specifying first.

Spec-Driven Development is a practical methodology for AI-assisted development. It is not about heavy bureaucracy; it&#x27;s about creating a &quot;Single Source of Truth&quot; that both humans and AI agents can rely on.

In this talk, I will walk through a realistic feature in a production-grade retrieval-augmented generation system. I will demonstrate how I used SpecKit — one example of a structured spec workflow, usable with different AI coding assistants — to move from a feature request to a reviewable spec, a research document, interface contracts, and a phased task plan — all before writing a single line of implementation code.

**What You Will Learn:**

- *What is Spec-Driven Development?*
- *The Paradigm Shift:* Why &quot;specifying&quot; may be the new &quot;coding&quot; in a world of Large Language Models.
- *How to use SpecKit as one example of a structured spec workflow* — usable with different AI coding assistants.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/CVPVPK/</url>
            <location>Helium [3rd Floor]</location>
            
            <attendee>Alina Dallmann</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>7PNT37@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-7PNT37</pentabarf:event-slug>
            <pentabarf:title>To nest, or not to nest? Nested data types in Polars with big data</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T173500</dtstart>
            <dtend></dtend>
            <duration>003000</duration>
            <summary>To nest, or not to nest? Nested data types in Polars with big data</summary>
            <description>If you’ve ever designed or used SQL databases in your data science projects perhaps you’ve cringed at the lack of relational structure and data duplication in the design of big data storage and processing. On the other hand, if you’ve spent any considerable time getting dirty with Polars’ vectorized and columnar processing, you’ll also know that this can be somewhat of a moot point. So why bother?

Outline of the talk:

5 minutes: Introduction &amp; origin story. What are Polars nested types? How do they work? Why do they matter?
5 minutes: Back to the future. Advanced queries on nested types, past &amp; present.
5 minutes: Query structure - “Group by” forever baby, versus element-wise.
5 minutes: Storage comparison and the gigabyte scrooge - how a miser decides on a nested Polars structure.
5 minutes: Time is money – How performance stacks up.
5 minutes: Q&amp;A

By the end of the talk, participants will have seen several straightforward examples, as well as more advanced illustrations of nested structures in Polars using real-world data. They will be able to identify some key considerations informing their use of nested structures, including query logic, storage and performance.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/7PNT37/</url>
            <location>Helium [3rd Floor]</location>
            
            <attendee>Daniel Finnan</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>DQDEES@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-DQDEES</pentabarf:event-slug>
            <pentabarf:title>State of In-Browser ML: WebAssembly, WebGPU, and the Modern Stack</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T101500</dtstart>
            <dtend></dtend>
            <duration>003000</duration>
            <summary>State of In-Browser ML: WebAssembly, WebGPU, and the Modern Stack</summary>
            <description>Over the last few years, the tooling has matured enough to make &quot;ML in a tab&quot; worth taking seriously. Today, you can execute Python code in a sandboxed environment, ship interactive demos as a single URL, and even run LLM inference entirely on-device, without installations, servers, or sending data anywhere. In this talk, we will give a practical overview of the current in-browser ML stack, focusing on what is realistically possible today and the practical limits you still have to design around.

We will start with interactive environments such as JupyterLite and explain how they work under the hood via Pyodide: what it means to run CPython compiled to WebAssembly, how the filesystem and networking model differ from &quot;normal&quot; Python, and what that implies for performance, I/O, and package support. 

We will then move from notebooks to applications with PyScript, showing how the same building blocks can be used to create shareable browser-based tools. We will also briefly cover the lower-level approach: using Pyodide directly and orchestrating it with JavaScript for granular control over loading, packaging, and data interchange.

Finally, we will cover in-browser inference workflows for both traditional and deep learning models (via ONNX), and LLMs (via wllama and WebLLM), and discuss how WebGPU can accelerate these pipelines.

By the end of the talk, attendees will have a clear overview of the in-browser ML ecosystem and the practical intuition to decide whether it&#x27;s the right choice for your next project.

**Target Audience:** 
This talk is relevant for a broad audience; however, at least intermediate knowledge of ML and familiarity with the Python ML ecosystem are required.

**Outline:**
- Introduction + Motivating examples [4 min]
- Running Python in WebAssembly [6 min]
    - Overview of Pyodide [2 min]
    - Package management [3 min]
    - Runtime and memory constraints [1 min]
- Overview of interactive dev environments / JupyterLite [4 min] 
- Building applications with PyScript and direct Pyodide bindings [7 min]
- On-device ML inference using ONNX, WebGPU, WebLLM, and wllama [5 min]
- Q&amp;A [4 min]</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/DQDEES/</url>
            <location>Platinum [2nd Floor]</location>
            
            <attendee>Oleh Kostromin</attendee>
            
            <attendee>Iryna Kondrashchenko</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>VY3CY7@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-VY3CY7</pentabarf:event-slug>
            <pentabarf:title>Leveraging Hexagonal Architecture When Building Applications</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T105500</dtstart>
            <dtend>20260415T112500</dtend>
            <duration>003000</duration>
            <summary>Leveraging Hexagonal Architecture When Building Applications</summary>
            <description>This talk will cover the following related to hexagonal architecture:

**Introduction**
The hexagonal architecture design pattern, also known as “Ports and Adapters”, was introduced by Alistair Cockburn in the early 2000s. With the increase in usage of LLMs as software development tools, this design pattern can help create clear boundaries within applications and make code more understandable and modifiable by AI tools.

**Core principles and concepts**
In this section, I will discuss the fundamental concepts that make hexagonal architecture effective. These include the central application core (business logic/domain), ports (interfaces that define contracts), and adapters (implementations that handle external interactions, for example with a database or external services).
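
The core/ports/adapters split can be sketched in a few lines of generic Python (a minimal illustration only; the names are invented and not taken from the talk or its case study):

```python
from typing import Protocol

# Port: an interface the application core defines and depends on.
class UserRepository(Protocol):
    def get_name(self, user_id: int) -> str: ...

# Application core: pure business logic, unaware of storage details.
def greet_user(repo: UserRepository, user_id: int) -> str:
    return f"Hello, {repo.get_name(user_id)}!"

# Adapter: one concrete implementation of the port. An in-memory
# version is shown here; a database-backed adapter would implement
# the same interface, so the core never changes.
class InMemoryUserRepository:
    def __init__(self, users: dict[int, str]) -> None:
        self._users = users

    def get_name(self, user_id: int) -> str:
        return self._users[user_id]

repo = InMemoryUserRepository({1: "Ada"})
print(greet_user(repo, 1))  # Hello, Ada!
```

Because the core depends only on the port, tests can pass in the in-memory adapter while production wires up a real database, which is the testability benefit discussed below.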

**Benefits and problem-solving capabilities**
The discussion will highlight benefits including enhanced testability, improved maintainability by reducing coupling, and easier technology migration. I&#x27;ll demonstrate how hexagonal architecture addresses common development pain points such as database lock-in, framework dependencies, and the challenge of writing effective unit tests.

**Implementation and real-world case study**
Included in this presentation will be a real-world case study of how hexagonal architecture is implemented in a production application. This example will demonstrate how to handle common scenarios such as database functionality, external API integration, and user management. The case study will show actual Python code, highlighting patterns for repository implementations, service layers, and adapter configurations.

**Conclusion and Q&amp;A**
The presentation concludes with the key takeaways, resources for further learning, and an interactive Q&amp;A session.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Sponsored Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/VY3CY7/</url>
            <location>Platinum [2nd Floor]</location>
            
            <attendee>Luke Gerstner</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>F79RG9@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-F79RG9</pentabarf:event-slug>
            <pentabarf:title>Scaling Data Processing for Training Workloads at DeepL Research with Rust</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T150000</dtstart>
            <dtend></dtend>
            <duration>004500</duration>
            <summary>Scaling Data Processing for Training Workloads at DeepL Research with Rust</summary>
            <description>We set out to replace an inefficient internal file format with an industry standard - a seemingly straightforward task. What we got instead was a descent into memory leak hell.

This talk will walk you through our journey of scaling DeepL&#x27;s data preprocessing and model training pipelines to handle petabyte-scale corpora. When open-source C++-based Python libraries proved too unstable and memory-inefficient, we invested time and resources into developing our own Rust-based tooling and, compared to our previous internal file format, decreased memory load by a factor of 10 and latency until first byte read by a factor of 50.

What we&#x27;ll cover:
• **Why Rust&#x27;s memory safety guarantees matter in practice:** We will provide a direct comparison of our results using C++-based vs Rust-based implementations for data processing libraries.
• **The Rust ecosystem advantage for Python interop:** While C++ offers a fragmented landscape of build systems and tooling choices, Rust provides a canonical path with cargo, maturin, and PyO3, which together offer a clean interface for everything from GIL management to readable, zero-copy conversions between Rust and Python objects.
• **Rust&#x27;s surprisingly friendly features:** Despite its reputation for having a steep learning curve, Rust offers language features that make it genuinely pleasant to work with, even for beginners coming from a Python background: from enums to pattern matching, error handling with Result, and cargo&#x27;s canonical, ergonomic tooling.
• **Rust&#x27;s impact on the Arrow ecosystem and data engineering with Python in general:** Besides the well-known impact that Rust-based data processing libraries like polars, Daft, and datafusion are having on the engineering ecosystem, we will show how arrow-rs, the Rust implementation of Arrow, is having a growing impact of its own, expanding the data engineering toolkit by powering an increasing number of contributor-friendly processing and introspection tools built in Rust.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (long)</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/F79RG9/</url>
            <location>Platinum [2nd Floor]</location>
            
            <attendee>Jonas Dedden</attendee>
            
            <attendee>Johanna Goergen</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>BFYYQG@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-BFYYQG</pentabarf:event-slug>
            <pentabarf:title>Build a web coding platform with Python, run in WebAssembly</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T161500</dtstart>
            <dtend></dtend>
            <duration>003000</duration>
            <summary>Build a web coding platform with Python, run in WebAssembly</summary>
            <description>### The Problem
Building interactive Python learning platforms traditionally requires server infrastructure to execute user code, creating security risks and operational overhead. What if we could run Python entirely in the browser?

### The Solution
This talk presents a coding platform built with Holoviz Panel that executes Python through WebAssembly via Pyodide. The entire application – UI and code execution – runs client-side, eliminating backend complexity while providing safe, isolated Python execution.

### Architecture Overview
The platform combines three key technologies:
- **Holoviz Panel** for building the interactive interface with its built-in code editor component
- **Pyodide** for secure Python execution via WebAssembly in the browser
- **LocalStorage** for persisting student progress without a database

### Key Features
The platform supports multiple learning modalities:
- Coding exercises validated against pre-defined test cases – from simple variable assignments to complete functions with return values or print statements
- Interactive playground that evaluates expressions and captures output
- Single and multiple-choice questions for concept checks

### What You&#x27;ll Learn
This talk covers the technical integration between Panel&#x27;s UI framework and Pyodide&#x27;s execution environment – the critical piece that makes browser-based Python coding work. Attendees will understand:
- How to architect client-side Python applications
- Running Panel components with Pyodide
- Trade-offs between client-side and server-side execution
- Handling code execution, output capture, and state management

### Target Audience
Data scientists, educators, and developers interested in building interactive Python tools without server infrastructure. Basic familiarity with Python web frameworks is helpful but not required.

### Background
This work originated from my bachelor thesis exploring Python education. The resulting platform demonstrates that WebAssembly enables entirely new architectures for Python applications – shifting from traditional server models to fully client-side execution.

The takeaway: You can build sophisticated Python applications that run anywhere there&#x27;s a browser, with no backend server setup, no security concerns about arbitrary code execution, and no additional infrastructure costs.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/BFYYQG/</url>
            <location>Platinum [2nd Floor]</location>
            
            <attendee>Maris Nieuwenhuis</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>X3KQMQ@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-X3KQMQ</pentabarf:event-slug>
            <pentabarf:title>Before You Ship Your Agent: An Agent Builder’s Primer on Jailbreaking Attacks</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T165500</dtstart>
            <dtend></dtend>
            <duration>003000</duration>
            <summary>Before You Ship Your Agent: An Agent Builder’s Primer on Jailbreaking Attacks</summary>
            <description>AI agents are rapidly moving from demos and copilots into production systems that browse the web, call APIs, execute workflows, and take real‑world actions. As this transition happens, a critical truth is becoming unavoidable: any agent with meaningful capability will be attacked—and most are easy to break. Jailbreaking and prompt injection attacks are not theoretical research topics or rare edge cases; they are an inevitable outcome of deploying autonomous, instruction‑following systems in adversarial environments.

This talk is a practical, engineering‑focused primer on how AI agents fail under real‑world pressure, and what organizations must understand before shipping an agent into production. Rather than focusing on sensational examples or hypothetical risks, we will examine the concrete mechanisms that attackers use today, why they work, and why many popular defenses provide little real protection.

We begin with a clear, accessible overview of jailbreaking and prompt injection attacks. Attendees will learn how attackers manipulate model instructions, context windows, and tool‑calling behavior to override intended safeguards. We’ll cover both direct prompt injection (explicitly malicious instructions) and indirect prompt injection, where hostile content is embedded in webpages, documents, emails, or user‑generated data that agents are designed to consume. These attacks are especially dangerous because they exploit normal, expected behavior rather than software bugs.

From there, we’ll explore several recurring failure modes that appear across nearly all production agent architectures:

- Excessive agency: Agents are often given broader permissions and autonomy than necessary, turning minor instruction hijacks into high‑impact incidents.
- Prompt leakage: System prompts, policies, secrets, and internal instructions are frequently exposed or inferable, providing attackers with a roadmap for further exploitation.
- Vector and embedding weaknesses: Retrieval‑augmented generation systems can be poisoned or manipulated, allowing malicious content to outrank trusted sources and influence agent decisions.
- Tool and browser abuse: Agents that browse the web or execute actions are uniquely vulnerable to hostile environments intentionally crafted to manipulate them.

A key focus of the talk is why AI guardrails don’t work the way many teams expect. We’ll examine common approaches (prompt‑based restrictions, content filters, and policy‑layer defenses) and explain why they are brittle, bypassable, and often fail silently. Rather than stopping attacks, these mechanisms frequently create a false sense of security that masks deeper architectural risks.

We’ll also address a common misconception in the industry: “If these vulnerabilities are so serious, why haven’t we seen major AI security incidents yet?” The answer is not that systems are safe, but that most deployments are still constrained—limited autonomy, limited blast radius, and cautious rollout. As organizations move toward browser agents, long‑running autonomous workflows, and systems with real operational authority, the conditions that have so far prevented large‑scale incidents will disappear. When that happens, these attack classes will move from curiosity to crisis.

The final section of the talk focuses on what actually works. Instead of recommending yet another AI security product or guardrail framework, we will outline practical, proven steps organizations can take today, grounded in decades of security engineering experience:

- Applying least privilege and minimizing agent capabilities
- Isolating tools, credentials, and execution environments
- Designing for failure and containment, not perfect prevention
- Monitoring agent behavior for abuse patterns rather than policy violations
- Performing threat modeling that treats prompts and context as untrusted input
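
The least-privilege and containment points above can be sketched as a deny-by-default tool gate (tool names and limits here are hypothetical):

```python
# A sketch of "least privilege" for tool-calling agents: every tool call
# passes through an allowlist gate, and the action space is capped so a
# hijacked agent has a small blast radius. Names are hypothetical.

ALLOWED_TOOLS = {
    "search_docs": {"max_calls": 10},
    "read_ticket": {"max_calls": 5},
    # deliberately absent: "send_email", "delete_record", ...
}

call_counts = {}

def gate_tool_call(tool_name, args):
    """Return True only if the call is inside the agent's minimal scope."""
    policy = ALLOWED_TOOLS.get(tool_name)
    if policy is None:
        return False  # not on the allowlist: deny by default
    used = call_counts.get(tool_name, 0)
    if used == policy["max_calls"]:
        return False  # contain runaway loops and abuse patterns
    call_counts[tool_name] = used + 1
    return True

assert gate_tool_call("search_docs", {"q": "refund policy"}) is True
assert gate_tool_call("send_email", {"to": "x"}) is False
```
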
Attendees will leave with a clear mental model of how AI agents are attacked, why these attacks succeed, and how to reduce risk without relying on ineffective silver bullets. This talk is intended for engineers, security practitioners, and technical leaders building or deploying AI agents who want to understand the real risks—and take responsible action—before putting these systems into production.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/X3KQMQ/</url>
            <location>Platinum [2nd Floor]</location>
            
            <attendee>Simonas Černiauskas</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>V7LQGR@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-V7LQGR</pentabarf:event-slug>
            <pentabarf:title>Don’t Let Imposter Syndrome Win: U Can Do Big Things from a Small Place, A 7-Year African AI Journey</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T173500</dtstart>
            <dtend></dtend>
            <duration>003000</duration>
            <summary>Don’t Let Imposter Syndrome Win: U Can Do Big Things from a Small Place, A 7-Year African AI Journey</summary>
            <description>Imposter syndrome affects engineers worldwide, but for underrepresented professionals, geographic location, limited access to resources, and systemic biases can amplify its impact. Many talented engineers outside major tech hubs face self-doubt, missed opportunities, and barriers to career growth, even when they have the skills and vision to make meaningful contributions globally. Recognizing and addressing these challenges is crucial for building inclusive and diverse tech ecosystems.

This topic is particularly relevant for early- to mid-career engineers, community builders, and professionals navigating global tech ecosystems. Many in these groups experience self-doubt, uncertainty about career paths, and difficulty gaining visibility and recognition. Understanding how to overcome these challenges can empower them to take bold steps, make an impact beyond their immediate environment, and thrive despite systemic limitations.

In this talk, I share actionable strategies that helped me overcome imposter syndrome and build influence from any location. Drawing on my 7-year journey as an African AI engineer, I highlight how leveraging community building, open source contributions, mentorship, media engagement, and strategic partnerships can create meaningful opportunities. I illustrate these strategies through real-world examples, including founding DataFestAfrica, growing AI and MLOps communities with limited funding in emerging regions and collaborating with global organizations. Attendees will leave with practical insights to overcome self-doubt, expand their reach, and make a global impact from wherever they are.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/V7LQGR/</url>
            <location>Platinum [2nd Floor]</location>
            
            <attendee>Gift Ojeabulu</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>3XDMXS@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-3XDMXS</pentabarf:event-slug>
            <pentabarf:title>From Research Models to SLAs: Operationalizing TSFMs with Python</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T101500</dtstart>
            <dtend></dtend>
            <duration>003000</duration>
            <summary>From Research Models to SLAs: Operationalizing TSFMs with Python</summary>
            <description>### Motivation

Time series foundation models promise rapid prototyping and strong performance across domains, but many teams struggle to move beyond notebooks and benchmarks. In practice, the hardest problems are not model accuracy or architecture, but integration, operability, and developer experience.

This talk addresses a common but under-discussed question:

How do you operationalize time series foundation models inside a large organization with real users, real constraints, and real SLAs?

### Case study context

The talk is based on hands-on experience building and operating Siemens KPI Forecast, a Python-based forecasting platform that exposes multiple TSFMs through stable APIs. The platform integrates:

- Chronos ([https://arxiv.org/abs/2403.07815](https://arxiv.org/abs/2403.07815))
- Lag-Llama ([https://arxiv.org/abs/2310.08278](https://arxiv.org/abs/2310.08278))
- TimesFM ([https://arxiv.org/abs/2310.10688](https://arxiv.org/abs/2310.10688))
- GTT, a Siemens-developed large-scale time series model (https://arxiv.org/abs/2402.07570)

Chronos, Lag-Llama, and TimesFM are open-source research models, while GTT is a proprietary Siemens model. The platform is designed to treat both open and closed-source models uniformly from a developer and user perspective.

### Topics covered

- Why TSFMs are easy to prototype but hard to operationalize
- Designing Python APIs that unify multiple foundation models
- Supporting zero-shot inference, fine-tuning jobs, and fine-tuned inference in one system
- Integrating open-source and proprietary models consistently
- Making forecasting services accessible to different user personas
- Challenges related to operating ML services in a B2B environment including monitoring, versioning, and governance considerations
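
The unifying-API point above can be made concrete with a small sketch; class and method names are illustrative, not the actual platform API:

```python
# Hypothetical sketch of a unifying interface: open-source and
# proprietary backends implement one contract, so callers never branch
# on the model family.
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Forecast:
    point: list      # point predictions, one value per horizon step
    quantiles: dict  # optional probabilistic output

class TimeSeriesModel(Protocol):
    """Contract shared by zero-shot and fine-tuned backends."""
    def predict(self, history, horizon):
        ...

class ChronosBackend:
    # stand-in: a real backend would invoke the Chronos pipeline here
    def predict(self, history, horizon):
        last = history[-1]
        return Forecast(point=[last] * horizon, quantiles={})

def forecast(model, history, horizon):
    # callers depend only on the shared contract, not the backend
    return model.predict(history, horizon)

result = forecast(ChronosBackend(), [1.0, 2.0, 3.0], horizon=2)
assert result.point == [3.0, 3.0]
```
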

### What attendees will learn

- How to structure Python services around foundation models
- How to avoid fragmentation when supporting multiple models and workflows
- Practical MLOps patterns for operating ML services beyond notebooks
- Lessons learned from running TSFMs at organizational scale

This session focuses on engineering and operational lessons that are broadly applicable to teams building Python-based ML platforms in both enterprise and open-source contexts. Model references are included for transparency; the talk focuses on system design and operational patterns rather than proprietary details.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/3XDMXS/</url>
            <location>Europium [3rd Floor]</location>
            
            <attendee>Jeyashree Krishnan</attendee>
            
            <attendee>Catarina Filipe</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>YZM8TA@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-YZM8TA</pentabarf:event-slug>
            <pentabarf:title>Demystifying Agentic AI Using Small Language Models</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T105500</dtstart>
            <dtend>20260415T112500</dtend>
            <duration>003000</duration>
            <summary>Demystifying Agentic AI Using Small Language Models</summary>
            <description>The Agentic Buzz - What’s Real, What’s Marketing

- The explosion of “agentic” frameworks and the confusion it causes
- What an agent really is at its core: planning, acting, and reasoning

Anatomy of an Agent

- The three basic functions: task decomposition, tool use, and code synthesis
- How frameworks like LangChain and Python make it easy to chain these together


Why Small Models Are Catching Up
- Review of research from NVIDIA and Georgia Tech
- Benchmarks showing SLMs matching or exceeding the performance of larger LLMs
- Cost, latency, and deployability tradeoffs

Hands-On Demo: Building and Running an Agent on a Laptop

- Using LangChain and Python to orchestrate reasoning, tool calls, and code execution
- Example workflow: “Plan a dataset cleanup pipeline” using an SLM
- Observing resource use, latency, and performance in real time
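
Stripped of any framework, the loop such a demo builds can be sketched in a few lines (the planner below is a scripted stand-in for the SLM, and the tools are invented):

```python
# Framework-free sketch of an agent loop: the model (stubbed here)
# repeatedly picks a tool, the runtime executes it, and the observation
# is fed back until the model decides it is done.

TOOLS = {
    "list_columns": lambda state: ["id", "name", "name_dup", "age"],
    "drop_duplicates": lambda state: [c for c in state if not c.endswith("_dup")],
}

def plan(step, observation):
    # Stand-in for the SLM: a real agent would prompt the model with
    # the observation and parse its chosen action.
    script = ["list_columns", "drop_duplicates", "finish"]
    return script[step]

def run_agent():
    observation = None
    for step in range(10):  # hard cap on iterations
        action = plan(step, observation)
        if action == "finish":
            return observation
        observation = TOOLS[action](observation)
    return observation

assert run_agent() == ["id", "name", "age"]
```
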


Key Takeaways and Open Research Directions

- Opportunities for local and edge deployments
- The emerging role of SLMs in allowing everyone to experiment with agents
- Future questions: scaling reasoning vs. scaling models</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/YZM8TA/</url>
            <location>Europium [3rd Floor]</location>
            
            <attendee>Serhii Sokolenko</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>3BYLZU@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-3BYLZU</pentabarf:event-slug>
            <pentabarf:title>Building Secure Environments for CLI Code Agents</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T113500</dtstart>
            <dtend>20260415T120500</dtend>
            <duration>003000</duration>
            <summary>Building Secure Environments for CLI Code Agents</summary>
            <description>AI-powered code agents like Claude Code can autonomously edit files, run commands, and interact with your development environment. This power comes with risks: unrestricted filesystem access, exposed credentials, and unmonitored API usage. How do you harness this capability safely?

This talk presents a practical containerization approach for running CLI code agents in complete isolation from your host system. You&#x27;ll learn how to build secure environments that maintain persistent authentication, enable workspace access through volume mounts, and provide full API request logging, all while keeping the agent sandboxed.
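
As a hedged sketch of such isolation settings, a container launch can be expressed as a command builder; the flags follow the themes above, while the image name and paths are placeholders, not the presented setup:

```python
# Sketch of a sandboxed launch for a CLI code agent. In the strictest
# variant the network is dropped entirely; a logging API proxy would
# replace "none" with an internal proxy network.
import shlex

def sandbox_command(image, workspace):
    args = [
        "docker", "run", "--rm",
        "--read-only",                    # immutable root filesystem
        "--network", "none",              # no direct network access
        "--cap-drop", "ALL",              # drop Linux capabilities
        "-v", workspace + ":/workspace",  # only the project is mounted
        "-w", "/workspace",
        image,
    ]
    return args

cmd = sandbox_command("code-agent:latest", "/home/me/project")
assert "--read-only" in cmd
print(shlex.join(cmd))
```
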

I&#x27;ll demonstrate a production-ready setup using Docker containers that includes credential management, an API proxy for request logging and monitoring, and Datasette integration for analyzing API usage patterns. You&#x27;ll see how to structure volumes for security, implement network isolation, and maintain developer productivity while enforcing safety boundaries.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/3BYLZU/</url>
            <location>Europium [3rd Floor]</location>
            
            <attendee>Harald Nezbeda</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>MJTQEJ@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-MJTQEJ</pentabarf:event-slug>
            <pentabarf:title>Mastering the Hex: A Case Study in Reinforcement Learning for Strategy Games</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T142000</dtstart>
            <dtend>20260415T145000</dtend>
            <duration>003000</duration>
            <summary>Mastering the Hex: A Case Study in Reinforcement Learning for Strategy Games</summary>
            <description>### Context and Motivation

This talk emerged from a year-long journey that began with a simple curiosity: could I teach a computer to play strategy games by itself? It started as a college seminar project, but the topic was chosen purely out of personal interest in reinforcement learning and game AI — this was a hobby from the start. Rather than working with pre-built environments like CartPole or Atari games, the goal was to understand the entire pipeline—from implementing game mechanics to training a neural network that actually learns to win.

The game chosen was Antiyoy, a minimalist turn-based strategy game where players control territories on hexagonal grids, build units and structures, manage resources, and compete for dominance. While the game is simple enough to understand, it presents genuine strategic depth—exactly the kind of challenge that makes reinforcement learning both difficult and rewarding.

The talk walks through the complete development process, focusing not on implementation minutiae but on the fundamental questions and design decisions that anyone building similar systems would encounter. You won&#x27;t see walls of code or detailed mathematical derivations. Instead, you&#x27;ll hear about the thinking process, the challenges faced, and the solutions that emerged—all with the goal of demystifying what it actually takes to build a learning agent for complex games.

---

### What Will You Learn?

The talk is structured around three core challenges that define this kind of project, presented as questions that the work had to answer:

**How do you turn a game into something a neural network can understand?**

Strategy games aren&#x27;t naturally suited for machine learning. Antiyoy is played on hexagonal grids, uses discrete turn-based actions, and involves complex state information—territory ownership, unit positions, economic resources, and more. The talk explores how to bridge this gap: representing hexagonal coordinates in ways that computers can efficiently process, encoding complete game state into multi-channel observations similar to those used in AlphaZero, and designing observation spaces that preserve spatial relationships for convolutional networks. You&#x27;ll hear about the choice between different coordinate systems, the challenge of maintaining game history for temporal reasoning, and how to normalize diverse information types (positions, money, turn counts) into a coherent input for neural networks.
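
The multi-channel encoding idea can be sketched as follows; the board size, channel layout, and normalization below are invented for illustration, not the talk&#x27;s actual code:

```python
# Encoding a hex board into AlphaZero-style multi-channel planes:
# axial coordinates are stored in a padded 2D array, one channel per
# feature, with scalars broadcast as constant planes.

SIZE = 5  # board stored as a SIZE x SIZE axial grid, unused cells padded

def empty_planes(num_channels):
    return [[[0.0] * SIZE for _ in range(SIZE)] for _ in range(num_channels)]

def encode(owned_cells, unit_cells, money, max_money=100.0):
    planes = empty_planes(3)
    for (q, r) in owned_cells:   # channel 0: territory ownership
        planes[0][r][q] = 1.0
    for (q, r) in unit_cells:    # channel 1: unit positions
        planes[1][r][q] = 1.0
    scale = min(money / max_money, 1.0)
    for r in range(SIZE):        # channel 2: normalized scalar plane
        for q in range(SIZE):
            planes[2][r][q] = scale
    return planes

obs = encode(owned_cells=[(0, 0), (1, 0)], unit_cells=[(1, 0)], money=25)
assert obs[0][0][1] == 1.0 and obs[2][3][3] == 0.25
```
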

**How do you handle massive action spaces without overwhelming your AI?**

When your agent has more than 4,000 possible actions at any given moment—moving units to different positions, building various types of units and structures, or ending the turn—training becomes a serious challenge. Most of these actions are illegal at any given time, yet a naive approach would force the agent to learn this the hard way. The talk discusses how action masking solves this problem by dynamically filtering the action space to only legal moves, dramatically improving learning efficiency. You&#x27;ll understand why this technique is crucial for games with complex rules and how it fundamentally changes the training dynamics compared to environments where every action is always available.
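
The masking trick itself is compact; a minimal version (plain Python, no RL framework):

```python
# Action masking in miniature: illegal logits are forced to negative
# infinity before the softmax, so illegal actions receive exactly zero
# probability and the policy never has to "learn" the rules by losing.
import math

def masked_softmax(logits, legal_mask):
    masked = [l if ok else float("-inf") for l, ok in zip(logits, legal_mask)]
    m = max(masked)
    exps = [math.exp(x - m) for x in masked]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]

# 5 actions, only actions 0 and 3 are currently legal
probs = masked_softmax([0.5, 2.0, 1.0, 0.5, -1.0],
                       [True, False, False, True, False])
assert probs[1] == 0.0 and probs[2] == 0.0
assert sum(probs) == 1.0
```
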

**How do you design rewards that actually teach strategy?**

Perhaps the most subtle challenge in reinforcement learning is reward design. Give an agent only a +1 for winning and -1 for losing, and it may take forever to figure out what behaviors lead to victory. But add too many intermediate rewards, and you risk the agent exploiting shortcuts rather than learning genuine strategy. The talk shares the experimentation process: starting with sparse rewards as a baseline, carefully introducing intermediate signals for meaningful actions like territory expansion and economic development, and ultimately landing on a reward structure that accelerated learning while still encouraging strategic play. You&#x27;ll see how reward shaping influenced training speed and final performance, and learn to think about reward design as a crucial part of the development process rather than an afterthought.
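
A toy version of such a shaped reward, with coefficients invented for illustration rather than taken from the talk:

```python
# Sparse terminal signal plus small dense bonuses for territory and
# economy. The dense terms are kept well below the terminal reward so
# the agent cannot "farm" them instead of trying to win.

def shaped_reward(prev, curr, game_over, won):
    reward = 0.0
    if game_over:
        reward = reward + (1.0 if won else -1.0)   # sparse baseline signal
    reward = reward + 0.01 * (curr["tiles"] - prev["tiles"])
    reward = reward + 0.001 * (curr["income"] - prev["income"])
    return reward

r = shaped_reward({"tiles": 10, "income": 5}, {"tiles": 12, "income": 6},
                  game_over=False, won=False)
assert round(r, 6) == 0.021
```
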

---

### What Are the Results and Takeaways?

After training over several thousand episodes—which took about eight hours on a consumer-grade GPU—the agent learned to win approximately nine out of ten games against a baseline random opponent. To be precise about what that means: the baseline picks uniformly from legal moves, so the bar is not high. The trained agent makes progress through a game, expands territory, and occasionally does things that look like they could be intentional. It also makes plenty of moves that defy easy explanation. &quot;Strategy&quot; might be a generous word; &quot;learned to flail more purposefully&quot; is closer to the truth.

The talk concludes by reflecting on what worked well and what proved unexpectedly difficult—and what is still unresolved. Action masking emerged as perhaps the single most impactful technique for managing complexity. The choice of observation space design—borrowing ideas from AlphaZero&#x27;s approach to representing board games—turned out to be well-suited for the problem. Training infrastructure using MLflow provided invaluable insight into the learning process and made experimentation much more manageable. On the challenging side: reward design required multiple iterations and still produced an agent that plays competently but not strategically. The gap between &quot;beats random&quot; and &quot;actually plays well&quot; is humbling and, it turns out, enormous.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/MJTQEJ/</url>
            <location>Europium [3rd Floor]</location>
            
            <attendee>Simon Hedrich</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>JBFGCA@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-JBFGCA</pentabarf:event-slug>
            <pentabarf:title>Building Agentic Systems with Python, LangGraph, MCP, and A2A</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T150000</dtstart>
            <dtend></dtend>
            <duration>004500</duration>
            <summary>Building Agentic Systems with Python, LangGraph, MCP, and A2A</summary>
            <description># What we are going to show

- A live demo of a Python-based multi-agent system that retrieves, aggregates, and evaluates company information in real time.
- The overall solution architecture: how LangGraph, MCP, A2A, and custom Python components fit together.
- Key implementation lessons from building the system, covering both technical and business challenges.

# What problem is our talk addressing

AI analysis depends heavily on data. When systems cannot rely on pre-collected or curated datasets, developers must find, collect, and validate data of sufficient quality.

At the same time, emerging technologies such as MCP, A2A, and LangGraph are evolving quickly, with limited documentation, occasional breaking changes, and examples that rarely scale beyond minimal tutorials. Applying these tools to real-world Python applications introduces challenges in design, orchestration, versioning, and error handling that are not yet widely discussed.

# Why is the problem relevant to the audience

Many Python developers and data practitioners will soon need to build systems that combine LLMs, external data sources, and multi-agent logic, without relying on static datasets. This talk provides practical guidance for designing such systems using open-source Python tooling.

The presented solution is designed with a modular, scalable component approach. MCP and A2A protocols facilitate the connection between AI-related solutions, and this design demonstrates re-usable patterns for implementation.

By sharing our approach, design choices, and implementation pitfalls, the talk equips attendees to anticipate challenges early, evaluate whether MCP/A2A are appropriate for their own projects, and build more robust agentic systems.

# What is our solution to the problem

Our solution splits responsibilities across several blocks; the overall idea is to present, with code examples, a Python system that combines LangGraph, A2A, and MCP:

- Data access via MCP servers
MCP servers retrieve data from multiple sources (e.g., LinkedIn APIs, web scraping endpoints, Perplexity research). Using MCP makes it easy to plug in new data sources and manage them consistently. We demonstrate how to build and connect MCP servers in Python.
- Data processing via LangGraph agents
A set of agents implemented in LangGraph handle tasks such as coordinating the workflow, collecting company data, calculating evaluation scores, and validating results. These agents operate in a hub-and-spoke pattern centered around a coordinator agent. We show how this is implemented in Python using LangGraph.
- Inter-agent communication via A2A
Agents expose capability “cards,” which the coordinator aggregates into a registry. An intent-detection step determines which agents should be invoked to answer a user&#x27;s request. We demonstrate how A2A can be applied in Python to orchestrate agents effectively.
- Data validation agent
A dedicated validation agent checks retrieved data against defined rules to ensure quality. While no internet-sourced data is perfect, this approach significantly increases reliability. We show how validation logic is implemented within the LangGraph flow.
- Scalability through configuration and deployment
A centralized configuration file and simple Docker-based deployment make the system easy to scale and adapt. We explain how environment variables and shared configuration patterns can coordinate the various Python components.
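
A miniature of the card registry and intent routing might look like this (the agent names, skills, and keyword matching are invented for illustration):

```python
# Each agent publishes a capability card, the coordinator aggregates
# them into a registry, and a naive keyword intent check picks which
# agents to invoke for a request.

AGENT_CARDS = [
    {"name": "company_data", "skills": ["revenue", "employees", "funding"]},
    {"name": "scoring",      "skills": ["score", "evaluation"]},
    {"name": "validation",   "skills": ["validate", "quality"]},
]

def build_registry(cards):
    registry = {}
    for card in cards:
        for skill in card["skills"]:
            registry.setdefault(skill, []).append(card["name"])
    return registry

def route(user_request, registry):
    # stand-in for real intent detection: match request words to skills
    words = user_request.lower().split()
    hits = [registry[w] for w in words if w in registry]
    return sorted({name for group in hits for name in group})

registry = build_registry(AGENT_CARDS)
assert route("please score the funding data", registry) == ["company_data", "scoring"]
```
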

# What are the main takeaways from our talk

Attendees will learn:

- How to design and implement a practical multi-agent architecture using Python, LangGraph, MCP, and A2A.
- How to acquire and validate external data dynamically without relying on curated datasets.
- Common pitfalls and lessons from using MCP and A2A in larger-scale systems.
- How to structure agent roles, orchestration flows, and validation strategies for scalable, extendable AI systems.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (long)</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/JBFGCA/</url>
            <location>Europium [3rd Floor]</location>
            
            <attendee>Holger Nösekabel</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>XGL37G@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-XGL37G</pentabarf:event-slug>
            <pentabarf:title>Tidy Finance in Practice: How Explicit Assumptions Avoid Bad Investment Strategies</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T161500</dtstart>
            <dtend></dtend>
            <duration>003000</duration>
            <summary>Tidy Finance in Practice: How Explicit Assumptions Avoid Bad Investment Strategies</summary>
            <description>Many investment strategies look great because they performed well in the past. However, it is often unclear why they work or whether they would still work in the future. Strong backtest results are frequently driven by hidden assumptions, unclear data handling, or unrealistic rules rather than real skill or insight.

In this talk, I show how Tidy Finance principles help people better understand what is actually happening inside a financial backtest. Tidy Finance has become a popular open-source teaching and learning platform for empirical financial research. Its core idea is simple: financial analyses should be built from clear, well-structured data that makes assumptions easy to see and results easy to reproduce. 
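
That core idea can be made concrete with a small sketch: gather every modelling choice into one declarative object instead of scattering defaults through the code (field names below are invented for illustration, not taken from Tidy Finance):

```python
# Making backtest assumptions explicit: a single frozen object that any
# reader of the analysis can inspect, instead of hidden defaults.
from dataclasses import dataclass

@dataclass(frozen=True)
class BacktestAssumptions:
    rebalancing: str = "monthly"        # how often weights are updated
    transaction_cost_bps: float = 10.0  # round-trip cost in basis points
    signal_lag_days: int = 1            # trade on t+1 to avoid look-ahead
    universe: str = "point-in-time"     # no survivorship bias

    def describe(self):
        return ", ".join(f"{k}={v}" for k, v in vars(self).items())

a = BacktestAssumptions()
assert "signal_lag_days=1" in a.describe()
```
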

Using explicit examples from Tidy Finance with Python during the talk, I go through a real backtesting workflow and show how it changes when assumptions are written down clearly instead of being hidden inside the code. I demonstrate how small, often overlooked choices can have a large impact on results, and how these effects become visible when the analysis is structured cleanly. The focus is on learning how to read and question backtests, not on presenting new models or strategies.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/XGL37G/</url>
            <location>Europium [3rd Floor]</location>
            
            <attendee>Christoph Frey</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>8YTYEN@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-8YTYEN</pentabarf:event-slug>
            <pentabarf:title>Octopus AutoML: Extracting Signal from Small and High-Dimensional Data</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T165500</dtstart>
            <dtend></dtend>
            <duration>003000</duration>
            <summary>Octopus AutoML: Extracting Signal from Small and High-Dimensional Data</summary>
            <description>Many machine learning tools are based on the quiet assumption that data is plentiful, independent, and identically distributed, and that a random training/testing split, plus a little cross-validation, is “good enough”. In application-driven domains such as pharmaceutical development and industrial materials science, however, this is often not the case. Synthesizing a new compound can take months, and early-phase clinical trials are small, so we often work with fewer than 1,000 samples and several thousand features. In this context, standard AutoML practice can be dangerously optimistic.

On small datasets, performance can vary significantly depending on the random seed used for splitting the data. Working with a single split exposes us to this randomness: with an unlucky seed we might prematurely abandon promising experiments, while a particularly favorable seed can lead to overestimating the true performance. Another major risk is data leakage, such as performing feature selection before splitting the data, or distributing correlated samples (e.g., repeated measurements from the same patient or material batch) across both training and test sets. Such leakage inflates evaluation metrics and produces models that fail to generalize to new data.

Octopus is an open-source Python AutoML library designed specifically for small and high-dimensional datasets. Its core idea is simple: make statistically honest evaluation the default. Octopus enforces strict nested cross-validation, with an inner loop for model and hyperparameter selection and an outer loop that provides generalization performance estimates. Thanks to this nested setup, users also obtain an estimate of how much performance varies across multiple data splits; low variation increases trust in the reported results. Furthermore, because Octopus handles the entire data-splitting process and is carefully designed to avoid information leakage, the reported metrics are far less likely to be inflated.
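
The nested scheme can be sketched with stdlib code only; the model and scoring function below are stubs, since the split logic is the point:

```python
# Nested cross-validation skeleton: the inner loop selects a
# hyperparameter using the outer training folds only; the outer loop
# estimates generalization on data the selection never saw.

def k_folds(n, k):
    idx = list(range(n))
    return [(idx[:i * n // k] + idx[(i + 1) * n // k:],  # train indices
             idx[i * n // k:(i + 1) * n // k])           # test indices
            for i in range(k)]

def evaluate(param, train_idx, test_idx):
    # stub for fit-and-score; this toy score peaks at param == 3
    return 1.0 / (1 + abs(param - 3))

outer_scores = []
for outer_train, outer_test in k_folds(n=20, k=5):
    # inner loop: pick the hyperparameter using outer_train only
    best = max(
        (sum(evaluate(p, tr, te) for tr, te in k_folds(len(outer_train), 4)), p)
        for p in [1, 2, 3, 4]
    )[1]
    # outer loop: score the chosen model on the untouched outer_test
    outer_scores.append(evaluate(best, outer_train, outer_test))

# five outer estimates, and their spread, instead of one lucky split
assert len(outer_scores) == 5
```
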

Our library provides a robust drop-in replacement for existing machine learning workflows, ensuring a principled implementation of nested cross-validation while leveraging advanced machine learning techniques in the background. Adopting a modular architecture, the library offers a dedicated, internally developed ML module, seamless integration of several feature selection methods (e.g., MRMR, Boruta), and support for external ML solutions such as AutoGluon. This modular design makes Octopus a powerful platform for benchmarking different methods and solutions on specific datasets and use cases, helping users systematically compare and select the most suitable approach for their problem.

Octopus also supports time-to-event (survival) problems, which are common in healthcare (e.g. time to progression or death) and in materials science (e.g. time to failure or degradation). Survival models are evaluated using appropriate metrics within the same nested cross-validation framework.

This talk will demonstrate, using realistic small-scale datasets, how standard AutoML pipelines can report deceptively strong performance and how these metrics change when proper nested cross-validation and domain-aware splits are applied. Attendees will learn where typical mistakes originate and how Octopus establishes practical safeguards against them. The goal is straightforward: to produce better models and more reliable conclusions when data are scarce and every sample matters.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/8YTYEN/</url>
            <location>Europium [3rd Floor]</location>
            
            <attendee>Nils Haase</attendee>
            
            <attendee>Andreas Wurl</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>K9LCNQ@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-K9LCNQ</pentabarf:event-slug>
            <pentabarf:title>Heat: scaling the Python scientific stack to HPC systems</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T173500</dtstart>
            <dtend></dtend>
            <duration>003000</duration>
            <summary>Heat: scaling the Python scientific stack to HPC systems</summary>
            <description>**Memory bottleneck in scientific computing (4 minutes)**
 - Limitations of single-node libraries
 - Complexity of existing workarounds: trade-offs between manual MPI programming (high developer effort) and task-parallel frameworks 
 - The data-parallel alternative: performing uniform operations on distributed slices of a global tensor.

**Architecture and implementation (8 minutes)**
- The DNDarray structure: Technical breakdown of the distributed n-dimensional array, which provides a global logical view while managing local physical storage across MPI ranks.
- The split axis concept: How data is partitioned along specific dimensions (e.g., rows or columns) to optimize communication for different mathematical operations.
- Backend synergy: 
  - PyTorch as the compute engine for high-performance local tensor operations and GPU acceleration.
  - mpi4py for communication in cluster environments.
- Hardware interoperability: Transparent execution across CPUs and GPUs, including NVIDIA (CUDA) and AMD (ROCm) accelerators.
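
The split-axis idea can be sketched in plain NumPy (an illustrative toy, not the actual Heat API; in Heat the chunks live on separate MPI ranks and the reduction is an MPI collective):

```python
import numpy as np

# A toy "DNDarray": a global logical view backed by per-rank local chunks.
global_array = np.arange(24).reshape(6, 4)
nprocs = 3

# split=0: partition rows across ranks (the split-axis concept, simplified).
local_chunks = np.array_split(global_array, nprocs, axis=0)

# Each rank computes on its local chunk; a reduction combines the results.
local_sums = [chunk.sum() for chunk in local_chunks]
global_sum = sum(local_sums)  # in a real cluster this would be an Allreduce
assert global_sum == global_array.sum()
```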

**Algorithmic building blocks for distributed memory (8 minutes)**
- Communication-aware linear algebra: Distributed matrix-matrix multiplication and its communication costs. Advanced matrix decomposition methods, such as hierarchical and randomized SVD (hSVD), for massive datasets.
- Scalable machine learning and statistics: Example: clustering (K-Means) and Principal Component Analysis (PCA) on distributed arrays.
- Temporal analysis using Dynamic Mode Decomposition (DMD) on large-scale scientific data like global wind speeds.

**Performance and scaling efficiency (7 minutes)**
- Scaling methodologies: strong scaling (speedup for a fixed problem size) and weak scaling (efficiency as both problem size and resources grow).
- Memory wall removal: Utilizing the cumulative RAM of many cluster nodes to process datasets that are otherwise impossible to load.
- Case studies: Reviewing performance results from large-scale runs

**Summary and project roadmap (3 minutes)**
- Key takeaways
- Upcoming features
- Open-source community</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/K9LCNQ/</url>
            <location>Europium [3rd Floor]</location>
            
            <attendee>Claudia Comito</attendee>
            
            <attendee>Thomas Saupe</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>APWGQB@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-APWGQB</pentabarf:event-slug>
            <pentabarf:title>From Pixel to Payouts: A Multi-Agent System for Real-Time Insurance Claims Processing</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T101500</dtstart>
            <dtend></dtend>
            <duration>003000</duration>
            <summary>From Pixel to Payouts: A Multi-Agent System for Real-Time Insurance Claims Processing</summary>
            <description>**Project Goal and Business Impact**
Imagine filing an auto insurance claim. Instead of waiting days for a damage evaluation, photograph the car with your phone and, within minutes, receive a detailed assessment.
The primary objective of this project is to drastically improve the efficiency and objectivity of the initial auto insurance claim process. Current methods rely heavily on human adjusters and manual estimates, resulting in delays and potential cost inflation. By deploying a sophisticated Multi-Agent System, the aim is to provide a fast, data-driven assessment that benefits both the insurer and the customer.
**The Multi-Agent Architecture**
At the heart of this solution is an orchestrated system of specialized AI agents, each with a distinct role. An Orchestrator Agent acts as the brain of the architecture, creating execution plans, managing agent lifecycles, coordinating execution, and aggregating results into coherent outputs.
The Vision Agent, powered by OpenAI GPT-5.2, acts as the system&#x27;s eyes. It analyzes uploaded damage photos with technical precision, identifying specific damaged parts (bumpers, panels, headlights, etc.), classifying severity levels (minor, moderate, severe), categorizing damage types (collision, scratch, dent, paint damage), and generating detailed technical assessments. 
Two specialized Cost Estimation Agents run, representing different repair philosophies. The OEM (Original Equipment Manufacturer) Agent focuses on premium repairs using manufacturer-certified parts from authorized dealers, while the Aftermarket Agent explores cost-effective alternatives using quality certified aftermarket parts from independent shops. Both agents are powered by Perplexity&#x27;s sonar-pro model, which provides access to current market data and pricing information. 
The Shop Finder Agent searches for repair facilities near the user&#x27;s location, provides contact information, ratings, and availability, and adapts its search strategy based on the information retrieved.
**Technical Highlights**
The system is built in Python, leveraging several key technologies. The Gradio/Streamlit framework provides an intuitive web interface for image upload, location input, and real-time results display. OpenAI&#x27;s GPT-5.2 handles computer vision tasks. Perplexity&#x27;s sonar-pro model accesses current market data for repair costs and local business information. 
A sophisticated state management system provides each agent with memory of past interactions, confidence scores to assess decision quality, performance tracking to optimize the system, and context-aware autonomous decision-making. 
At the core of each agent&#x27;s execution is the ReAct loop: a Reasoning, Action, Observation cycle. Each agent doesn&#x27;t just call an API and return a result; it first records a thought explaining why it&#x27;s taking an action, executes the action, and then logs its observations. This trace is accumulated across all agents and surfaced in the UI as a collapsible reasoning log, making every decision in the pipeline fully auditable and transparent.
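
As an illustration, the ReAct-style trace might look like this minimal sketch (agent and function names are hypothetical, not the project&#x27;s actual code):

```python
# Each step records a thought, executes an action, then logs the observation.
# The accumulated trace is what a UI could surface as a reasoning log.

def run_step(trace, agent, thought, action, *args):
    trace.append((agent, "thought", thought))
    result = action(*args)
    trace.append((agent, "observation", result))
    return result

def classify_damage(photo):  # stand-in for a Vision Agent API call
    return {"part": "bumper", "severity": "moderate"}

trace = []
damage = run_step(trace, "vision", "Analyze uploaded photo", classify_damage, "car.jpg")
assert damage["severity"] == "moderate"
```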
**Generative AI vs. Manual/Traditional Tools**
While traditional automated tools rely on rigid, rule-based computer vision and static databases, this Multi-Agent System introduces a modular reasoning layer that bridges the gap between raw data and decision-making. According to industry research from McKinsey (2025), agentic workflows reduce claim cycle times from days to seconds while keeping claim evaluations consistent.
Traditional tools are often &quot;black boxes&quot; or monolithic scripts. This modular architecture, by contrast, treats every task as a swappable module in a hybrid framework: any single agent can be replaced by a non-generative-AI tool, yielding a flexible, customizable, and scalable solution.
**The Future of Insurance Claims**
This multi-agent architecture is a robust, scalable blueprint for automating complex decision-making business processes, such as insurance claims. It leverages the strengths of several large language models (LLMs) and specialized agents to deliver a fast, transparent, and comprehensive output that far exceeds the capabilities of a single model. The project demonstrates practical, real-world applications of multi-agent systems in production environments.

**Links:**
[Article](https://medium.com/@c.giancaterino/from-pixel-to-payouts-a-multi-agent-system-for-real-time-insurance-claims-processing-d647298c4eb8)
[Hugging Face App](https://huggingface.co/spaces/towardsinnovationlab/AI_Car_Damage_Evaluation)
[Repository](https://github.com/claudio1975/AI_Car_Damage_Evaluation)</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/APWGQB/</url>
            <location>Palladium [2nd Floor]</location>
            
            <attendee>Claudio Giorgio Giancaterino</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>S7KYEE@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-S7KYEE</pentabarf:event-slug>
            <pentabarf:title>No, you can&#x27;t &#x27;eval&#x27; your way to fairness</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T105500</dtstart>
            <dtend></dtend>
            <duration>003000</duration>
            <summary>No, you can&#x27;t &#x27;eval&#x27; your way to fairness</summary>
            <description>**Cold open**
Fairness is fundamentally not amenable to classic optimisation techniques.

**The exposition**
Fairness is not a state of the world, it&#x27;s an experience of it. No technology is fair in a vacuum. Fairness can only be understood when a technical system collides with humans in the world. It is felt as much as it is calculated. We can look at statistical results in aggregate to understand patterns, but these do not tell the story of the individual.

Further, attempting to optimise numerical fairness metrics is fundamentally coercive and technocratic: putting our thumb on the scale globally, injecting &quot;positive bias&quot; into single dimensions, framing fairness as a data problem rather than a problem of human dignity. It&#x27;s a &quot;one metric to rule them all&quot; approach that fails to acknowledge differences in preference, culture, experience. To build systems that support human agency we must first abandon our idea of a single moral machine which consistently outputs correct answers from inputs and algorithms. Any system treating people as fungible or undifferentiated is structurally unfair.

What might consent-based fairness look like instead? Asking &quot;Do you want extra help?&quot;, making sure individual preferences and self-reported disadvantage can add a layer of human respect into the equation. But we rarely see even this. Instead we see universalist design that decides what&#x27;s good for people without consulting them - the same pattern that Design Justice critiques as erasing those who experience intersectional disadvantage.

What does this have to do with evals? We&#x27;re seeing a wave of off-the-shelf libraries measuring bad behaviours in LLM outputs, often simplifications of older fairness metrics. And yes, they can catch obvious failure modes like slurs in outputs. But this is one failure mode among many. Installing a library and calling the job done is fairness washing. The harder, more fruitful approach is to explore the space of failure modes, consider what an ideal world would look like, and design measures, mitigations, and feedback loops accordingly. It also means grappling with the fact that we cannot avoid doing harm. What we can do is harm reduction, humility, and striving toward something better while acknowledging the impossibility of the task.

**Third act**
This talk won&#x27;t offer easy answers. Attend if you want to grapple with the gnarly problems of building systems for humans. We&#x27;ll borrow ideas from Design Justice and the disability rights movements: nothing about us without us. Let&#x27;s ask and answer better questions. You&#x27;ll leave with sharper mental models and tools for the next tricky conversation at work.

**Outline (30 minutes):**
The problem (10 min): 
- Fairness as experience, not state. 
- Why optimisation fails. 
- The individual vs the aggregate. 
- Why treating people as fungible is structurally unfair.

The critique (10 min): 
- Off-the-shelf fairness evals as fairness washing. 
- The temptation to install a library and call it done. 
- What these tools can and cannot catch without further analysis.

The alternative (10 min)
- Borrowing from Design Justice and disability rights. 
- Exploring failure modes rather than optimising metrics. 
- Harm reduction over false perfection. 
- Transparency, explanation, empowerment.

**What you&#x27;ll take home**
You&#x27;ll leave with sharper mental models for thinking about fairness in technical systems, frameworks borrowed from Design Justice and disability rights movements, and tools for the next tricky conversation at work about what fairness actually means. There are no easy answers here, but there are better questions.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/S7KYEE/</url>
            <location>Palladium [2nd Floor]</location>
            
            <attendee>Laura Summers</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>RNT9FV@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-RNT9FV</pentabarf:event-slug>
            <pentabarf:title>PyTorch and CPU-GPU Synchronizations</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T142000</dtstart>
            <dtend></dtend>
            <duration>003000</duration>
            <summary>PyTorch and CPU-GPU Synchronizations</summary>
            <description>PyTorch gets its speed from asynchronous execution: the CPU launches operations quickly while the GPU executes them later. CPU–GPU (host-device) synchronizations break this pipeline by blocking the host until the GPU reaches a specific point. The result is often counterintuitive: even if kernels are fast, the GPU develops idle gaps, throughput drops, and latency rises because the CPU can no longer run ahead and keep the GPU fed with work.

This talk builds intuition with a minimal loop that alternates a slow GPU operation with a quick “bookkeeping” operation, a pattern that resembles many inference and training pipelines. By adding a seemingly harmless action—such as printing a CUDA tensor—we’ll see how easily a synchronization can be introduced and why the slowdown can be disproportionate to what the code appears to do.

We’ll then walk through a practical profiling workflow in NVIDIA Nsight Systems. The key technique is to correlate GPU utilization gaps with long CPU-side CUDA API calls (for example cudaStreamSynchronize) that indicate the host thread is waiting. Comparing a healthy trace to a sync-heavy trace makes it clear where the pipeline stalls and which code region triggers it.

Beyond the usual suspects (.item(), printing device tensors, explicit device transfers), the talk highlights dynamic shapes as a common synchronization trigger. Patterns like boolean indexing with a GPU mask or slicing with a GPU-resident index can force PyTorch to fetch information back to the CPU to determine output sizes and allocations. We’ll discuss how to recognize these cases and how to restructure code toward shape-stable alternatives when possible.
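
A minimal CPU-side sketch of the shape-stable restructuring (illustrative only; the synchronization itself only manifests on a GPU):

```python
import torch

x = torch.tensor([1.0, -2.0, 3.0, -4.0])
mask = x > 0

# Data-dependent shape: on a GPU, x[mask] forces a host-device sync,
# because the output size is only known after reading the mask back.
filtered = x[mask]  # shape depends on the data

# Shape-stable alternative: the element count is fixed regardless of the
# data, so no size information needs to travel back to the CPU.
stable = torch.where(mask, x, torch.zeros_like(x))

assert filtered.shape == (2,)
assert stable.shape == x.shape
```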

Finally, we’ll cover how to prevent regressions. Instead of relying on profiling alone, we’ll use PyTorch’s experimental API `torch.cuda.set_sync_debug_mode()` in unit tests to surface synchronizations early, while keeping production code unchanged. We’ll close with guidance on when a small Triton kernel is worth considering to avoid sync-inducing patterns and to fuse multiple small ops into a single, fully asynchronous kernel.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/RNT9FV/</url>
            <location>Palladium [2nd Floor]</location>
            
            <attendee>Tomas Ruiz</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>HP7DLX@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-HP7DLX</pentabarf:event-slug>
            <pentabarf:title>Beyond Kafka and S3: Python Data Pipelines with HTTP-Native Bytestreams</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T150000</dtstart>
            <dtend></dtend>
            <duration>004500</duration>
            <summary>Beyond Kafka and S3: Python Data Pipelines with HTTP-Native Bytestreams</summary>
            <description>**TL;DR** *Streaming data between systems — whether across organizations, from secured environments, isolated networks, or even home setups — remains a common challenge in modern data engineering and data sharing workflows. This talk introduces the ZebraStream Protocol: an open, HTTP-based bytestream protocol designed specifically for decoupled systems, where both sides act as clients — no server hosting, no exposed endpoints.*

### Talk Outline (45 minutes)

**Opening — The Shape of the Solution (3 min)**

The talk opens with a UNIX pipe: opaque, minimal, composable. Any program that reads from stdin and writes to stdout already fits — no negotiation, no shared infrastructure. Two real-world use cases introduce the challenge: a supplier pushing inventory to a buyer&#x27;s pipeline, and a hospital sharing trial data with a contract research organization. The question the talk sets out to answer: can the pipe&#x27;s properties work across organizational boundaries, over HTTP?

**Part 1 — Why the Problem Is Hard (8 min)**

Sharing data across organizational boundaries requires sharing infrastructure, trust, protocol, and format. Every crossing is a negotiation, and the cost is ongoing. The coupling spectrum — from function calls to cross-org transfers — sets up a precise vocabulary for what &quot;strong decoupling&quot; actually means. A well-composed protocol owns only transport and access, leaving structure and format to the caller.

**Part 2 — What Already Exists (4 min)**

Kafka, S3, and HTTP APIs each fail at strong decoupling in a specific and diagnosable way. Kafka requires the other side to adopt a platform. S3 is a storage abstraction, not a transfer abstraction — no presence signal, no cleanup. An HTTP API permanently makes one side a server. Reading each failure as a requirement, a named pipe already satisfies all three — within a machine. The open question: can this work over HTTP?

**Part 3 — The ZebraStream Protocol (5 min)**

The basic protocol and its Data API are revealed: a bytestream channel over HTTP where both sides are clients. A stateless relay sits in the middle — exclusive channel, HTTPS outbound only, separate read and write tokens. The difference between a message and a bytestream is made precise: no opinions on size, structure, or format. A raw HTTP example using `requests` shows the Data API in full — producer streams a generator over PUT, consumer reads a streaming GET response.

**Part 4 — Presence and Coordination (5 min)**

HTTP connects immediately, without knowing whether the other side is there. Two failure modes show the consequence: a consumer holding a silent GET with no way to tell if the producer is slow or absent; a producer writing into a PUT with no signal that nobody is reading. The Connect API resolves this with an explicit waiting room — the first client waits, the second triggers the transfer. Push and pull are runtime choices, not architectural ones: whoever arrives first waits.

**Demo 1 — Push and Pull** (3 min): the supplier/buyer inventory use case, both modes shown live; the rendezvous is the point.

**Part 5 — Python Integration (8 min)**

`zebrastream-io` implements `io.IOBase`. Any library that accepts a file — pandas, loguru, tarfile, csv, pickle — works immediately, with no changes to existing code. Because there is no intermediate file, the producer&#x27;s write and the consumer&#x27;s read are the same operation: an early disconnect on either side raises immediately. No silent failures, no orphaned files, no copy cascades.
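
A minimal sketch of the file-object idea, with `io.StringIO` standing in for a ZebraStream writer (the actual `zebrastream-io` class names are not shown here):

```python
import csv
import io

# Stand-in for a ZebraStream writer: any object implementing the io.IOBase
# interface can be handed to libraries that expect a file. Here io.StringIO
# plays that role; with zebrastream-io the same code would stream over HTTP.
stream = io.StringIO()

writer = csv.writer(stream)
writer.writerow(["sku", "qty"])
writer.writerow(["A-100", 7])

# The consumer side reads the same bytes as an ordinary file object.
stream.seek(0)
rows = list(csv.reader(stream))
assert rows == [["sku", "qty"], ["A-100", "7"]]
```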

**Demo 2 — Log Streaming** (5 min, notebook): two lines added to a loguru producer; the consumer is the ZebraStream CLI. The application logs normally — transport is invisible.

**Part 6 — Design Decisions and Security (5 min)**

Three deliberate choices — HTTP, bytestream, stateless relay — are named alongside what each costs. The security model follows from the relay design: TLS and scoped tokens require trusting the relay; end-to-end encryption does not. The relay moves ciphertext and has no key. Per-chunk encryption keeps live streams encrypted without buffering the full payload. The hospital/CRO use case from the opening gets its resolution: pull mode, on-demand EHR query, one extra argument — the relay operator sees nothing.

**Closing — Open Protocol (1 min)**

The protocol specification is open and community-focused. The Python client is open source. ZebraStream.io is the managed relay and protocol sponsor. The talk closes where it opened: opaque, minimal, composable — across organizational boundaries.

**Q&amp;A (5–10 min)**</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (long)</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/HP7DLX/</url>
            <location>Palladium [2nd Floor]</location>
            
            <attendee>Johannes Dröge</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>EL7X8C@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-EL7X8C</pentabarf:event-slug>
            <pentabarf:title>Hierarchical Models in MMM: Can Structure beat data size?</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T161500</dtstart>
            <dtend></dtend>
            <duration>003000</duration>
            <summary>Hierarchical Models in MMM: Can Structure beat data size?</summary>
            <description>## What we are going to show

- Country-specific marketing data that is, unfortunately, never good.
- Pure Python transform functions like adstock and saturation (with tests, so you can see them in action).
- The differences between pooled, unpooled, and partially pooled models.
- Meaningful diagnostics.
- Wins and losses of hierarchical modeling.

## Why this is interesting and relevant

How do you model marketing effectiveness when you only have 12 months of data per country, some channels are interrupted for weeks, and your manager wants reliable ROAS estimates yesterday?
Most teams think: &quot;We need more data.&quot; But getting more data takes time, costs money, and sometimes isn&#x27;t even possible (or the quality is bad).

What if you could get better estimates by changing how you model the problem?
This is where hierarchical modeling and partial pooling come in. Instead of treating each market as separate (unpooled) or pretending they&#x27;re all identical (pooled), we let markets share information through partial pooling. Countries with thin data borrow strength from the group, while markets with strong signals pull away from the mean. You get stability where you need it and flexibility where the data supports it. We show this end-to-end in Python: from building testable transform functions (Adstock, saturation curves, lag effects) to assembling three different model architectures in PyMC, to evaluating which one gives you calibrated intervals and stable ROAS estimates. You&#x27;ll see the good, the bad, and the ugly.

## Main challenges
- Making transforms reusable and testable: Marketing transformations like adstock and saturation are usually hidden in modeling code, which makes it hard to see what they look like or how they change the data. We pull them out as pure Python functions with clear signatures, unit tests (pytest), and property-based checks (hypothesis). This makes them composable, debuggable, and easy to understand and even improve.
- Building fair model comparisons: We construct pooled, unpooled, and hierarchical models with identical priors where appropriate so the comparison isolates the effect of structure, not prior choice. We walk through the PyMC code, show how partial pooling works mathematically, and run short MCMC chains that still demonstrate the key differences.
- Evaluating with decision metrics: we go beyond &quot;we reached 90% R2&quot; to metrics that actually support decisions:
     * Posterior predictive checks: Does the model generate realistic data?
     * ROAS stability: how much do channel estimates vary across groups?
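
As a flavor of the testable-transform approach, here is a minimal sketch (function names and parameterizations are illustrative, not the talk&#x27;s actual code):

```python
import numpy as np

def geometric_adstock(x, decay):
    """Carry-over effect: each period retains `decay` times the
    previous period's accumulated adstock. Pure function, easy to test."""
    out = np.empty_like(x, dtype=float)
    carry = 0.0
    for t, spend in enumerate(x):
        carry = spend + decay * carry
        out[t] = carry
    return out

def hill_saturation(x, half_sat, shape=1.0):
    """Diminishing returns: a Hill curve that reaches 0.5 at `half_sat`."""
    x = np.asarray(x, dtype=float)
    return x**shape / (x**shape + half_sat**shape)

spend = np.array([100.0, 0.0, 0.0, 50.0])
assert np.allclose(geometric_adstock(spend, 0.5), [100.0, 50.0, 25.0, 62.5])
assert np.isclose(hill_saturation(np.array([2.0]), 2.0)[0], 0.5)
```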

We use ArviZ throughout to visualize traces, compare models, and compute these metrics. You&#x27;ll see exactly when hierarchical structure pays off and when it doesn&#x27;t.

## Practical lessons and the repo

**We share what we learned building this:**
- Data checks and control using Pydantic, so you catch errors before MCMC runs for hours
- Test your transforms independently: Yes, for unit tests!
- Use synthetic data with known ground truth to validate the whole pipeline
- Calibration metrics matter more than posterior predictive RMSE alone

**The repo will include:**
- Typed transform functions (Adstock, saturation, lag) with unit tests
- Three PyMC models with matching priors
- ArviZ evaluation scripts (calibration, PPC)
- A Typer CLI to run everything on a predefined CSV

**When hierarchical models lose (and what to do about it):**
Partial pooling isn&#x27;t magic. If your groups are genuinely wildly different and you have almost no data per group, hierarchical models can still produce overconfident nonsense. We show a scenario where this happens and discuss alternatives: stronger priors, splitting the hierarchy, or just admitting you don&#x27;t have enough signal.
The takeaway: structure beats volume in the right conditions. We help you recognize those conditions and build models that respect them.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/EL7X8C/</url>
            <location>Palladium [2nd Floor]</location>
            
            <attendee>Mohamed Amine Jebari</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>B8GQ9Z@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-B8GQ9Z</pentabarf:event-slug>
            <pentabarf:title>Metashade: Compilerless Immediate-Mode Shader Generation in Pure Python</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T165500</dtstart>
            <dtend></dtend>
            <duration>003000</duration>
            <summary>Metashade: Compilerless Immediate-Mode Shader Generation in Pure Python</summary>
            <description>The area of shader programming offers many tough problems to solve. The range of target platforms is vast: from CPU path-tracers to mobile GPUs - served by a zoo of incompatible languages: from GLSL to HLSL, from OSL to WGSL.

Common challenges include portability, managing specializations, and a lack of abstraction mechanisms. Proposed solutions range from the archaic C preprocessor to templates/generics, visual graph frameworks, transpilers, and, finally, embedded domain-specific languages (EDSLs).

Python is an ideal host for EDSLs. Warp, Taichi, Numba, and Triton evolved to target GPU compute, and all of them share common architectural decisions: they capture the program&#x27;s logic by inspecting the Python source code, generate an internal representation (IR), and compile that IR to the target format.

The above approach comes with significant disadvantages. Only a subset of Python is supported, debugging with standard tools is impossible, integration with external Python code is limited, metaprogramming requires special syntax, and heavy compiler infrastructure needs to be implemented in a language like C++.

This talk proposes an alternative architecture. Instead of introspection, we capture the program&#x27;s logic by tracing execution with proxy objects at Python runtime, similar to JAX and PyTorch. Instead of building an IR, we emit target code eagerly, line-by-line, similar to how PyTorch Eager Mode launches computations. And because we don&#x27;t implement a compiler, the implementation remains 100% Python.

We discuss in detail how core elements of Python syntax can be overloaded to implement such an architecture:
* Operator overloading to capture expressions.
* Context managers to simulate C-like scopes.
* `__setattr__`/`__getattr__` to capture variable names.
* Function decorators to capture function signatures.
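
A toy sketch of the tracing approach (names are illustrative; this is not Metashade&#x27;s actual API):

```python
# Operator overloading captures expressions as strings, and code is
# emitted eagerly, line by line, with no IR and no compiler.

class Expr:
    def __init__(self, code):
        self.code = code
    def __add__(self, other):
        return Expr(f"({self.code} + {_code(other)})")
    def __mul__(self, other):
        return Expr(f"({self.code} * {_code(other)})")

def _code(v):
    return v.code if isinstance(v, Expr) else repr(v)

class Scope:
    """__setattr__ captures variable names and emits an assignment line."""
    def __init__(self):
        object.__setattr__(self, "lines", [])
    def __setattr__(self, name, value):
        self.lines.append(f"float {name} = {_code(value)};")
        object.__setattr__(self, name, Expr(name))

sh = Scope()
sh.a = Expr("uv.x") * 2.0
sh.b = sh.a + 1.0
assert sh.lines == ["float a = (uv.x * 2.0);", "float b = (a + 1.0);"]
```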

Attendees will leave with a toolbox of Python metaprogramming patterns empowering them to write a code generator in Python without having to implement a compiler.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/B8GQ9Z/</url>
            <location>Palladium [2nd Floor]</location>
            
            <attendee>Pavlo Penenko</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>HRFYVS@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-HRFYVS</pentabarf:event-slug>
            <pentabarf:title>AI Is Changing the Game: Building Modular, AI-Ready Platforms on Top of Legacy Systems</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T173500</dtstart>
            <dtend></dtend>
            <duration>003000</duration>
            <summary>AI Is Changing the Game: Building Modular, AI-Ready Platforms on Top of Legacy Systems</summary>
            <description># AI Is Changing the Game: Building Modular, AI-Ready Platforms on Top of Legacy Systems

AI is no longer a future topic—it is actively reshaping expectations inside organizations. Domain and business teams can now prototype new rules, validations, and analytical logic themselves, often within days. While this accelerates innovation, it puts enormous pressure on existing IT architectures, especially in environments dominated by legacy systems and monolithic platforms.

This talk explores how software architecture must evolve to absorb this pressure instead of breaking under it.

Rather than embedding AI capabilities directly into legacy systems, the presented approach introduces a modular, AI-ready platform built around independent, stateless apps orchestrated by a central control layer. These apps can represent classical reporting logic, risk calculations, or AI agents, all treated as first-class architectural components.

The talk is highly relevant for the **PyCon track “Programming, Software Engineering &amp; Testing”**, because it demonstrates how to design, orchestrate, and integrate AI-driven workflows in complex Python-based platforms. The central control layer, implemented using Python and optionally Django, provides workflow orchestration, security, tenant management, and self-service registration of new components. This allows domain teams to deploy AI agents or agents written with the help of AI within days, while IT retains governance, auditability, and operational stability.

By showing how AI-driven pressure can be turned into an architectural advantage, the talk provides patterns and practical lessons that apply far beyond finance, making it relevant for any domain dealing with legacy systems, modular design, and AI integration.


## Architectural Concepts Covered

The talk introduces the key architectural principles behind the platform:

- **Independent, stateless apps** that declare their data needs and outputs but remain unaware of infrastructure, environments, or other apps  
- **Strict separation of concerns** between domain logic, orchestration, persistence, and presentation  
- **Technology-indifferent design**, allowing apps to run on different databases, reporting tools, or compute backends  
- **Parallel and distributed execution** as a default, not an optimization  

This architecture allows legacy systems to coexist with modern components instead of blocking innovation.

## The Control Layer as an Enabler for AI

A central part of the talk is the control layer that orchestrates all components. Implemented using Python and optionally Django, this layer is responsible for:

- workflow orchestration and dependency management  
- authentication, authorization, and tenant isolation  
- self-service registration of apps and AI agents  
- resource allocation, monitoring, and auditability  

Django is not used as a traditional CRUD backend, but as governance infrastructure: providing APIs, admin and self-service portals, and security mechanisms that allow fast innovation without losing control.

## Example: Integrating an AI Agent into a Regulated Platform

A concrete example demonstrates the architecture in action: integrating an AI agent, e.g. for anomaly detection in regulatory reporting.

The example walks through:

- developing the agent as an independent, containerized app  
- registering it via standardized APIs  
- declaring required data and produced results  
- orchestrating it within existing workflows  
- testing, monitoring, and scaling it without touching legacy systems  
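
To make the self-service registration idea concrete, here is a minimal sketch of a hypothetical registration payload and validation step; the platform's actual API, field names, and registry URL are not described in this abstract and are invented for illustration:

```python
# Hypothetical payload an app might POST to the control layer's
# registration API; real field names and endpoints will differ.
registration = {
    "name": "anomaly-detector",
    "kind": "ai-agent",
    "image": "registry.example.com/anomaly-detector:1.0",
    "inputs": [{"dataset": "regulatory_reports", "window": "90d"}],
    "outputs": [{"dataset": "anomaly_flags"}],
    "tenant": "finance-eu",
}

def validate(reg):
    # The control layer checks the declared contract before scheduling:
    # the app must state what it consumes, what it produces, and its tenant.
    required = {"name", "kind", "image", "inputs", "outputs", "tenant"}
    return required.issubset(reg)

print(validate(registration))  # True
```

The point of the declared `inputs`/`outputs` contract is that orchestration, data provisioning, and auditability can be handled centrally while the app itself stays stateless.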

This shows how new AI capabilities can be deployed within days while maintaining stability and compliance.

## Why This Matters Beyond Finance

While the example comes from regulatory reporting, the patterns discussed apply to many domains facing similar challenges: data-heavy systems, long-lived platforms, and increasing pressure to integrate AI safely.

The talk concludes with lessons learned and architectural patterns that help future-proof systems as AI continues to raise the bar for flexibility, speed, and modularity.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/HRFYVS/</url>
            <location>Palladium [2nd Floor]</location>
            
            <attendee>Werner Gothein</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>DDVW3W@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-DDVW3W</pentabarf:event-slug>
            <pentabarf:title>Getting Career Clarity in Uncertain Times</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T101500</dtstart>
            <dtend>20260415T114500</dtend>
            <duration>013000</duration>
            <summary>Getting Career Clarity in Uncertain Times</summary>
            <description>The data &amp; AI field is evolving faster than ever. New tools, new roles, and constant “next big things” can make even experienced professionals feel unsure about where they are heading, and how to make intentional career decisions in the middle of all this change.

You might be doing well, feeling comfortable. Interesting work, steady progress, recognition.
And still, there’s that question in the background: Where is this actually going?

A lot of career advice in tech assumes there is a clear path to follow. In reality, most data &amp; AI careers don’t work that way. Roles shift, organisations change, and what used to feel like a logical next step often isn’t anymore.

In this workshop, we’ll slow things down and focus on direction rather than decisions. The goal is not to figure out “the next job”, but to get clearer on the kind of work you want to do.

This is a practical, hands-on session.

We’ll use exercises such as odyssey planning to explore a few possible future paths you could take from here. 

You’ll work through:
- Different ways your career could evolve
- The trade-offs each direction comes with
- What feels worth exploring further, and what doesn’t

And, just as importantly, how you feel about these plausible futures.

You’ll leave with a clearer sense of what matters to you now, and a stronger sense of direction.

By the end of the session, you will:
- Have more clarity about the kind of work and influence you want going forward
- See more than one possible future, instead of feeling stuck with a single “right” option
- Feel more confident navigating uncertainty without rushing into decisions

Who this session is for:
- Data &amp; AI professionals with a few years of experience or more
- People who feel “in between” stages, roles, or directions
- Individual contributors and leaders alike
- Anyone who wants more intentionality in their work</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Tutorial</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/DDVW3W/</url>
            <location>Ferrum [2nd Floor]</location>
            
            <attendee>Tereza Iofciu</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>N8QVT8@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-N8QVT8</pentabarf:event-slug>
            <pentabarf:title>Accelerate FastAPI Development with OpenAPI Generator</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T142000</dtstart>
            <dtend>20260415T155000</dtend>
            <duration>013000</duration>
            <summary>Accelerate FastAPI Development with OpenAPI Generator</summary>
            <description>Machine learning models are often deployed as APIs, where we have an endpoint that generates predictions given some input. For example, we can send a POST request specifying a color, a length, and a number of legs, and the endpoint predicts the best fitting animal. The description of the endpoint, the schema of the request, and the response acts as a form agreement between the consumer and the service. In practice, the restrictions on the API are not well defined. How does the consuming app know if a parameter is optional or required? 
In this tutorial you will learn to define an API contract as an OpenAPI specification (OAS). OAS is a standardized description of the API endpoints and data models. We will demonstrate how to use the OpenAPI Generator to automatically generate the API endpoints and strictly typed Pydantic data models by designing only the OAS in YAML format, without GenAI. OpenAPI Generator uses mustache templates to translate the specification into actual code. We will demonstrate use cases for customizing the templates to the specific needs of the resulting API stubs.
By generating code from the contract, you ensure that the deployed application always reflects the agreed-upon specification. It automates the writing of repetitive code, such as Pydantic models and endpoint definitions, allowing developers to focus on the implementation logic. It enforces standard patterns and structures, ensuring consistency and maintainability across different projects.
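
To illustrate what such a contract looks like, here is a minimal hypothetical OAS fragment (not the tutorial's actual unicorn spec) showing how required vs. optional parameters become explicit in the schema:

```yaml
# Hypothetical minimal spec for the animal-prediction example above.
openapi: 3.0.3
info:
  title: Animal Predictor
  version: 1.0.0
paths:
  /predict:
    post:
      operationId: predictAnimal
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [color, legs]   # `length` stays optional
              properties:
                color: {type: string}
                length: {type: number}
                legs: {type: integer}
      responses:
        '200':
          description: Best-fitting animal
```

From a spec like this, the generator emits the FastAPI endpoint stub and a Pydantic model in which `color` and `legs` are mandatory fields, so the consuming app no longer has to guess.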

Expect fun mystic creatures after deploying the resulting API in your local environment.

#### Target Audience
Engineers and data scientists looking to standardize their FastAPI development workflow. We expect you to have basic knowledge in Python, virtualenv, Pydantic data models and FastAPI.

**To attend this workshop, please install the openapi generator v7.20.** 
For details, please visit the README.md of https://gitlab.com/Eeffee/pycon26

#### Technical Setup
* _Operating system:_ We recommend using Unix OS (Mac or Linux)
* _Python:_ Version 3.10+
* _OpenAPI Generator:_ Version 7.20
   * Installation Guide: https://openapi-generator.tech/docs/installation/


#### Outline
1. Introduction (10 min)
* The philosophy of Contract-First development
* Overview of the OpenAPI specification and Pydantic data models
* Introduction to the OpenAPI generator tool
2. Design (20 min)
* Introduction to the unicorn service logic (Input: Real Life Problems,  Output: Mystic Creatures)
* Definition of the openapi specification, focusing on the Request and Response schemas
3. Generate (30 min)
* Running the standard vanilla OpenAPI generator
* Introduction to mustache templates
* Customization of the default mustache to inject our specific dependencies
4. Implementing (15 min)
* We will connect the generated API stubs to a predict() function that calls our unicorn generation service.
5. Demo &amp; QA (15 min)
* Running the server via u**v**icorn and testing our u**n**icorn service endpoint using the Swagger UI.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Tutorial</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/N8QVT8/</url>
            <location>Ferrum [2nd Floor]</location>
            
            <attendee>Dr. Evelyne Groen</attendee>
            
            <attendee>Kateryna Budzyak</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>N98BQT@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-N98BQT</pentabarf:event-slug>
            <pentabarf:title>Practical Refactoring with Syntax Trees</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T161500</dtstart>
            <dtend>20260415T164500</dtend>
            <duration>003000</duration>
            <summary>Practical Refactoring with Syntax Trees</summary>
            <description>Modern Python tooling relies heavily on syntax trees. In this talk, we take a practical look at Python&#x27;s Abstract Syntax Tree (AST) and how Python code can be treated as structured data rather than plain text.

We&#x27;ll start from first principles: how Python source code is parsed, what an AST represents, and how to reason about code as a tree. This builds a clear mental model that makes syntax-tree-based tooling easier to understand and work with.

From there, we&#x27;ll explore how syntax trees enable automated refactoring across large codebases using scripts to rewrite code (sometimes called codemods). 
Using a realistic refactoring scenario, we&#x27;ll implement a small refactoring tool using libCST.
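
To make the codemod idea concrete before reaching for libCST, here is a minimal rename sketched with only the standard library's `ast.NodeTransformer` (the fixture names are hypothetical); note that `ast.unparse()` normalizes formatting away, which is precisely the caveat that motivates libCST:

```python
import ast

class RenameFixture(ast.NodeTransformer):
    # Rename every reference to the (hypothetical) fixture `old_fixture`.
    def visit_Name(self, node):
        if node.id == "old_fixture":
            node.id = "new_fixture"
        return node

code = "result = old_fixture() + old_fixture()"
tree = RenameFixture().visit(ast.parse(code))
# Caveat: ast.unparse() re-generates source and drops original formatting;
# a libCST transformer would preserve comments and whitespace.
print(ast.unparse(tree))  # result = new_fixture() + new_fixture()
```
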

The talk also shares practical tips from writing codemods. This includes how to use test-driven development when writing refactoring tools, where AI can help in refactoring tasks, and strategies for dealing with formatting.

Attendees will leave with a solid understanding of how syntax trees work in Python and a concrete starting point for writing their own automated refactoring tools.


Outline:

Minutes 0-5: Primer on Python syntax trees and the AST mental model
Minutes 5-12: From syntax trees to codemods and automated refactoring
Minutes 12-22: Implementing a refactoring codemod with libCST
Minutes 22-27: Test-driven codemods, formatting strategies, and AI assistance
Minutes 27-30: Conclusion


EDIT: 
- [Slides](https://ldirer.github.io/talk_pycon_de_2026_libcst/)
- [Code for example](https://github.com/ldirer/codemod-rename-pytest-fixtures)</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/N98BQT/</url>
            <location>Ferrum [2nd Floor]</location>
            
            <attendee>Laurent Direr</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>R7TT3E@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-R7TT3E</pentabarf:event-slug>
            <pentabarf:title>Simplifying RAG Document Pipelines with Multimodal Embeddings</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T165500</dtstart>
            <dtend>20260415T172500</dtend>
            <duration>003000</duration>
            <summary>Simplifying RAG Document Pipelines with Multimodal Embeddings</summary>
            <description>This talk provides an overview of how document processing for RAG systems can be simplified using multimodal embeddings, grounded in benchmarks on real-world enterprise documents.

What the talk covers

1. **Motivation: Why RAG Is Still Hard**  
   Why PDFs remain challenging in enterprise RAG systems, and where current document processing approaches break down—especially for presentations and visually structured documents.

2. **The Classical Approach: PDF → Text → Chunks**  
   An overview of traditional OCR- and layout-based pipelines, including their strengths, typical failure modes, and why they tend to grow into complex and fragile systems over time.

3. **A New Paradigm: Multimodal Page Embeddings**  
   How embedding entire PDF pages as images changes the ingestion model, what information is preserved compared to text-only approaches, and what this means for retrieval quality and system simplicity.

4. **Benchmark Setup**  
   How the benchmark comparing classical pipelines and multimodal page embeddings was designed, using anonymized, real-world enterprise documents across multiple document types. Different models and vendors are referenced only as examples, not as the focus.

5. **Results and Key Findings**  
   Where multimodal page embeddings outperform text-based pipelines, where they do not, and how hybrid approaches can emerge as a practical solution.

6. **Production Best Practices**  
   Practical guidance for deploying these approaches in real systems, including index design, quality monitoring, cost control, and how to integrate multimodal retrieval cleanly into Python-based RAG architectures.

Attendees will leave with a clear understanding of when multimodal embeddings are a strong replacement for classical PDF pipelines, and how to reason about the trade-offs involved.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/R7TT3E/</url>
            <location>Ferrum [2nd Floor]</location>
            
            <attendee>Arne Grobrügge</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>P8Y9TD@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-P8Y9TD</pentabarf:event-slug>
            <pentabarf:title>The Day the Agent Started Lying (Politely)</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T173500</dtstart>
            <dtend>20260415T180500</dtend>
            <duration>003000</duration>
            <summary>The Day the Agent Started Lying (Politely)</summary>
            <description>In this talk, we will walk through a concrete production-style example of an LLM-based agent that automatically classifies and routes incoming customer support tickets. The agent takes raw ticket text as input, predicts a priority label, and routes the ticket to the appropriate support queue. A human override is possible but expected to be rare.

At deployment time, the system performs well. Classification confidence is high, fallback usage is low, and manual corrections are infrequent. Over time, however, the environment changes: new products are launched, outages introduce new failure modes, terminology evolves, and internal definitions of ticket priorities shift. Nothing crashes, latency remains stable, and traditional service-level metrics stay green; yet the agent’s decisions slowly degrade.

This talk focuses on how to observe, measure, and act on that degradation.

Using recorded ticket data and a demo, I will show how to instrument an LLM-based agent with continuous evaluation signals, including:

- Tracking class-probability entropy over time to detect increasing uncertainty
- Monitoring the rate of “unknown” or fallback predictions as an early warning signal
- Measuring embedding distribution drift between historical and recent tickets
- Quantifying disagreement between current agent decisions and historical routing outcomes or human corrections
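
As a rough illustration of the first of these signals, here is a minimal sketch with hypothetical class probabilities (stdlib only, not the talk's actual pipeline):

```python
import math

def entropy(probs):
    # Shannon entropy of one ticket's class-probability vector.
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical per-ticket class probabilities for two time windows:
# confident predictions early on, flatter distributions later.
early = [[0.9, 0.05, 0.05], [0.85, 0.1, 0.05]]
recent = [[0.5, 0.3, 0.2], [0.4, 0.35, 0.25]]

mean_early = sum(entropy(p) for p in early) / len(early)
mean_recent = sum(entropy(p) for p in recent) / len(recent)
# A rising mean entropy across rolling windows signals growing uncertainty,
# even while each individual prediction still looks "confident enough".
print(mean_recent > mean_early)  # True
```
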

I will demonstrate how these signals can be computed in rolling time windows, visualised on simple dashboards, and connected to alert thresholds. Rather than relying on a single accuracy number, the talk shows how multiple weak signals together reveal silent failure modes that would otherwise go unnoticed.

The focus is deliberately not on training new models or tuning prompts. Instead, we concentrate on operating LLM-based agents safely after deployment. You will see how to build a continuous evaluation pipeline, how to distinguish normal variation from meaningful drift, and how to decide when intervention is required, whether that means retraining, prompt changes, label redefinition, or temporary rollback to human routing.

By the end of the talk, attendees will have a clear, practical blueprint for monitoring LLM-based agents in production and for detecting quiet, confident failure modes before they affect users or business operations.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/P8Y9TD/</url>
            <location>Ferrum [2nd Floor]</location>
            
            <attendee>Asya Melnik</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>VWCZXS@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-VWCZXS</pentabarf:event-slug>
            <pentabarf:title>Process, Analyze, and Transform Python Code with ASTs</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T101500</dtstart>
            <dtend>20260415T114500</dtend>
            <duration>013000</duration>
            <summary>Process, Analyze, and Transform Python Code with ASTs</summary>
            <description>This tutorial will be a roughly 50/50 split of lecture and exercises. Attendees will get hands-on experience working with ASTs in Python, using only the standard library. By recreating common code-quality checks from scratch, attendees will both learn how common tools work under the hood and how to work with the AST in an easy-to-understand fashion.

Topics covered:
- Introduction to the term and concept of Abstract Syntax Trees (ASTs)
- Some of the ways ASTs are used by Python itself and by popular tools
- Parsing code into an AST and inspecting it
- Walking the tree: `ast.iter_fields()`, `ast.iter_child_nodes()`, `ast.walk()`
- Modifying the code before running it
- Converting an AST into source code again with `ast.unparse()` and its caveats
- Finding missing docstrings
- `ast.NodeVisitor` and `ast.NodeTransformer`
- The `generic_visit()` method: what it does and why we need it, illustrated with an animation
- Four exercise breaks spread throughout, accounting for ~45 minutes</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Tutorial</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/VWCZXS/</url>
            <location>Dynamicum [Ground Floor]</location>
            
            <attendee>Stefanie Molin</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>AGYLTV@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-AGYLTV</pentabarf:event-slug>
            <pentabarf:title>Array-Oriented Programming in Python: Libraries, Techniques, and Trade-offs</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T142000</dtstart>
            <dtend>20260415T155000</dtend>
            <duration>013000</duration>
            <summary>Array-Oriented Programming in Python: Libraries, Techniques, and Trade-offs</summary>
            <description>## Material
https://github.com/ikrommyd/2026-04-15-pyconde-and-pydata-2026-tutorial-array-oriented-programming
What you need: your laptop, with the repository cloned and the environment set up as explained in the README. Alternatively, an internet connection during the tutorial to set up the environment live or to follow along on MyBinder.
The setup is needed for the problems/puzzles that are part of the tutorial.

## Overview

Python&#x27;s dominance in scientific computing and data science stems from its powerful array libraries that enable high-performance numerical computation. This 90-minute tutorial introduces array-oriented programming as a paradigm and surveys the modern Python array ecosystem, helping you understand which tools to use and when.

## What is Array-Oriented Programming?

Array-oriented programming is a paradigm that separates problems into lightweight Python bookkeeping and heavy numerical computation handled by vectorized operations in fast, precompiled libraries. We&#x27;ll demonstrate how this approach combines Python&#x27;s ease of use with near-compiled-language performance.

Through live examples, you&#x27;ll see how array operations can be orders of magnitude faster than explicit loops. This mindset shift—thinking about operations on entire arrays rather than individual elements—is fundamental to effective scientific Python programming.
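
As a taste of the paradigm, here is a minimal sketch of one of the tutorial's worked examples (path length), assuming NumPy; the coordinates are made up:

```python
import numpy as np

# Points along a path; the path length is the sum of segment lengths.
x = np.array([0.0, 3.0, 3.0])
y = np.array([0.0, 0.0, 4.0])

# Array-oriented style: consecutive differences via np.diff,
# no explicit Python loop over individual points.
length = np.sum(np.sqrt(np.diff(x) ** 2 + np.diff(y) ** 2))
print(length)  # two 3-4-5-style segments: 3.0 + 4.0 = 7.0
```

The whole computation is expressed as operations on entire arrays, so the heavy lifting happens in precompiled NumPy code rather than in the Python interpreter.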

## The Array Library Landscape

We&#x27;ll survey the modern Python array ecosystem and when to use each tool:

- **NumPy**: The foundation for general-purpose array operations
- **Numba &amp; JAX**: JIT compilation approaches—when and why to use each
- **Awkward Array**: Handling nested and ragged data structures
- **Large dataset tools**: Brief overview of Dask, Xarray, Zarr, and Blosc2 for distributed computing, labeled arrays, and compression

We&#x27;ll demonstrate the strengths and limitations of each through live coding examples, showing trade-offs between different approaches.

## Understanding Limitations and Trade-offs

A critical part of choosing the right tool is understanding when array-oriented programming has limitations. We&#x27;ll discuss challenges like intermediate array overhead and algorithms that don&#x27;t naturally vectorize, and show how different libraries address these problems.

## What You&#x27;ll Learn

By the end of this tutorial, you will:

1. **Understand array-oriented programming** as a paradigm and how it differs from imperative programming
2. **Know which library to choose** for different problems: NumPy vs. Numba vs. JAX vs. specialized tools
3. **Recognize when array-oriented approaches have limitations** and how to address them with JIT compilation
4. **Handle non-rectilinear data** using libraries like Awkward Array
5. **Work with large datasets** using chunking, compression, and labeled arrays
6. **Write more performant Python code** by applying array-oriented thinking to your own problems

## Prerequisites

Familiarity with Python (loops, functions, if statements) and basic NumPy exposure (what an array is and how to use it). No deep expertise required.

## Target Audience

Data scientists, researchers, and engineers who want to write more efficient Python code, understand the modern array ecosystem, or choose the right tools for their problems.

## Outline
* **0:00‒0:10 (10 min)** Lecture 1: Array-oriented programming and its benefits. Simple and complex (3 body problem) examples of imperative, functional, and array-oriented styles. Speed and memory advantages in Python. What the array-oriented paradigm emphasizes/is good for: interactive analyses of distributions. Path length as a worked example.
* **0:10‒0:25 (15 min)** NumPy puzzles and solutions. Alternating between hands-on puzzles and walkthrough of solutions: array slicing, consecutive differences, curve length, and image downscaling with reshape.
* **0:25‒0:35 (10 min)** Lecture 2: Disadvantages of array-oriented programming. (1) The problem of intermediate arrays, shown using the quadratic formula, with timing, compared to pre-compiled C code. (2) The “iterate until converged” problem, shown using a one-dimensional minimizer (Newton’s method) for an array of initial states; talk about epochs in ML.
* **0:35‒0:45 (10 min)** Lecture 3: JIT-compilation with Numba and JAX. Describe JIT-compilation as the solution to the intermediate array problem (1). First Numba then JAX on the quadratic formula. Show that Numba only accelerates if you write imperative code, unlike JAX, and show that JAX can’t follow if-branches or loops of unknown length.
* **0:45‒0:55 (10 min)** Project 3: JIT-compilation of the Mandelbrot set. Walk through imperative Python, array-oriented NumPy, Numba, and JAX implementations with timings. Note that array-oriented programming is advantageous for GPU programming, even beyond Python.
* **0:55‒1:05 (10 min)** Lecture 4: Ragged and deeply nested arrays. Show examples of ragged, nested, missing, and heterogeneous data, and how it can still make sense to treat them as arrays. Conversion to and from “tidy” data (tabular with references) to compare and contrast.
* **1:05‒1:20 (15 min)** Lecture 5: Working with large datasets. Overview of tools for chunking, compression, and labeled arrays: Dask, Zarr, Blosc2, and xarray.
* **1:20‒1:30 (10 min)** Wrap-up and Q&amp;A.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Tutorial</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/AGYLTV/</url>
            <location>Dynamicum [Ground Floor]</location>
            
            <attendee>Iason Krommydas</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>S9VSCV@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-S9VSCV</pentabarf:event-slug>
            <pentabarf:title>Roll for Architecture: DungeonPy – A D&amp;D Companion as Server + Thin Clients</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T161500</dtstart>
            <dtend>20260415T164500</dtend>
            <duration>003000</duration>
            <summary>Roll for Architecture: DungeonPy – A D&amp;D Companion as Server + Thin Clients</summary>
            <description>## **Roll for Architecture: DungeonPy – A D&amp;D Companion as Server + Thin Clients**
Many tiny, personal projects reach a point where “it works on my machine” is no longer the interesting part; it becomes about making the project *scale in structure*: clean boundaries, explicit state, testable behaviour, and room for new features. This session is a case study of that journey, using a D&amp;D assistant for remote play written in Python that grew far beyond its original scope.
The starting point is a few desktop clients:
- a **Pygame** (battle) *map* (grid, tokens, map objects, movement), and
- a **PySimpleGUI** *tracker* for initiative, HP and conditions, with a clear “active combatant” concept,

initially synchronized with lightweight TCP messages.
This already exposes real engineering questions: avoiding GUI thread violations, preventing feedback loops, and deciding what the “source of truth” is when both ends can initiate updates.
The evolved version introduces an **authoritative server**. Players connect as clients and can interact in real time – moving tokens and manipulating shared objects – while the DM client keeps full visibility and control. Clients do not share state with each other: they submit *intents* (move here, end turn, toggle condition), the server validates, updates state, and broadcasts events plus periodic snapshots. The key architectural move is *role-scoped state*: the server owns the full truth and projects different “views” to each client (DM omniscience vs per-player information), so fog-of-war and hidden details are enforced by design. In other words:
- **DM client**: full map + all combatants + hidden details.
- **Player client**: a filtered view (only the player’s character sheet details, their token, and whatever the DM has revealed).

The authoritative server runs on a small VPS with a public endpoint. Clients connect over secure WebSockets (wss://) on port 443, so players can join from anywhere without port forwarding. TLS is terminated by a standard reverse proxy, and the server speaks a small JSON message protocol (snapshots + events) over WebSocket frames.
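
To illustrate the intent/validate/broadcast loop described above, here is a minimal sketch with hypothetical message shapes; the real DungeonPy protocol and field names may differ:

```python
import json

# Hypothetical client intent: "move this token to (4, 7)".
intent = {"type": "move_token", "seq": 42, "token_id": "goblin-1", "to": [4, 7]}

def handle_intent(msg, state):
    # The server owns the canonical state: it validates the intent,
    # applies it, and returns the event to broadcast to all clients.
    if msg["type"] == "move_token" and msg["token_id"] in state["tokens"]:
        state["tokens"][msg["token_id"]] = msg["to"]
        return {"type": "token_moved", "seq": msg["seq"],
                "token_id": msg["token_id"], "pos": msg["to"]}
    # Invalid intents are rejected rather than silently applied.
    return {"type": "rejected", "seq": msg["seq"]}

state = {"tokens": {"goblin-1": [3, 7]}}
event = handle_intent(intent, state)
print(json.dumps(event))
```

Because clients submit intents instead of sharing state directly, sequence numbers and server-side validation make replays and conflicting updates easy to reason about.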

### **Open source software used**
- `Pygame` (map rendering + input)
- `PySimpleGUI` (initiative/conditions UI)
- `asyncio` (multi-connection handling)
- `websocket` (client/server transport)
- (non-python) `NGINX` (reverse proxy, TLS)

### **Detailed talk outline**
1. **Intro: D&amp;D, remote play, and why am I doing this?**
2. **Setting up the table**
        - PyGame
        - PySimpleGUI
3. **State model and serialisation**
        - Turning GUI objects into explicit data (combatants, map, doors, initiative order).
        - JSON snapshots and versioning.
4. **Protocol design: events vs snapshots**
        - Event messages for responsiveness (“token moved”, “condition added”).
        - Snapshot sync for recovery and late joiners.
        - Idempotency and ordering: simple sequence numbers, replay safety and conflict avoidance.
5. **Role-based filtering (the privacy boundary)**
        - A single canonical server state.
        - Server-side “view projection”: DM view vs per-player view.
        - Practical examples: hidden enemies, secret doors, private notes, fog-of-war style reveals.
6. **Concurrency and UI integration**
        - Socket threads feeding GUI event loops safely (posting events into the GUI thread rather than touching widgets directly).
        - Keeping the map smooth under network jitter: optimistic UI vs confirmed updates (and when not to).
7. **Testing strategy for a networked hobby project**
        - Unit tests for pure state transitions (“apply damage”, “advance turn”, “illegal move rejected”).
        - Protocol tests with simulated clients.
        - Logging that helps during live play without drowning you in noise.
8. **Extensibility hooks**
        - Adding new client types (spectator screen, mobile character sheet).
        - Plug-in style rules (different systems, homebrew conditions).
        - Future improvements: authentication for remote play, persistence, and reconnection.
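
The view-projection idea from the outline can be sketched in a few lines (field names and roles are assumed for illustration): one canonical state on the server, one projection function per connected role, so fog-of-war is enforced in a single place.

```python
# Canonical server state (illustrative field names, not the real schema).
CANONICAL = {
    "tokens": [
        {"id": "pc-aria",  "pos": [2, 3], "hidden": False, "owner": "alice"},
        {"id": "assassin", "pos": [7, 7], "hidden": True,  "owner": "dm"},
    ],
    "notes": {"dm": "the door is trapped"},
}

def project(state, role, player=None):
    """Derive the view a given client is allowed to see."""
    if role == "dm":
        return state  # DM omniscience: the full truth
    visible = [t for t in state["tokens"] if not t["hidden"] or t["owner"] == player]
    return {"tokens": visible}  # no DM notes, no hidden tokens
```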

### **Takeaways**
Attendees will leave with a practical blueprint for:
- Designing a tiny, testable message protocol in Python.
- Updating GUIs safely from background network threads.
- Enforcing “who can see what” without duplicating logic everywhere.
- More importantly, they’ll come away with an example of how Python can be used for highly non-standard tasks (like running a remote tabletop role-playing game).
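
The second takeaway can be sketched with nothing but the standard library (names are illustrative): the network thread never touches widgets, it only posts into a thread-safe queue, and the GUI loop drains that queue on its own thread.

```python
import queue
import threading

events = queue.Queue()

def network_thread():
    # Stand-in for a websocket reader posting decoded messages.
    events.put(("token_moved", {"token": "goblin", "pos": [5, 5]}))

def drain_events(handled):
    # Called periodically from the GUI event loop (e.g. once per frame).
    while True:
        try:
            kind, payload = events.get_nowait()
        except queue.Empty:
            return handled
        handled.append((kind, payload))  # here the real GUI would update widgets
```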

### **Intended audience**
Beginner and intermediate Python developers comfortable with classes and modules, and curious about architecture and networking. This talk will be as much about *Python* as it will be about *nerd culture*: the goal is learning something while keeping a light heart.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/S9VSCV/</url>
            <location>Dynamicum [Ground Floor]</location>
            
            <attendee>Francesco Conte</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>BHJERV@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-BHJERV</pentabarf:event-slug>
            <pentabarf:title>Django-Q2: Async Tasks Made Simple</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T165500</dtstart>
            <dtend></dtend>
            <duration>003000</duration>
            <summary>Django-Q2: Async Tasks Made Simple</summary>
            <description>Handling asynchronous tasks and cron jobs in Django is essential for features like sending emails or generating periodic reports. However, the industry standard Celery often comes with significant configuration overhead and infrastructure dependencies like Redis or RabbitMQ.

If you have ever struggled with that complexity or looked for a more intuitive way to manage background processes, Django-Q2 is the answer. It is a lightweight solution that leverages your existing database, eliminating the need for complex brokers. Its native integration makes it perfect for small to medium-sized projects that need to move fast.
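
As a rough idea of how little setup this takes, here is a hedged configuration sketch (project and task names are hypothetical); the `orm` broker stores the queue in the existing Django database, so no Redis or RabbitMQ is needed:

```python
# settings.py fragment — a minimal Django-Q2 cluster configuration sketch.
Q_CLUSTER = {
    "name": "myproject",   # hypothetical project name
    "workers": 2,
    "timeout": 60,
    "retry": 120,          # should exceed the task timeout
    "orm": "default",      # use the default database connection as the broker
}

# Queuing a task then looks like (e.g. inside a view):
#   from django_q.tasks import async_task
#   async_task("myapp.tasks.send_welcome_email", user.id)
```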

This talk will guide you through integrating Django-Q2 to simplify your workflow:

- Problem Solving: We will look at how to use Django-Q2 to solve real-world task management issues.

- Feature Deep Dive: We will explore key features, such as using the database as a backend and monitoring tasks directly from the Django Admin interface.

- Live Demo: We will configure Django-Q2 from scratch to handle asynchronous email sending and schedule a recurring maintenance job.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/BHJERV/</url>
            <location>Dynamicum [Ground Floor]</location>
            
            <attendee>Moin Uddin</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>GPV9SM@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-GPV9SM</pentabarf:event-slug>
            <pentabarf:title>Holistic Optimization: Implementing &quot;Pipeline-as-a-Trial&quot; HPO with Ray and Cloud Infra</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T173500</dtstart>
            <dtend></dtend>
            <duration>003000</duration>
            <summary>Holistic Optimization: Implementing &quot;Pipeline-as-a-Trial&quot; HPO with Ray and Cloud Infra</summary>
<description>Have you ever tuned a model to perfection, only to have it fail once integrated into your production pipeline? This is the &quot;local optimization&quot; trap: fixing a component while unintentionally breaking the complex system around it.
At Zalando, where we manage hundreds of forecasting models across 25 countries, local wins often lead to global failures. In this talk, we move beyond single-model tuning to explore Holistic Optimization.
We will detail how our team implemented a &quot;Pipeline-as-a-Trial&quot; architecture.

What We’ll Cover:
* An explanation of the &quot;local optimization&quot; problem, and how it appears everywhere from tech products to day-to-day life.
* How we leveraged Ray’s distributed capabilities to manage high-concurrency Machine Learning workloads.
* Infrastructure Comparison: A candid, battle-tested breakdown of running HPO across AWS SageMaker, Databricks, and Internal EC2/Metaflow clusters.
* Operational Trade-offs: Real-world insights into the performance, cost, and traceability of different cloud implementations.
* Configuration-Driven Development: How an abstract library layer allows us to scale experimentation across hundreds of production models.

Stop chasing local solutions. Join me to learn how to build a distributed HPO framework that optimizes for your global business objectives.
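
To make the idea concrete, here is a toy, pure-Python illustration of &quot;Pipeline-as-a-Trial&quot; (the pipeline, parameters, and scores are invented for illustration; the real system uses Ray): each trial runs the whole pipeline end to end and is scored on the global objective, so interactions between stages are captured instead of tuning one stage in isolation.

```python
import itertools

def pipeline(smoothing, threshold):
    # Toy two-stage pipeline: stage quality interacts, so the best
    # threshold depends on which smoothing the first stage uses.
    stage1 = 1.0 - abs(smoothing - 0.8)
    stage2 = 1.0 - abs(threshold - (1.0 - smoothing))
    return stage1 * stage2  # the global objective

def pipeline_as_a_trial(grid):
    # Each trial = one full pipeline run, scored on the global objective.
    return max(grid, key=lambda params: pipeline(*params))

grid = list(itertools.product([0.2, 0.5, 0.8], [0.2, 0.5, 0.8]))
best = pipeline_as_a_trial(grid)
```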

PS: if you are a &quot;Rick and Morty&quot; fan, definitely join to see how Rick fell into the local optimization problem!</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/GPV9SM/</url>
            <location>Dynamicum [Ground Floor]</location>
            
            <attendee>Abdullah Taha</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>KBNKAP@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-KBNKAP</pentabarf:event-slug>
            <pentabarf:title>Innovation Day: Startup Lounge [no-video]</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T100000</dtstart>
            <dtend>20260415T120000</dtend>
            <duration>020000</duration>
            <summary>Innovation Day: Startup Lounge [no-video]</summary>
            <description>## What to expect

This is not a talk. There are no slides.

The Startup Lounge is an interactive, participatory format built around open conversations and peer exchange. Whether you&#x27;re actively building a company, thinking about it, or just want to understand how open source and AI actually connect to real business — this is your space.

You&#x27;ll be in a room with 2,000+ Python developers, engineers, and data scientists. That&#x27;s a rare concentration of technical depth and practical experience. Use it.

The session combines short impulse talks, guided table discussions, and open networking.

## Program

### 10:00 – 10:15 — Welcome &amp; Introduction

An introduction to the format and an overview of the session.

### 10:15 – 10:30 — Impulse Talks

Three short talks to spark discussion, including topics such as:

* Open source &amp; business models  
* Developer tools and coding with AI  
* Additional community-driven topics

### 10:30 – 11:30 — Table Discussions

The core of the session — three parallel discussion tables:

* **&quot;Hair on Fire&quot;** — urgent founder challenges (product, hiring, funding, tech)  
* **Startup Curious** — for future founders and those exploring the startup world  
* **Tech Stuff** — where Python meets product: scaling, architecture, AI/ML in production

Participants are free to move between tables and join different conversations.

### 11:30 – 12:00 — Open Networking

Unstructured time to continue conversations, connect, and follow up.

## Why join

Because the best conversations at a conference never happen on stage.

The Startup Lounge is intentionally small, intentionally unrecorded, and intentionally unscripted. You&#x27;ll meet founders, engineers, and domain experts — and actually talk to them. Share a challenge, get direct input, form your own picture.

If you&#x27;re building something, thinking about building something, or just want to understand how open source and AI are reshaping what&#x27;s possible — come find out for yourself.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Open Space</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/KBNKAP/</url>
            <location>Lounge [1st Floor]</location>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>7YA98N@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-7YA98N</pentabarf:event-slug>
            <pentabarf:title>Workshop: What do we still need to learn? [no-video]</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260415T142000</dtstart>
            <dtend>20260415T155000</dtend>
            <duration>013000</duration>
            <summary>Workshop: What do we still need to learn? [no-video]</summary>
            <description>These are troubling times. Interesting times. Exciting times. Times that challenge us to reflect on where we should be focusing our energy.

In this interactive workshop, we will start by collectively answering four thought-provoking questions on shared boards — every voice, every perspective, visible in the room. Then we vote with dots on the topics that matter most to us. The winning topics go straight into a fishbowl: short, focused five-minute discussions where a small circle talks while everyone else listens, reacts, and rotates in.

The insights and themes from this session will flow directly into the panel discussion on Thursday afternoon, so your voice carries further than this room.
Come in, participate, find a seat, and let us figure this out together.

_Open community space · Non-recorded · All experience levels welcome_</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Open Space</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/7YA98N/</url>
            <location>Lounge [1st Floor]</location>
            
            <attendee>Paula Gonzalez Avalos</attendee>
            
            <attendee>Sebastian Neubauer</attendee>
            
            <attendee>Dr. Kristian Rother</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>LSJ3CN@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-LSJ3CN</pentabarf:event-slug>
            <pentabarf:title>A View of Sovereignty from The Cloud</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T090500</dtstart>
            <dtend>20260416T095000</dtend>
            <duration>004500</duration>
            <summary>A View of Sovereignty from The Cloud</summary>
<description>While The Cloud is just someone else&#x27;s computer, those computers come together from many places and many, many someone elses. The constituent parts that connect, power, house, and ultimately operate those computers come from many more places and someones still! We explore explicitly what these infrastructure pieces of The Cloud are, and how the many definitions of digital sovereignty look from high up in The Cloud.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Keynote</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/LSJ3CN/</url>
            <location>Merck Plenary (Spectrum) [1st Floor]</location>
            
            <attendee>Aaron Glenn</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>EPASS8@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-EPASS8</pentabarf:event-slug>
            <pentabarf:title>Building MCP at the Speed of Hype: Principles That Outlast the Trends</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T101500</dtstart>
            <dtend></dtend>
            <duration>003000</duration>
            <summary>Building MCP at the Speed of Hype: Principles That Outlast the Trends</summary>
            <description>Every week, AI brings us another groundbreaking release, another model version, another must-have integration. Among these developments, agentic systems have emerged as a key component. Introduced at the end of 2024, the Model Context Protocol (MCP) has become an important enabler of this change and has established itself as the standard for connecting AI agents with external data sources and tools. In this rapidly shifting landscape, how does one build production systems that won&#x27;t be obsolete by the time you deploy them?

This talk shares practical lessons from building two real-world MCP applications with FastMCP and PydanticAI: JobmonitorMCP, which leverages the jobmonitor.de API to create intelligent regional labor market reports, and a tool for an international non-profit combining multiple agents into a powerful question and answer application. 

During development, we faced multiple challenges: MCP clients and models that interpret the same protocol differently, emerging features with limited documentation, and the difficulty of evaluating non-deterministic outputs. Stakeholders repeatedly asked &quot;Why does it behave differently today?&quot; and &quot;Are we using the newest model yet?&quot;

What we learned: The antidote to AI hype isn&#x27;t avoiding new technology, it&#x27;s anchoring development in trusted engineering principles. Separation of concerns and focused components helped us design for the protocol rather than specific clients. Rigorous evaluation approaches combined LLM-as-Judge with manual review and user feedback. Transparent communication helped us manage expectations around AI capabilities without undermining confidence.

This session targets intermediate Python developers building or planning to build AI-powered applications. You&#x27;ll leave with concrete strategies for building AI systems that adapt to new models while maintaining production stability, reflection questions for your own projects, and perhaps a little more confidence in your existing knowledge.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/EPASS8/</url>
            <location>Merck Plenary (Spectrum) [1st Floor]</location>
            
            <attendee>Rahkakavee Baskaran</attendee>
            
            <attendee>Friederike Bauer</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>37AESH@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-37AESH</pentabarf:event-slug>
            <pentabarf:title>In Praise of Documentation: Tools, Tips &amp; Techniques for Literate Programming in the AI Age</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T105500</dtstart>
            <dtend></dtend>
            <duration>003000</duration>
            <summary>In Praise of Documentation: Tools, Tips &amp; Techniques for Literate Programming in the AI Age</summary>
            <description># Introduction

# In Praise of Documentation

- The Promise of &quot;Literate Programming&quot;
- A Lamentation on the Death of Literate Programming
- Bad things that happen when you don&#x27;t document

# Why You Should Document

- Code is Communication
- Accessible, Maintainable, Sustainable Code
- Version Control (e.g. GitHub)

# Examples

## Examples of Open Source Documentation:

- Python `help`
- Docstrings
- Unix `man` pages
- `README.md`
- `Readthedocs.com`

## Documentation Framework Example: 

- Diátaxis

## Python Documentation Tool Examples: 

- Sphinx
- `cookiecutter`

## Scientific Publishing Tool Examples:

- Jupyter
- Quarto

# The Bit About AI (yes, I know, and I&#x27;m sorry)

- The importance of text for AI Code Assistant and Agentic Coding workflows

## Documentation in &quot;Spec-Driven&quot; Development:

- `AGENTS.md`
- Agent-OS with Claude Code

# Calls to Action: 

- Writing Tips by George Orwell
- Please Document Your Code!</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/37AESH/</url>
            <location>Merck Plenary (Spectrum) [1st Floor]</location>
            
            <attendee>Stephen</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>RUSUYF@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-RUSUYF</pentabarf:event-slug>
            <pentabarf:title>7 Anti-Lessons from Building a PydanticAI Agent: Mistakes We Made So You Don&#x27;t Have To</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T113500</dtstart>
            <dtend></dtend>
            <duration>004500</duration>
            <summary>7 Anti-Lessons from Building a PydanticAI Agent: Mistakes We Made So You Don&#x27;t Have To</summary>
            <description>## The Domain: Where Mistakes Are Expensive

[Qualio](https://www.qualio.com/) builds quality management software for life sciences companies — the ones making medical devices, pharmaceuticals, and biotech products. Our customers navigate FDA 21 CFR Part 11, ISO 13485, EU MDR, and SOC 2. In this world, compliance isn&#x27;t optional. Audit trails are mandatory. Documentation gaps mean warning letters, import bans, or product recalls.

When we decided to build an AI agent to help users manage compliance gaps, create remediation plans, and handle documentation — we knew the stakes. An agent that hallucinates a regulatory requirement or skips an approval step isn&#x27;t just annoying. It&#x27;s a liability.

So we built carefully with PydanticAI and Claude. And we still made every mistake possible. Here are 7 anti-lessons from the trenches.

---

### Anti-Lesson 1: &quot;We need a multi-agent system&quot;

It seemed obvious: separate agents for documents, compliance, and events. Clean architecture. We built it, shipped it, and spent weeks debugging coordination failures and inconsistent responses. In a domain where consistency matters, multi-agent chaos was unacceptable. The fix? Delete it. One agent with dynamic capabilities. Simpler, faster, and — according to our evals — more accurate.

### Anti-Lesson 2: &quot;Agents need sophisticated planning&quot;

Compliance workflows are complex. Surely the agent needs workflow graphs, state machines, planning frameworks? We tried. The agent got confused, skipped steps, invented procedures. The fix? A todo list. Add a task, check it off, see what&#x27;s next. In a regulated environment, simple and auditable beats clever and opaque.

### Anti-Lesson 3: &quot;Give the agent lots of specific tools&quot;

We built dozens of tools using PydanticAI&#x27;s tool registration: `create_document`, `update_control`, `get_gap_details`, `list_frameworks`, `submit_for_review`... The tool descriptions bloated the context. The agent picked wrong tools. The fix? Two high-level tools: `call API` (with OpenAPI specs for the details) and `read instruction` (load a markdown file). Fewer tools, better results, easier to audit.

### Anti-Lesson 4: &quot;Encode workflows in code&quot;

How does the agent know how to remediate a compliance gap? How to create a controlled document? At first, it was buried in prompts and Python. The fix? Markdown files — like Claude&#x27;s skills system. The agent reads them at runtime. Engineers can review them. Knowledge belongs in documents your compliance team can actually read.

### Anti-Lesson 5: &quot;It works when I test it&quot;

Our early tests passed. The agent handled every case we threw at it. Then real users arrived — and everything broke. The problem? Our test cases were simple, synthetic, and predictable. Real user journeys are messy, multi-step, and full of context we didn&#x27;t anticipate. The fix? Realistic evaluation data. We capture actual user sessions, anonymize them, and run them through pydantic-evals with LLM-as-judge rubrics. Does the agent follow the procedure? Does it hallucinate requirements? Does it handle the weird tangents users take? A 95% pass threshold in CI means nothing if your test data doesn&#x27;t reflect reality.

### Anti-Lesson 6: &quot;Automate everything&quot;

We built a fully automated feedback loop: user feedback creates a Jira ticket, a dev triages it, a Claude instance picks it up, raises a PR, responds to review comments. The dream? The fix: keep the human in the driver&#x27;s seat. PydanticAI&#x27;s `DeferredToolRequests` pattern lets our agent propose actions and pause for approval — the same principle applies to our dev workflow. In compliance software, someone is always accountable. The automation handles grunt work. Humans make decisions. Assisted development, not autopilot.

### Anti-Lesson 7: &quot;Apply what made you successful before&quot;

This is the meta anti-lesson. Good engineering habits — upfront design, comprehensive APIs, handling every edge case — _can_ slow you down with agents. The LLM will surprise you. Your assumptions will be wrong. The fix? Start scrappy, iterate fast, let evals tell you what&#x27;s working. The hardest part isn&#x27;t code. It&#x27;s unlearning.

---

### Bonus: Scaling Agent Development with tmux

How to run multiple agent experiments in parallel. Low-tech, high-leverage.

---

## Who Should Attend

Developers building AI agents, especially in domains where accuracy and auditability matter. Familiarity with PydanticAI is helpful but not required — you&#x27;ll see enough code to get started.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (long)</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/RUSUYF/</url>
            <location>Merck Plenary (Spectrum) [1st Floor]</location>
            
            <attendee>Joshua Görner</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>AKGUAC@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-AKGUAC</pentabarf:event-slug>
            <pentabarf:title>Open Source as a Business — Models, Paths, and Practice</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T132000</dtstart>
            <dtend>20260416T142000</dtend>
            <duration>010000</duration>
            <summary>Open Source as a Business — Models, Paths, and Practice</summary>
            <description>The discussion centres on concrete experience — different starting points, different ecosystems, different business models — with the shared thread being open source as a deliberate professional and commercial choice.

**Key Questions:**

1. *Different entry points:* You each came to building a business on open source from a different direction. What drove that decision — and what did you not expect?

2. *Where the business actually starts:* Open source is the foundation, not the product. How do you define what you sell, and to whom?

3. *Community and commerce:* How do you maintain trust and credibility in an open source community while running a commercial operation around it?

4. *Open source and AI:* The AI landscape is consolidating fast around closed systems. What does that mean for open source projects and the businesses built on them?

5. *European perspective:* Is there something specifically European about the way you think about open source as a business — around sustainability, sovereignty, or independence?

6. *Advice:* What would you tell someone who wants to build a business on open source — or switch to doing so — and has not yet started?</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Panel</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/AKGUAC/</url>
            <location>Merck Plenary (Spectrum) [1st Floor]</location>
            
            <attendee>Yann Lechelle</attendee>
            
            <attendee>Ines Montani</attendee>
            
            <attendee>Sylvain Corlay</attendee>
            
            <attendee>Alexander CS Hendorf</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>KFPNUA@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-KFPNUA</pentabarf:event-slug>
            <pentabarf:title>Panel What Do We Still Need to Learn?</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T150500</dtstart>
            <dtend>20260416T160500</dtend>
            <duration>010000</duration>
            <summary>Panel What Do We Still Need to Learn?</summary>
            <description>If AI can handle the code, the writing, the routine data processing, and complete automation frameworks with agents, what exactly are we supposed to be learning? This panel brings together the Python community for an honest and likely heated conversation about the skills that actually provide a human edge in an automated world. We are moving past the hype to look at the hard realities of 2026:

**The Educational Pivot:** When the doing is automated, does technical mastery still matter? A developer and Data Science education expert debates whether we still need to learn the how, or whether the focus should shift entirely to the why.
**The Global Reality:** A consultant&#x27;s view on how AI is transforming non-technical industries. It is no longer just about code; AI is reshaping the very tasks that define professional roles across the board. And how do you spot real talent in a world of AI-assisted portfolios?
**The Future Framework:** The Head of an AI Academy asks: how do we upskill an entire workforce when the tools are changing faster than any curriculum can be written? And which future skills matter most, beyond AI skills themselves?

No consensus guaranteed. These are the very questions we all need to answer, right now.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Panel</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/KFPNUA/</url>
            <location>Merck Plenary (Spectrum) [1st Floor]</location>
            
            <attendee>Paula Gonzalez Avalos</attendee>
            
            <attendee>Sebastian Unterreitmeier</attendee>
            
            <attendee>Silvia Hänig</attendee>
            
            <attendee>Dr. Kristian Rother</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>9ZEFTR@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-9ZEFTR</pentabarf:event-slug>
            <pentabarf:title>Closing Session</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T162000</dtstart>
            <dtend>20260416T164000</dtend>
            <duration>002000</duration>
            <summary>Closing Session</summary>
            <description>Closing Session</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Plenary</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/9ZEFTR/</url>
            <location>Merck Plenary (Spectrum) [1st Floor]</location>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>TZYGTL@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-TZYGTL</pentabarf:event-slug>
            <pentabarf:title>5 Years of NiceGUI: What We Learned About Designing Pythonic UIs</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T101500</dtstart>
            <dtend></dtend>
            <duration>003000</duration>
            <summary>5 Years of NiceGUI: What We Learned About Designing Pythonic UIs</summary>
            <description>Five years ago, the NiceGUI project set out to answer a simple question: Can we build modern, interactive web UIs entirely in Python without giving up power or flexibility? Since then, the framework has evolved into a production-ready, community-driven tool that builds on top of proven technologies such as HTML, CSS, JavaScript, Vue.js, Quasar, Tailwind, and FastAPI—while exposing a Pythonic interface that feels natural to Python developers.
This talk traces that journey and distills the design principles that worked, those that didn’t, and the patterns that ultimately enabled NiceGUI to provide a smooth developer experience.
We begin with a short demonstration of NiceGUI’s “3-line Hello World,” highlighting how familiar Python code can generate dynamic web interfaces. From there, we examine the technical foundations that allow the framework to stand on the shoulders of major frontend and backend ecosystems.
The core of the talk focuses on Python language features and how they shape API design:

* **Context managers** to express hierarchy and UI composition intuitively.
* **Method chaining** inspired by the builder pattern for concise, readable configuration.
* **Decorators** (such as @page and @refreshable) to define routing and reactive behaviour without ceremony.
* **Async/await** for event handlers, background tasks, and page functions.
* **Type hints** to support static analysis, IDE completion, and clearer API intent.
* **Dataclasses** as bindable, structured state containers.
* **Default arguments and sentinel patterns** to allow powerful yet discoverable APIs.

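Two of these patterns, context managers for hierarchy and method chaining for configuration, can be sketched with a toy mini-framework (illustrative only; this is not NiceGUI's actual implementation, and all names here are invented):

```python
# Toy sketch of two patterns from the list above: context managers to
# express UI hierarchy, and method chaining for concise configuration.

class Element:
    _stack = []  # the currently open containers

    def __init__(self, name):
        self.name = name
        self.classes_ = []
        self.children = []
        if Element._stack:
            Element._stack[-1].children.append(self)

    def __enter__(self):      # entering a `with` block nests children
        Element._stack.append(self)
        return self

    def __exit__(self, *exc):
        Element._stack.pop()

    def classes(self, value):  # chainable configuration, returns self
        self.classes_.append(value)
        return self

def render(element, indent=0):
    lines = ['  ' * indent + element.name]
    for child in element.children:
        lines.extend(render(child, indent + 1))
    return lines

with Element('row') as row:
    Element('label').classes('text-bold')
    Element('button').classes('rounded')

print('\n'.join(render(row)))  # row / label / button, indented by nesting
```

The `with` block expresses parent-child structure visually, which is the core ergonomic idea the talk explores.
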
Attendees will gain practical insights useful beyond NiceGUI itself: how to design Python APIs for GUI frameworks, dashboards, developer tools, or any domain where clarity, maintainability, and expressiveness matter. The talk is aimed at Python developers interested in web interfaces, framework design, or improving the ergonomics of their own libraries.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/TZYGTL/</url>
            <location>Titanium [2nd Floor]</location>
            
            <attendee>Falko Schindler</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>GATMPP@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-GATMPP</pentabarf:event-slug>
            <pentabarf:title>Surviving AI Fatigue: Staying Sane and Relevant in a Fast Moving Field</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T105500</dtstart>
            <dtend>20260416T112500</dtend>
            <duration>003000</duration>
            <summary>Surviving AI Fatigue: Staying Sane and Relevant in a Fast Moving Field</summary>
            <description>The world of AI and machine learning is moving at breakneck speed, with new papers, models, benchmarks, and frameworks announced daily. If you have ever felt overwhelmed, behind, or simply exhausted trying to keep up, you are not alone. In this talk, we share our own journey grappling with AI fatigue, what it feels like, why it happens, and what we have learned about staying informed without burning out.

We will start by defining AI fatigue and reflecting on why it is such a pervasive experience in our community, from social media hype to the sheer pace of real innovation. We highlight some of the common pitfalls, like chasing every trend, consuming too much noise, or neglecting mental health, and show why these approaches are counterproductive.

Then, we focus on actionable strategies and habits that actually work. We share concrete tips and techniques we personally use to manage our learning and maintain our enthusiasm for the field, including:

- Crafting an intentional information diet with trusted sources
- Setting clear boundaries and time-boxing your learning
- Building a personal knowledge base for long-term retention
- Using summarization tools to cut through dense papers and blogs
- Practicing “JOMO,” the joy of missing out, by focusing on depth over breadth
- Learning in public by teaching, blogging, or pairing with others
- Designing small, achievable experiments to stay engaged and motivated

Finally, we will suggest how organizations and teams can help prevent fatigue at a structural level by fostering focus, psychological safety, and curiosity instead of always-on urgency.

This talk is for anyone, from beginner to expert, who wants to stay relevant and curious about AI without losing sight of their well-being. You will leave with a set of practical tools, a fresh perspective on learning in a chaotic environment, and hopefully the reassurance that it is okay to not know everything.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/GATMPP/</url>
            <location>Titanium [2nd Floor]</location>
            
            <attendee>Ajay</attendee>
            
            <attendee>Jeyashree Krishnan</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>TRGQTL@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-TRGQTL</pentabarf:event-slug>
            <pentabarf:title>Open Table Formats in the Wild™ - Reloaded: Vortexing Ducks over Floating Icebergs</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T113500</dtstart>
            <dtend>20260416T122000</dtend>
            <duration>004500</duration>
            <summary>Open Table Formats in the Wild™ - Reloaded: Vortexing Ducks over Floating Icebergs</summary>
            <description>### Description
The core promise of open table formats is engine interoperability with ACID guarantees, mutability, and schema evolution for massive datasets stored on cheap, reliable cloud object storage. Modern data platforms demand far more than *just* interoperable, analytical batch processing. Engineers now require native support for CDC, incremental processing, streaming workloads, low-latency access, and point lookups - especially for AI-driven applications. Ideally, all of this would be covered by a single, unified solution.

However, Parquet - the foundational format for physically storing much of today’s data - predates both the AI boom and the era of unified batch and streaming systems. Likewise, Iceberg’s original design DNA was firmly rooted in large-scale, batch-oriented analytical workloads. This raises an uncomfortable question: are Parquet and Iceberg truly up to the task?

This talk explores that question through real-world use cases and architectural constraints. While the focus is on conveying key ideas and practical insights, the session is aimed at an intermediate to advanced audience. If you are new to the topic, you may want to watch last year’s [episode](https://youtu.be/YdFeHj5lRP4?si=NxO0Ot2-S_kYOokV) on Apache Parquet and Delta Lake, which provides a gentle introduction to the fundamentals of open table formats.

### Takeaways

After this talk, attendees will:
- Understand why incremental processing is not a native concept in Apache Iceberg
- Recognize how Iceberg’s metadata model creates hard limits for low-latency streaming workloads
- Learn why Parquet’s physical layout becomes a bottleneck for point lookups and AI-driven access patterns
- Get an early look at DuckLake and Vortex as emerging alternatives 

### Agenda

**The Past (10 min)**
- Rationale - **The Idealized Model**
- Implications - **The Engineering Trade-offs**

**The Present (15 min)**
- Incremental Processing - **The Missing Primitive**
- Streaming Workloads - **The Batch Inheritance**
- AI Applications &amp; Point Lookups - **The Access Wall**

**The Future (15 min)**
- DuckLake - **The Return of Relational Databases**
- Vortex - **The Parquet of Tomorrow**</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (long)</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/TRGQTL/</url>
            <location>Titanium [2nd Floor]</location>
            
            <attendee>Franz Wöllert</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>GYBRVN@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-GYBRVN</pentabarf:event-slug>
            <pentabarf:title>Making Tech Tutorials Accessible: Practical Techniques for Educators</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T132000</dtstart>
            <dtend>20260416T135000</dtend>
            <duration>003000</duration>
            <summary>Making Tech Tutorials Accessible: Practical Techniques for Educators</summary>
            <description>Accessible content isn&#x27;t just for people with disabilities—it makes tech education better by design for everyone. International learners, people on noisy trains junior developers and tired seniors at the end of the day—they all benefit from subtitles, simple language, and clear structure. Yet most developers who become educators have never learned how to make their content accessible.
This talk shares practical techniques I use when creating tech tutorials for deaf and hard-of-hearing learners. Since June 2025, I&#x27;ve been creating Excel tutorial videos with manual subtitles in simple language for a YouTube community (~400 subscribers). My partner is hard of hearing, which taught me that accessibility isn&#x27;t optional—it&#x27;s essential. The techniques could be applied to any tech content: Python tutorials, data science courses, documentation, or workshops.
The talk follows this structure:
1. Understanding Barriers (5 minutes): Who benefits from accessible content? People with permanent, temporary, and situational limitations.
2. Creating Accessible Videos (15 minutes): My core workflow: manual subtitles in DaVinci Resolve with timing based on text length, using AI tools to simplify technical language, visual clarity with arrows and highlights, and the insight that you can&#x27;t be accessible to everyone—focus on your target audience.
3. Clear Structure for any Content (8 minutes): Applying video principles to any format: logical heading hierarchy for navigation, alternative text for images, using built-in accessibility checkers, and simple language techniques.
4. Getting Started (2 minutes): One action to take this week, free tools to use, and resources for continued learning.
I completed the W3C &quot;Introduction to Web Accessibility&quot; course and will be conducting a guest lecture on accessible learning materials at MSB Medical School Berlin (January 2026). As a non-native German speaker, I understand language barriers firsthand.
Attendees will leave with practical techniques they can implement immediately and the confidence that accessibility is achievable without being an expert. No prior knowledge on accessibility is required.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/GYBRVN/</url>
            <location>Titanium [2nd Floor]</location>
            
            <attendee>Tamara Badikyan</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>TB9WYZ@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-TB9WYZ</pentabarf:event-slug>
            <pentabarf:title>How to compare apples with oranges: Proper evaluation of article-level demand forecasts</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T140000</dtstart>
            <dtend>20260416T143000</dtend>
            <duration>003000</duration>
            <summary>How to compare apples with oranges: Proper evaluation of article-level demand forecasts</summary>
            <description>At the pricing department in Zalando, we are predicting future demand  for millions of articles on a daily basis by large-scale machine-learning models. These forecasts are key for discount decisions taken downstream. As evaluating every forecast on its own becomes infeasible at this scale and frequency we created a set of aggregated metrics that help us make informed statements about the performance of our models. On the one hand these metrics are being used by us to further improve our forecasting models, on the other hand they are used by our stakeholders to make informed decisions.
 
To handle this volume, we use *PySpark* for data processing and for scaling our evaluations across the entire assortment. Furthermore, evaluating forecast performance in this context is crucial in two different scenarios, namely when analysing past forecast performance and when creating and comparing alternative models. In both cases, we look at different time ranges and possibly different subsets of the forecasted articles and calculate aggregated performance measures to compare them. We want to answer questions like


 - “Is this forecast performing better in low-discount periods than during sales events?”
 - “Did we make a higher error on highly discounted articles during last week?”
 - “Is this model well-suited to predict high (or low) selling articles?”
 - “Did our model perform well for sneakers during the last voucher event?”

Evaluating aggregated metrics like a relative mean squared error (MSE) or a mean absolute percentage error (MAPE) over different sets of articles has many pitfalls. Comparing different parts of the assortment leads to an &quot;Apples vs. Oranges&quot; problem that we want to elaborate on, based on examples we experienced in our daily work.

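As a tiny illustration of one such pitfall (with invented numbers, not Zalando data): MAPE weights relative errors, so slow-moving articles can dominate an aggregate even when their absolute errors are negligible compared to those of bestsellers.

```python
# Illustrative only: a 1-unit miss on a slow seller weighs far more in
# MAPE than a 100-unit miss on a bestseller, which is one reason naive
# aggregation over a mixed assortment compares apples with oranges.

def mape(actual, forecast):
    return sum(abs(a - f) / a for a, f in zip(actual, forecast)) / len(actual)

bestsellers = ([1000, 800], [900, 880])  # large absolute, small relative errors
slow_movers = ([2, 1], [1, 2])           # tiny absolute, huge relative errors

print(mape(*bestsellers))  # 0.1
print(mape(*slow_movers))  # 0.75
```
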
To answer the questions above we developed a set of aggregated metrics that we monitor on a daily basis using *plotly* and *streamlit* for clear, interactive visualization. We want to present these metrics and explain how they are useful for the questions and tasks mentioned above. We will highlight the techniques and best practices to draw meaningful insights from evaluating forecast performance and how we are able to compare apples with oranges using meaningful lower bounds for our aggregated metrics.

We also want to share how observations from our monitoring influenced the evolution of our *LightGBM* and *PyTorch* models and how they shaped important parts like feature engineering, hyperparameter tuning and the choice of our loss functions. Lastly, we will touch on how to communicate these sometimes very technical numbers to stakeholders so that they can make informed decisions without being overwhelmed by details.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/TB9WYZ/</url>
            <location>Titanium [2nd Floor]</location>
            
            <attendee>Stefan Birr</attendee>
            
            <attendee>Mones Raslan</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>WDHTQR@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-WDHTQR</pentabarf:event-slug>
            <pentabarf:title>Simulating the World using SimPy: A practical Example</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T150500</dtstart>
            <dtend>20260416T153500</dtend>
            <duration>003000</duration>
            <summary>Simulating the World using SimPy: A practical Example</summary>
<description>Real-world systems are often too complex to test reliably in the same environment and under the same conditions. Changes are hard to measure, edge cases are difficult to reproduce, and external influences can hide the real behavior of a system. Simulation offers a way to abstract from reality while staying close enough to produce meaningful results. It allows full control over system components, timing, and disruptions, and makes it possible to test many scenarios in a repeatable way.

The practical example of this talk focuses on simulating `load-balancing algorithms`. Load-balancers are a good example of systems that are hard to evaluate in real environments. Some tested algorithms have no existing implementation, others differ across platforms, and cloud environments introduce many uncontrollable factors such as network latency, cloud noise, and recurring background workloads. These factors make fair and consistent testing almost impossible.

The problem is addressed by building an event-based simulation using `SimPy`. The session explains how `SimPy` uses `Generators` to create events, and how time and processes interact inside a simulation. An architecture for a practical load-balancer simulation is presented, showing how different components interact and how algorithms can be swapped and compared.

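The generator mechanism behind event-based simulation can be sketched in plain Python (an illustrative toy scheduler, not SimPy's actual API): processes are generators that yield the delays they wait for, and the scheduler advances a virtual clock from event to event.

```python
# Toy discrete-event scheduler (illustrative, not SimPy): each process is
# a generator; yielding a number means "wait this long in virtual time".
import heapq

def run(processes):
    """Run (start_time, generator) pairs; return (finish_time, index) log."""
    queue = []
    for i, (start, proc) in enumerate(processes):
        heapq.heappush(queue, (start, i, proc))
    log = []
    while queue:
        now, i, proc = heapq.heappop(queue)
        try:
            delay = next(proc)                       # process yields its wait
            heapq.heappush(queue, (now + delay, i, proc))
        except StopIteration:
            log.append((now, i))                     # finished at virtual `now`
    return log

def request(service_time):
    yield service_time                               # e.g. time at a backend

# Two requests: one starts at t=0 and takes 5, one starts at t=2 and takes 1.
print(run([(0, request(5)), (2, request(1))]))       # [(3, 1), (5, 0)]
```

No real time passes: the clock jumps between events, which is what makes such simulations fast and repeatable.
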
The talk also covers improvements made to the simulation, including a `command-line interface` for easier execution and a `YAML` configuration file for flexible setup. It concludes with practical tips and lessons learned when working with `SimPy`, helping to avoid common pitfalls and improve simulation design.

Overall, the session provides an introduction to simulation as a testing tool, a hands-on example using `SimPy`, and a realistic architecture for building and evolving simulations in Python.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/WDHTQR/</url>
            <location>Titanium [2nd Floor]</location>
            
            <attendee>Niklas</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>MLUK9M@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-MLUK9M</pentabarf:event-slug>
            <pentabarf:title>Why Did The Model Do That? Debugging the Ghost in the Machine</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T154500</dtstart>
            <dtend>20260416T161500</dtend>
            <duration>003000</duration>
            <summary>Why Did The Model Do That? Debugging the Ghost in the Machine</summary>
            <description>My planned outline for the talk is as follows:

- **Intro and opening hook** (4 mins): A look at a clearly biased model and why &quot;black box&quot; decisions fail to establish trust
- **The XAI Decision Tree** (17 mins):

    * A practical overview of the landscape and a walk through the tree: selecting the right method based on your model and data
    * Mapping these methods to specific Python libraries and frameworks (e.g., `shap`, `lime`, `captum`, `transformers-interpret`, `alibi`, `dalex`, ...)

- **Closing and Take-away** (4 mins)
- **Q&amp;A and Buffer** (5 mins)</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/MLUK9M/</url>
            <location>Titanium [2nd Floor]</location>
            
            <attendee>Cosima Meyer</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>3U3BZH@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-3U3BZH</pentabarf:event-slug>
            <pentabarf:title>Embedding Data Science in IoT devices with MicroPython and emlearn</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T101500</dtstart>
            <dtend>20260416T104500</dtend>
            <duration>003000</duration>
            <summary>Embedding Data Science in IoT devices with MicroPython and emlearn</summary>
<description>Typical Internet of Things devices send most of their data to an external cloud service for analysis.
This causes challenges in terms of privacy, reliability under poor connectivity, and loss of availability when the service is discontinued.

We would like to show that it is possible to achieve the majority of functionality using a local-first approach, including machine-learning based sensor-data analysis.
And that this can be done on low-cost microcontrollers such as the ESP32.

This talk will cover how to build stand-alone devices for measuring and analyzing physical sensor data, using MicroPython. This includes the following aspects:

- Measuring the surroundings using sensors
- Connectivity using WiFi
- Data storage using on-board filesystem
- Serving a webui for configuration/control, using Microdot
- Automated data processing/analysis using DSP and ML, with emlearn-micropython
- Enabling interactive data analysis via webui
- Managing concurrency on the microcontroller, using asyncio
- Optional integration. Pull using HTTP, and/or push using Webhooks/MQTT

The sensor data will either be accelerometer, sound or images/video (To be Decided).

### About MicroPython

MicroPython is an implementation of Python that runs on practically all microcontrollers with 128kB+ RAM. It provides access to the microcontroller hardware, functions for interacting with sensors and external peripherals, as well as connectivity options such as WiFi, Ethernet, Bluetooth Low Energy, etc.

While MicroPython can target a very wide range of hardware, we will focus on the Espressif ESP32 family of devices. These are powerful and affordable, with good WiFi+BLE connectivity support and good open-source toolchains; they are very popular among both hobbyists and companies, and many good ready-to-use hardware development kits are available.

### About emlearn-micropython

emlearn-micropython is a Machine Learning and Digital Signal Processing package for MicroPython, built on top of the emlearn C library. It provides convenient and efficient MicroPython modules, and enables application developers to run efficient Machine Learning models on microcontrollers without having to touch any C code. Compared to pure-Python approaches, the emlearn-micropython models are typically 10-100x faster and smaller.

### Intended audience and expected background

Intended audience: Any developer or data scientist curious about sensor data processing, IoT, and how Python scales down to the smallest of devices.

The audience is expected to have basic literacy in Python and proficiency in programming.
Familiarity with microcontrollers and embedded systems is of course an advantage, but the talk should be approachable for those who are new to this area. Familiarity with basic networking and web/browser concepts is also an advantage.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/3U3BZH/</url>
            <location>Helium [3rd Floor]</location>
            
            <attendee>Jon Nordby</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>RQTJFS@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-RQTJFS</pentabarf:event-slug>
            <pentabarf:title>How We Built an Inclusive Data Organization: Careers, Community &amp; 50% Women</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T105500</dtstart>
            <dtend>20260416T112500</dtend>
            <duration>003000</duration>
            <summary>How We Built an Inclusive Data Organization: Careers, Community &amp; 50% Women</summary>
            <description>Many organizations aim to grow strong data teams, yet struggle with three connected challenges: unclear career paths, weak internal data communities, and a lack of diversity—especially in senior and technical roles. These challenges are often treated separately, even though they strongly influence one another.

This talk presents a holistic approach to building an inclusive data organization by aligning career development, community building, and diversity goals. The focus is on practical actions and structural choices that can be applied in real-world settings, regardless of company size or industry.

Talk Outline
1. The problem: why data organizations struggle
- Common myths about data careers (linear paths, constant availability, narrow profiles)
- Why diversity efforts often fail in technical teams
- The cost of ignoring community and inclusion: attrition, silos, burnout, and missed talent

2. Career growth beyond linear paths
- Designing career paths that support different life phases and backgrounds
- Recognizing and valuing transferable skills in data roles
- Making progression criteria transparent and fair
- Supporting growth from individual contributor to leadership without forcing a single model

3. Building an internal data science community
- Why internal communities matter for learning, retention, and impact
- Creating spaces for knowledge sharing without gatekeeping
- Encouraging collaboration across roles (data science, engineering, analytics)
- Aligning community activities with business value and technical standards

4. Achieving diversity with intention
- What “50% women in data” actually requires in practice
- Hiring processes that reduce bias while maintaining technical excellence
- Inclusive team structures and ways of working
- Leadership behaviors that support inclusion without tokenism

5. What worked—and what didn’t
- Trade-offs, challenges, and lessons learned
- Why inclusion is a continuous process, not a one-time initiative

6. Actionable takeaways
- Practical steps attendees can apply in their own teams
- Signals to look for when inclusion efforts are working—or failing
- How to start small and scale impact over time</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/RQTJFS/</url>
            <location>Helium [3rd Floor]</location>
            
            <attendee>Xia He-Bleinagel</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>9PBYAP@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-9PBYAP</pentabarf:event-slug>
            <pentabarf:title>Securing AI Agentic Systems: Enforcing Safety Constraints in AI Agents</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T113500</dtstart>
            <dtend>20260416T122000</dtend>
            <duration>004500</duration>
            <summary>Securing AI Agentic Systems: Enforcing Safety Constraints in AI Agents</summary>
            <description>AI agents are increasingly used as autonomous systems that can call tools, access data, and take actions in real environments. As these systems gain more autonomy, ensuring their safe and predictable behavior becomes an engineering challenge rather than a prompting problem.

This talk examines how safety constraints can be explicitly enforced in agentic AI systems, instead of relying solely on natural language instructions or model alignment. We will discuss typical safety and security issues that arise in agent-based architectures, including over-permissioned tools, unintended action chains, goal drift, and unsafe retries.

Using practical Python examples, the talk introduces architectural patterns for constraining agent behavior, such as policy layers, capability-based tool access, action budgets, and runtime validation of agent decisions before execution. We will also explore how human-in-the-loop checkpoints and audit logging can be integrated into agent workflows to support safer operation in production environments.

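A policy layer of this kind can be sketched in a few lines (a hypothetical illustration with invented names, not tied to any specific agent framework): every proposed action passes a permission check and an action budget before it may execute.

```python
# Hypothetical sketch of a policy layer: tool permissions plus an action
# budget, checked at runtime before an agent action is executed.

class PolicyError(Exception):
    pass

class PolicyLayer:
    def __init__(self, allowed_tools, action_budget):
        self.allowed_tools = set(allowed_tools)
        self.remaining = action_budget

    def authorize(self, tool_name):
        # Capability check: only explicitly granted tools may be called.
        if tool_name not in self.allowed_tools:
            raise PolicyError(f'tool {tool_name!r} is not permitted')
        # Budget check: bounds the total number of actions per task.
        if self.remaining == 0:
            raise PolicyError('action budget exhausted')
        self.remaining -= 1

policy = PolicyLayer(allowed_tools=['search', 'read_file'], action_budget=2)
policy.authorize('search')       # permitted, budget 2 -> 1
policy.authorize('read_file')    # permitted, budget 1 -> 0
try:
    policy.authorize('search')   # blocked: budget exhausted
except PolicyError as err:
    print(err)
```

The key point is that the check happens outside the model, in ordinary code, so it cannot be talked around by a prompt.
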
The focus of this session is on practical design and implementation techniques that help developers build AI agents with clearly defined boundaries, making their behavior more controllable, observable, and secure.

Through practical Python examples, we will demonstrate how to:

- Design constrained agent architectures
- Enforce tool-level permissions and action budgets
- Validate and block unsafe agent actions at runtime
- Combine human-in-the-loop checkpoints with automated controls</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (long)</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/9PBYAP/</url>
            <location>Helium [3rd Floor]</location>
            
            <attendee>John Robert</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>MS7AWK@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-MS7AWK</pentabarf:event-slug>
            <pentabarf:title>Escape the Hype: Teaching LLM Concepts Through an Interactive AI Factory Game</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T132000</dtstart>
            <dtend>20260416T135000</dtend>
            <duration>003000</duration>
            <summary>Escape the Hype: Teaching LLM Concepts Through an Interactive AI Factory Game</summary>
            <description>The gap between AI adoption and AI understanding keeps growing. Teams copy-paste prompts without understanding why they work, vendor materials highlight capabilities over limitations, and the EU AI Act now requires organizations to ensure &quot;a sufficient level of AI literacy among their staff.&quot; Traditional training — documentation, tutorials, talks — isn&#x27;t closing this gap. What&#x27;s missing is embodied learning: touching the parameters, breaking the system, feeling the consequences.

**Our Approach**

We built &quot;AI Factory&quot; — a Python-based educational game where players learn LLM concepts through hands-on challenges. Set in a magical potion factory, players master prompt engineering, guardrails, RAG pipelines, MCP tool orchestration, and multi-agent coordination.

What makes it different from typical AI tutorials:

- Real API calls, not simulations. Players interact with actual LLMs — when they misconfigure guardrails or adjust temperature, they see real consequences that transfer directly to production.
- Budget-driven decisions. Every API call costs in-game currency, forcing the same quality-cost-speed tradeoffs faced in real deployments.
- Progressive disclosure over information dumps. Each game stage reveals one missing piece. The full picture only clicks at the end — and that revelation is the reward.
- Immediate, specific, actionable feedback. Players see results the moment they submit — not just &quot;incorrect,&quot; but a diagnostic breakdown of exactly what went wrong, clear enough to act on and retry.

**What This Talk Covers**

We share concrete design decisions and their outcomes — what worked, what didn&#x27;t, and what surprised us:

- Narrative vs. jargon. How story-driven framing changed the way players understood complex concepts like RAG — without a single slide of theory.
- Constraints as a teaching tool. Why our first budget system backfired, and how a small redesign turned frustration into strategic thinking.
- When to simulate instead of build. Where we replaced real infrastructure with controlled illusions — and why the learning outcome didn&#x27;t suffer.
- One game, many audiences. How players from different backgrounds found completely different entry points into the same levels.
- Scoring on top of non-deterministic AI. How we built a reliable evaluation engine for a system that never gives the same answer twice.

**Who Should Attend This Talk**

This talk is designed for multiple audiences:

- Educators and trainers looking for new approaches to teaching AI concepts
- Team leads responsible for upskilling teams on AI fundamentals — take away a tested approach, not just theory
- Anyone interested in gamification as an approach to technical education</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/MS7AWK/</url>
            <location>Helium [3rd Floor]</location>
            
            <attendee>Vadim Vlasov</attendee>
            
            <attendee>Eric Glaser</attendee>
            
            <attendee>Lisa Amrhein</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>TST9LF@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-TST9LF</pentabarf:event-slug>
            <pentabarf:title>Dynamic Knowledge Graphs</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T140000</dtstart>
            <dtend></dtend>
            <duration>003000</duration>
            <summary>Dynamic Knowledge Graphs</summary>
            <description>Like many organizations, we at VisualVest face the challenge of distributed and constantly evolving knowledge sources. Documentation lives across repositories, internal wikis, JIRA tickets, and various file formats in cloud storage. With ~250 employees making daily changes, our source of truth is highly dynamic. While traditional document-based RAG using semantic embeddings solved some of these pain points, it couldn&#x27;t answer holistic questions or understand relationships between sources, leading us to explore graph-based approaches.

The challenge? Real-world knowledge sources are inherently dynamic. When thinking about information management and retrieval, we cannot ignore this reality if we want to create powerful, machine-readable and actually useful products. Microsoft&#x27;s popular [GraphRAG](https://microsoft.github.io/graphrag/) library [explicitly rejected dynamic features](https://github.com/microsoft/graphrag/issues/429) (like deletion) due to complexity concerns. However, we believe that constantly rebuilding entire graphs isn&#x27;t feasible for production systems.

This talk presents our solution: a truly dynamic knowledge graph with full insertion, query, and deletion capabilities. We are also working on reducing the high computational cost of building knowledge graphs: through caching strategies and small language model fine-tuning, we are trying both to minimize computational effort and to strengthen our independence from cloud providers.

What you&#x27;ll learn:
- An industry perspective on the challenges of distributed knowledge sources
- Formal definition and properties of dynamic knowledge graphs
- Our transformation pipeline
  - Experiments with fine-tuned small language models
- Implementation details:
  - Inserting nodes and edges while preventing ambiguity through similarity matching
  - Tracking information origin across sources
  - Safely deleting documents from the graph without breaking relationships
  - Graph inference strategies
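To give a flavour of the similarity-matching idea for unambiguous insertion, here is a minimal, hypothetical sketch using string similarity (the actual pipeline described in this talk is embedding-based, not difflib-based):

```python
from difflib import SequenceMatcher

def upsert_node(graph: dict, name: str, threshold: float = 0.85) -> str:
    """Insert a node, or merge into an existing near-duplicate to avoid ambiguity."""
    for existing in graph:
        if SequenceMatcher(None, name.lower(), existing.lower()).ratio() >= threshold:
            return existing        # reuse the existing node instead of duplicating it
    graph[name] = set()            # no close match: create a fresh node
    return name

graph = {}
a = upsert_node(graph, "VisualVest GmbH")
b = upsert_node(graph, "visualvest gmbh")  # near-duplicate merges into the first node
```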

By the end of this talk, you&#x27;ll understand why real-world knowledge graphs should be dynamic and how to build one yourself, as well as the limitations and future directions of our approach.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/TST9LF/</url>
            <location>Helium [3rd Floor]</location>
            
            <attendee>Jakob Leander Müller</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>BRCNB7@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-BRCNB7</pentabarf:event-slug>
            <pentabarf:title>(Autism and) The Predictive Brain Theory (in Tech)</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T150500</dtstart>
            <dtend></dtend>
            <duration>003000</duration>
            <summary>(Autism and) The Predictive Brain Theory (in Tech)</summary>
<description>What do new autism, AI, and quantum computing research have in common? We need to stop following the well-known &#x27;input - process - output&#x27; model. Let&#x27;s escape this model and embrace the predictive brain theory to understand autism and how people on the spectrum interact with technology and the workplace.

Recent brain research shows that the brain is not a passive receiver of stimuli, but an active predictor of what will happen next. The brain is constantly building a big model of the world based on past experiences and using this model to predict what will happen next. When we have a big model of the world we know what to expect. If something unexpected happens, the brain receives this as an error and needs to update its model to accommodate the new information. This is not autism or tech specific, this is how the brain works for everyone according to the latest brain research.

A recent hypothesis suggests that people on the autism spectrum often have a harder time building this big model of the world, and that their stimulus-response system is often more sensitive. Add context blindness and the higher energy cost of executive functioning to the mix, and it becomes harder for people on the spectrum to predict what will happen next and to deal with unexpected situations. This can lead to anxiety, stress, and burnout.

If your tech job is constantly causing errors in the predictive model of the world, it will be hard to do your job and to be happy at work. In this talk I will explain how the predictive brain theory can help us understand autism and how we can build better technology and workplaces for people on the spectrum.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/BRCNB7/</url>
            <location>Helium [3rd Floor]</location>
            
            <attendee>Dennie Declercq</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>HQBC7R@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-HQBC7R</pentabarf:event-slug>
            <pentabarf:title>Rediscovering single-node processing: When does it make sense to move from Spark to Polars?</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T154500</dtstart>
            <dtend>20260416T161500</dtend>
            <duration>003000</duration>
            <summary>Rediscovering single-node processing: When does it make sense to move from Spark to Polars?</summary>
<description>Apache Spark is the industry standard for big data processing, and rightfully so. But for many data processing applications, a more lightweight solution will work just as well, avoiding Spark&#x27;s compute and configuration overhead. Polars offers such a solution, with a fast single-node processing engine and a syntax that will pose no problems for experienced Spark developers.
I will give a short comparison of Spark and Polars, covering their similarities and differences, and show an implementation of a typical ETL and feature engineering task in both. I will compare the deployment, performance, and cost of the two and, while giving my opinion on the topic, hope to enable you to make an informed decision on when to use Polars and when to use Spark.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/HQBC7R/</url>
            <location>Helium [3rd Floor]</location>
            
            <attendee>Jonas Böer</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>P7NYXB@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-P7NYXB</pentabarf:event-slug>
            <pentabarf:title>Tracking Knowledge Diversity in LLM-Generated Responses.</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T101500</dtstart>
            <dtend></dtend>
            <duration>003000</duration>
            <summary>Tracking Knowledge Diversity in LLM-Generated Responses.</summary>
            <description>This talk summarizes our research on how LLMs generate narratives and recurring tropes in real-world information-seeking setups via prompting.

**Talk outline:**
* Knowledge collapse and epistemic diversity: What they mean and why they matter for real-world information access (5 mins).
* Framework overview: How we measure epistemic diversity across LLM outputs (5 mins).
* Experimental design and results: curating a dataset for comparisons across model families, search results, and Wikipedia pages (7 mins).
* Implications for designing LLM-powered systems that preserve information diversity (10 mins).

**Key takeaways for AI practitioners:**
* When can retrieval-augmented generation (RAG) increase diversity?
* Can expanding Wikipedia via translation improve epistemic diversity or reinforce existing tropes?
* What are some open challenges in measuring cultural and contextual diversity in LLM outputs?
* Where are we headed in terms of model sizes, fluency, and breadth of knowledge?
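For intuition, one crude proxy for epistemic diversity is the entropy of the answer distribution across repeated samples. This is a toy illustration only, not the metric from the linked framework or paper:

```python
import math
from collections import Counter

def answer_entropy(answers: list) -> float:
    """Shannon entropy (bits) over distinct answers: 0 means total knowledge collapse."""
    counts = Counter(answers)
    n = len(answers)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

collapsed = answer_entropy(["claim A"] * 4)     # every sample gives the same answer
diverse = answer_entropy(["A", "B", "C", "D"])  # four distinct claims: 2 bits
```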

**Useful links:**
* [Our open source framework](https://github.com/dwright37/llm-knowledge)
* [Reproducible data on Hugging Face](https://huggingface.co/datasets/dwright37/llm-knowledge-collapse)
* [Our Research paper](https://arxiv.org/abs/2510.04226)</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/P7NYXB/</url>
            <location>Platinum [2nd Floor]</location>
            
            <attendee>Sarah Masud</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>FP7YN7@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-FP7YN7</pentabarf:event-slug>
            <pentabarf:title>Are we free-threaded ready? Looking at where free-threaded Python fails</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T105500</dtstart>
            <dtend>20260416T112500</dtend>
            <duration>003000</duration>
            <summary>Are we free-threaded ready? Looking at where free-threaded Python fails</summary>
            <description>We begin by exploring the background of free-threaded Python, summarising its origins, current status, and the technical differences distinguishing it from standard Python implementations. A key focus will be examining the compatibility landscape, specifically investigating how many popular third-party libraries are currently prepared for free-threading. We will distinguish between generic pure Python wheels and explicitly free-threaded wheels and I’ll explain how the community can contribute to compatibility verification. 
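Checking which world your interpreter is in is straightforward. This sketch uses the documented `Py_GIL_DISABLED` build flag and the `sys._is_gil_enabled()` runtime probe available in CPython 3.13+ (guarded so it also runs on older versions):

```python
import sys
import sysconfig

def free_threading_status() -> str:
    """Report whether this interpreter was built, and is running, free-threaded."""
    if not sysconfig.get_config_var("Py_GIL_DISABLED"):
        return "standard build (GIL always on)"
    # Free-threaded builds can still re-enable the GIL at runtime,
    # e.g. when an incompatible extension module is imported.
    if getattr(sys, "_is_gil_enabled", lambda: True)():
        return "free-threaded build, but GIL currently enabled"
    return "free-threaded build, GIL disabled"
```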

We then critically discuss the necessity of free-threaded Python, weighing the disadvantage of increased thread-safety concerns (and verification methods) against the promised advantage of speed (including multithreaded profiling). Will free-threaded Python become a critical future direction for the language? How can you contribute? Can specific projects benefit from it immediately, and how? Let’s find out together!</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/FP7YN7/</url>
            <location>Platinum [2nd Floor]</location>
            
            <attendee>Cheuk Ting Ho</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>VBPRQR@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-VBPRQR</pentabarf:event-slug>
            <pentabarf:title>Designing and Scaling a Python Library in the Open: Architecture, Automation and Community</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T113500</dtstart>
            <dtend>20260416T122000</dtend>
            <duration>004500</duration>
            <summary>Designing and Scaling a Python Library in the Open: Architecture, Automation and Community</summary>
            <description>Building a Python library that remains reliable, maintainable, and welcoming to contributors is a challenge many projects face as they grow. This session presents ScanAPI as a real-world case study of how thoughtful engineering and automation can support both technical scalability and open source sustainability.

ScanAPI is an open-source Python library that enables automated API integration testing and live documentation using declarative specifications. Distributed via PyPI and actively maintained, the project has been adopted by developers across different contexts and was recognized by GitHub as part of initiatives focused on securing the open source supply chain.

Rather than focusing on abstract best practices, this talk dives into concrete engineering decisions made while designing and maintaining the library.

What we will cover:

1. Designing a Python Library for Growth
- How the codebase is structured to separate configuration, execution, and reporting
- Organizing modules and public APIs to remain stable over time
- Packaging decisions and CLI design for ease of use

2. Using Python Features Effectively
- Configuration-driven workflows with YAML and JSON
- Validation, error handling, and predictable failures
- Type hints and interfaces to improve readability and contributor confidence
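As an illustration of validation and predictable failures, a configuration object can fail fast at construction time. This is a hypothetical snippet with invented names, not ScanAPI's actual internals:

```python
from dataclasses import dataclass

@dataclass
class EndpointSpec:
    """A declarative endpoint description validated at construction time."""
    name: str
    path: str
    method: str = "get"

    def __post_init__(self):
        # Reject malformed specs immediately, with an actionable message.
        if not self.path.startswith("/"):
            raise ValueError(f"{self.name}: path must start with '/'")
        if self.method.lower() not in {"get", "post", "put", "delete"}:
            raise ValueError(f"{self.name}: unsupported method {self.method!r}")

spec = EndpointSpec(name="health", path="/health")
```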

3. Automation as a First-Class Concern
- Continuous integration with GitHub Actions
- Unit and integration testing strategies
- Automated releases, versioning, and dependency management

4. Developer Experience and Adoption
- Documentation and live reports as part of the product, not an afterthought
- Lowering the barrier for new users and contributors
- Tooling choices that reduce cognitive load

5. Community and Sustainability
- Contribution guidelines and governance models
- How open collaboration scales better than individual ownership
- The role of the Cumbuca Dev open source community in sustaining the project

By the end of the talk, attendees will have a clear mental model for designing Python libraries that can scale technically and socially. The lessons shared are applicable to anyone maintaining or planning to publish Python libraries, whether in personal projects, companies, or community-driven initiatives.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (long)</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/VBPRQR/</url>
            <location>Platinum [2nd Floor]</location>
            
            <attendee>Camila Maia</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>FJQXEQ@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-FJQXEQ</pentabarf:event-slug>
            <pentabarf:title>Increase productivity of CNC-machining of aerospace engine parts with Python</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T132000</dtstart>
            <dtend>20260416T135000</dtend>
            <duration>003000</duration>
            <summary>Increase productivity of CNC-machining of aerospace engine parts with Python</summary>
<description>Python is not exclusively a powerful tool for data science and web development. It is gaining importance on the shop floor in industrial production, too. Increasing unit labour costs and the imperative to reduce energy consumption raise the need to enhance productivity in industrial production in general.

For GKN Aerospace, the world’s leading tier-one aerospace supplier of systems and components, this leads to higher utilization and unmanned operation of a wide variety of production processes and CNC machining tools. At its site in Kongsberg, Norway, GKN mainly produces turbine shafts and casings for civil and military engines, which are used for up to 100,000 flights every day around the globe.

The programming language Python enables fast, forward-looking development of powerful applications that support the operators and will increase the degree of automation over the coming years, while ensuring that demanding quality requirements are met and the workload can be handled.

Standardization of applications is a key factor in increasing the robustness of automated production and reducing maintenance effort. Therefore, a standardized interface to NC controllers and PLCs, from “pre-ASCII era” systems of the 1970s to up-to-date ones, had to be found, and suitable gateway services had to be developed.

As an example of the usage of Python on the GKN shop floor in Norway, a standardized, in-house developed “Production Execution System”, consisting of a Python backend and a React frontend, is presented. Connected to more than 25 different machining tools on the shop floor and to additional digital services in the company’s IT environment, the application orchestrates all necessary data at cell level. It provides data such as NC programs and part meta-data to the machining tools, along with the additional process information the operators need to produce a wide variety of different engine components.

It enables unmanned production by selecting the next part for machining and commanding part changes to the machine, maximizing utilization of the means of production. Furthermore, it reports collected data back for use by other processes downstream in production.

Because the system is cloud-based and can be restarted during running production, bugfixes and new features can be rolled out “on the fly”. Automated unit and integration tests for the core functionality ensure robustness across the wide variety of machining tools in use.

The development and usage of the “Production Execution System” on GKN Aerospace’s shop floor is an excellent example of the increasing importance of the programming language Python in ensuring the highest quality of engine parts in a work environment of growing digitalization and workload.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/FJQXEQ/</url>
            <location>Platinum [2nd Floor]</location>
            
            <attendee>Nico Buhl</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>GMNE3E@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-GMNE3E</pentabarf:event-slug>
            <pentabarf:title>Making bad CLIs fun with Small Language Models</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T140000</dtstart>
            <dtend>20260416T143000</dtend>
            <duration>003000</duration>
            <summary>Making bad CLIs fun with Small Language Models</summary>
            <description>I&#x27;ve often had to rely on a poorly designed home-grown CLI, leading to frustration due to constantly forgetting argument names and allowable values. While Large Language Models (LLMs) initially appeared to be an ideal fix, their limitations quickly became evident, suggesting the need for a more efficient approach.

To begin, we&#x27;ll have a look at what makes CLIs hard to use and articulate why LLMs fall short in addressing them. Following this, we&#x27;ll examine the process of generating synthetic data tailored for any CLI, whether it&#x27;s proprietary or open-source. Then, I&#x27;ll show you how to use this synthetic dataset to fine-tune a Small Language Model on your laptop or in the cloud. We will use the smallest variant of Google&#x27;s Gemma 3 models, which boasts a lean 270 million parameters, to transform natural language instructions into actionable CLI commands.
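The synthetic-data step can be as simple as expanding natural-language templates over the CLI's argument space. The tool name, flags, and templates below are invented for illustration:

```python
import itertools

# Hypothetical CLI spec: allowed values per flag.
SPEC = {"env": ["dev", "prod"], "region": ["eu", "us"]}
TEMPLATES = [
    "deploy to {env} in {region}",
    "please roll out the {env} build for {region}",
]

def generate_pairs() -> list:
    """Cross every template with every argument combination to get training pairs."""
    pairs = []
    for env, region in itertools.product(SPEC["env"], SPEC["region"]):
        for template in TEMPLATES:
            pairs.append({
                "prompt": template.format(env=env, region=region),
                "completion": f"mytool deploy --env {env} --region {region}",
            })
    return pairs
```

Each pair maps a natural-language instruction to the exact command string, which is the supervision signal the fine-tuned model learns from.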

Lastly, I&#x27;ll share benchmark results to illustrate that these models can operate smoothly on various machines without needing API keys or GPUs, showcasing their robust capability and practical deployment potential.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/GMNE3E/</url>
            <location>Platinum [2nd Floor]</location>
            
            <attendee>Moritz Bauer</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>MQJVFU@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-MQJVFU</pentabarf:event-slug>
            <pentabarf:title>AI Evals Done Right: From Vibes to Confident Decisions</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T150500</dtstart>
            <dtend></dtend>
            <duration>003000</duration>
            <summary>AI Evals Done Right: From Vibes to Confident Decisions</summary>
            <description>Testing traditional software is &quot;simple&quot;... same input, same output. LLMs? Not so much. Same prompt, different result every time. So how do you actually know if your AI product is good?

Spoiler: Most teams don&#x27;t. They ship on vibes and hope for the best.

This talk takes you through our real journey at Blue Yonder, where we built an LLM-powered analytics system and needed a way to actually measure its quality. You&#x27;ll see how we went from &quot;feels okay-ish&quot; to concrete numbers that let us make real decisions - with actual examples from production along the way.

The methodology is called Error Analysis: collect traces, annotate them from the user&#x27;s perspective, group similar issues into failure modes, and turn those into automated evals. Along the way, we&#x27;ll share practical best practices like why binary Pass/Fail beats rating scales, and why 100% pass rate means your evals are broken.
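The end product of that loop is easy to compute over. A toy sketch with invented failure modes and annotations, just to show the shape of the data:

```python
# Hypothetical annotated traces: one binary pass/fail flag per failure mode.
traces = [
    {"hallucinated_column": False, "wrong_aggregation": True},
    {"hallucinated_column": True,  "wrong_aggregation": False},
    {"hallucinated_column": False, "wrong_aggregation": False},
    {"hallucinated_column": False, "wrong_aggregation": True},
]

def failure_rates(traces: list) -> dict:
    """Fraction of traces exhibiting each failure mode."""
    modes = traces[0].keys()
    return {m: sum(t[m] for t in traces) / len(traces) for m in modes}
```

Binary flags make these rates trivially comparable across model versions, which is exactly what rating scales make hard.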

The payoff? When a new model drops, we run our pipeline and know within hours - not weeks - whether it&#x27;s better or worse for our specific use case. Real percentages. Real trade-offs. Real decisions.

Expect a meme-powered walkthrough and a clear path to implement this yourself starting with just 20 traces.

Outline:
- Introduction: the challenge of testing stochastic systems; why we needed a better approach
- Collecting and Annotating Traces: every trace is a user experiencing your product; Open Coding from the user perspective; real examples of failure modes we discovered
- Building the Failure Taxonomy: grouping observations into categories; Axial Coding; turning scattered comments into actionable failure modes
- Writing Evals That Work: LLM-as-judge setup; binary scores vs. rating scales; validating against human judgment
- From Vibes to Decisions: prioritizing what to fix; measuring improvement; 24-hour model benchmarking
- Wrap-up: your action plan; start with 20 traces</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/MQJVFU/</url>
            <location>Platinum [2nd Floor]</location>
            
            <attendee>Martin Seeler</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>NAHX3L@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-NAHX3L</pentabarf:event-slug>
            <pentabarf:title>Restaurants around train stations are bad and I can prove it</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T154500</dtstart>
            <dtend>20260416T161500</dtend>
            <duration>003000</duration>
            <summary>Restaurants around train stations are bad and I can prove it</summary>
<description>Does the quality of restaurants degrade with proximity to a train station? And which German town is worst for the hungry traveller? In this culinary data exploration, we used publicly accessible data to assess whether busy train stations correlate with lower restaurant ratings - and which towns are actually the worst. Using the Google Maps API and the hottest framework for data manipulation, polars, we give an overview of publicly available data resources and show how far you can get with them.

Of course, this talk will also deliver all the cold hard food facts: Analyzing the data of over 10,000 restaurants in Germany and worldwide, we will present the best and worst dining options available at train stations. We compare urban and rural environments, examine the impact of chain stores, and provide practical advice for you, the hungry traveler.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/NAHX3L/</url>
            <location>Platinum [2nd Floor]</location>
            
            <attendee>Dennis Schulz</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>EWZMJK@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-EWZMJK</pentabarf:event-slug>
            <pentabarf:title>Don’t call your LLM too often! How to build your dialog graph with confidence and sleep at night.</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T101500</dtstart>
            <dtend></dtend>
            <duration>003000</duration>
            <summary>Don’t call your LLM too often! How to build your dialog graph with confidence and sleep at night.</summary>
            <description>Building reliable dialog flows for LLM-based conversational systems remains difficult once interactions move beyond linear question–answer patterns. While early prototypes often rely on prompt chains, real-world systems quickly require branching, correction, clarification, and multi-step reasoning. At this stage, dialog logic implicitly turns into a graph, yet is still implemented and reasoned about as a sequence. This mismatch leads to structural problems that are hard to detect without explicit modeling and observability.

Complex document retrieval systems are not born out of a theoretical itch. We’ll illustrate the practical problems by framing them around the following use case from the area of electrical power production.

*Use Case: Aladdin and the Case of the Almost-Exploding Power Plant*

Rick and Morty are operations engineers at a large electrical power plant. Every single day, they face the same heroic challenge: too many documents, too little clarity.

The technical staff produces a constant stream of operational reports: free-text summaries describing the health and performance of steam generators. These reports are rich in knowledge, but poor in structure. Rick’s daily ritual is to read, compare, and summarize them, trying to predict which units will soon need maintenance. If he gets it right, the plant saves money by avoiding unnecessary service routines which are prescribed by regular maintenance guidelines. If he gets it wrong… well, let’s just say steam generators have a dramatic way of expressing dissatisfaction.

But unstructured reports are only one part of the story. Alongside them exists a well-behaved, structured world: databases containing the results of regular, non-invasive ultrasonic inspections of pipelines, used to track corrosion development over time. Morty has built a quantitative model that predicts the probability of a pipeline rupture (and, derived from that probability, its timing) based on these corrosion measurements.

Naturally, Rick and Morty want everything. They want one system that can: 1) understand messy human-written reports, 2) reason over numerical corrosion models, and 3) answer simple document questions without investing in unnecessary intelligence.


Thus, the system Aladdin is born.

Aladdin combines three very different subsystems:

  - An agentic indexing component, which dynamically builds a search index for a GraphRAG over heterogeneous documents, given a pre-defined graph structure.
  - An autonomous analytical agent, which evaluates pipeline failure probabilities using Morty’s quantitative corrosion model.
  - A lightweight text-based RAG, backed by a vector index, for fast and simple document retrieval.

But what is the challenge? Once these components start talking to each other, the dialog graph becomes unpredictable. Execution paths depend heavily on what information is actually present in the documents, and this cannot be fully reasoned about in advance. Loops appear, branches explode, and theoretically “clean” dialog designs fail in practice.

This use case illustrates why observability, tracing, and empirical optimization of dialog graphs are essential when building real-world document retrieval systems for industrial environments. Especially when Rick just wants a straight answer and Morty really doesn’t want another pipeline incident on his watch.

Given this use case, we will illustrate several structural pathologies in the dialog graph that we observed in practice and for which we found curative approaches.

**Non-ending loops in the dialog graph**
A frequent failure mode is the emergence of endless circular dialog graphs. Typical examples include:
  - correction loops (“Please rephrase your input” → user rephrases → validation fails again → same prompt),
  - clarification cycles (“What do you mean by X?” → partial answer → same clarification),
  - fallback loops where a generic catch-all path routes the conversation back to an earlier state without introducing new information.

Such cycles are rarely intentional; they arise from local fixes applied over time and are difficult to identify by prompt inspection alone. In production, they manifest as stalled conversations, increased latency, rising token costs, and user frustration.
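
Cycles like these can be found mechanically once the dialog graph is available as an edge list. A minimal sketch in plain Python (node names are hypothetical), using depth-first search with three-color marking to surface the back edges that close a loop:

```python
def find_back_edges(edges):
    """Return the edges that close a cycle in a directed dialog graph (DFS, three-color marking)."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {node: WHITE for node in graph}
    back_edges = []

    def dfs(node):
        color[node] = GRAY
        for nxt in graph[node]:
            if color[nxt] == GRAY:        # edge back into the current path: a cycle
                back_edges.append((node, nxt))
            elif color[nxt] == WHITE:
                dfs(nxt)
        color[node] = BLACK

    for node in list(graph):
        if color[node] == WHITE:
            dfs(node)
    return back_edges

# Example: a correction loop that can repeat forever
edges = [("ask", "validate"), ("validate", "rephrase"),
         ("rephrase", "validate"), ("validate", "answer")]
loops = find_back_edges(edges)
```

In practice the edge list would be exported from the orchestration framework; each flagged back edge is a candidate for an explicit exit condition or retry budget.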

Beyond circularity, several other structural pathologies commonly appear in document retrieval systems.

**Dead subpaths after non-matching branching conditions**

Dialog graphs often include branches guarded by semantic or data-dependent conditions, but changes in document structure, embeddings, or preprocessing can make these conditions unsatisfiable, creating dead subpaths that are never executed. These paths are dangerous because they give a false sense of coverage, increase maintenance and reasoning complexity, and in production often manifest as mysterious fallback behavior where the system always takes a default route instead of a specialized one.
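
Because such conditions depend on the data, dead subpaths are easiest to find empirically: compare the set of nodes the graph defines against the nodes that observed traces actually visit. A minimal sketch, with hypothetical node names:

```python
def dead_branches(graph_nodes, traces):
    """Nodes present in the designed dialog graph that no observed trace ever visited."""
    visited = {node for trace in traces for node in trace}
    return sorted(set(graph_nodes) - visited)

# Example: the specialized "tables" branch is defined but never executed;
# every conversation falls through to the generic fallback instead.
nodes = ["ask", "route", "tables", "fallback", "answer"]
traces = [["ask", "route", "fallback", "answer"],
          ["ask", "route", "answer"]]
```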

**Redundant validation and re-validation steps**

Another common issue is redundant validation, where the same or equivalent checks are performed multiple times along a single dialog path. This often happens when validation logic is added defensively at multiple layers: once at input parsing, again before retrieval, and again before response generation. While each validation step may seem harmless in isolation, their combination leads to inflated dialog depth, unnecessary latency, and increased cognitive load when analyzing traces. Worse, slight inconsistencies between validation prompts can produce contradictory outcomes, for example, an input being accepted in one step and rejected in the next.

**Overly generic catch-all branches**

Catch-all branches are often introduced as a safety mechanism: a “default” path that handles unexpected input or retrieval failure. Over time, however, these branches tend to grow in scope and responsibility, eventually becoming overly generic handlers that do everything. Such branches blur the distinction between genuinely exceptional situations and routine cases. As more logic is added to the catch-all path, it becomes harder to reason about what the system is actually responding to. Specialized logic may be silently bypassed, while unrelated scenarios are forced through the same generic response strategy.

**Linear sequences that should be collapsed**

Many dialog graphs contain long linear chains of nodes with no branching, no state changes, and no observable side effects between steps. These sequences often originate from iterative prompt development, where small transformations are added one by one (“extract entities” → “normalize entities” → “rephrase query” → “check relevance”). While conceptually clean, such linear chains are rarely optimal. They increase token usage, latency, and the number of failure points, without adding expressive power. More importantly, they obscure the true logical structure of the system: what could be a single semantic transformation is spread across multiple opaque steps.

An additional aspect of an overcomplicated dialog graph, especially one built by an autonomous agent, is barely predictable cost. Autonomous parts of the system need a very tight observability net to stay under control and to keep actual costs from exceeding predictions by an order of magnitude.

Working within the specifically regulated environment of a power plant imposes additional requirements on the explainability of the results. Every fact must be traceable to its source, and model hallucinations must be recognized at a very early stage.

All of the above requirements result in a setup that relies heavily on an LLM engineering and observability platform such as Langfuse.

When combined with dialog-oriented orchestration frameworks such as Langflow, experiment tracking extends from single calls to full conversational trajectories. Complete dialog traces expose path stability, node utilization, dead branches, fallback prevalence, and user-facing metrics such as turns to resolution or correction-loop repetition.

Over time, this empirical evidence replaces design-time assumptions. Dialog paths are merged or removed based on observed execution rather than theoretical intent, with unreachable branches, redundant validations, and unstable loops revealed directly through trace analysis. Dialog graph optimization thus becomes a continuous, reproducible process grounded in measured behavior.

This talk proposes an engineering-oriented approach that models conversational logic as explicit dialog graphs and treats execution traces as first-class data. Using Langfuse instrumentation, developers can analyze concrete execution paths—branch frequency, loop formation, latency hotspots—and compare alternative graph designs through aggregated metrics and A/B testing, enabling systematic optimization based on evidence rather than intuition.
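
As one concrete illustration of the kind of path analysis meant here: given dialog traces exported as ordered lists of node names (a simplified stand-in for the span trees Langfuse actually records), branch frequency and correction-loop repetition fall out of a plain transition count:

```python
from collections import Counter

def branch_frequencies(traces):
    """Count how often each node-to-node transition occurs across dialog traces."""
    return Counter((a, b) for trace in traces for a, b in zip(trace, trace[1:]))

# Two observed conversations; the second goes around the correction loop once.
traces = [
    ["ask", "validate", "answer"],
    ["ask", "validate", "rephrase", "validate", "answer"],
]
freq = branch_frequencies(traces)
```

Transitions with disproportionate counts relative to conversation volume point at loops; transitions with a count of zero point at dead branches.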

**To sum up:** using concrete production-oriented examples, the talk shows how graph-based dialog design improves multi-step retrieval, explainability, and robustness across languages. Endless correction loops are detected and eliminated, dead branches are pruned, and overly generic catch-all paths are replaced with targeted recovery strategies. The overall message is that scalable conversational systems require not just better prompts or larger models, but explicit dialog graphs combined with rigorous tracing and data-driven optimization.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/EWZMJK/</url>
            <location>Europium [3rd Floor]</location>
            
            <attendee>Evgeniya Ovchinnikova</attendee>
            
            <attendee>Andrei Beliankou</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>AWMRFD@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-AWMRFD</pentabarf:event-slug>
            <pentabarf:title>Vibe NLP for Applied NLP</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T105500</dtstart>
            <dtend></dtend>
            <duration>003000</duration>
            <summary>Vibe NLP for Applied NLP</summary>
            <description>One of the hardest parts of applied NLP has always been breaking down complex business problems into machine learning components. It&#x27;s so hard because it requires domain expertise and reasoning about the specific use case, and it&#x27;s the one thing technology couldn&#x27;t fix. But what if we could take some of the learnings from AI-powered coding assistants and apply them to solving real-world NLP problems? In this talk, I&#x27;ll show how we&#x27;ve built powerful assistants and tools to help developers solve NLP tasks using open-source software, and create modular solutions that are small, fast and fully data-private.

At the core of it is an often overlooked idea: using LLMs to *build systems* instead of *as systems*. AI-powered coding assistants have transformed the way we build software – and they can be even more impactful for AI development itself and bridge the experience gap that&#x27;s often holding teams back and causing projects to fail. In the talk, I will show you a new way of using generative models for AI development, and some practical examples of how to make &quot;Vibe NLP&quot; work for real-world problems.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/AWMRFD/</url>
            <location>Europium [3rd Floor]</location>
            
            <attendee>Ines Montani</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>FT7V39@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-FT7V39</pentabarf:event-slug>
            <pentabarf:title>From Row-Wise to Columnar: Speeding Up PySpark UDFs with Arrow and Polars</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T113500</dtstart>
            <dtend></dtend>
            <duration>004500</duration>
            <summary>From Row-Wise to Columnar: Speeding Up PySpark UDFs with Arrow and Polars</summary>
            <description>Objective
Demonstrate how to accelerate UDF-heavy PySpark workloads by switching from row-wise execution to Arrow-backed columnar execution, using Polars for fast, maintainable column and table transformations.

Key Takeaways
- How Arrow is being used in PySpark for batched, columnar data exchange
- Why Polars helps: a higher-level DataFrame API plus Arrow interoperability that can often reuse Arrow buffers
- How to design fast column transformations (column in → column out) and fast table transformations (batch/table in → batch/table out).
- Benchmarks and tradeoffs across scalar UDFs, Pandas UDFs, Arrow-native UDFs, and Polars-based Arrow table transforms on real-world examples.

Audience
- Data engineers and data scientists working with PySpark at scale
- Engineers seeking concrete strategies to optimize Spark pipelines that rely on Python UDFs

Knowledge Expected
- Familiarity with PySpark DataFrames and UDFs
- Basic understanding of Spark execution helps but is not required
- Exposure to Polars/Arrow is not required but might be beneficial</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (long)</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/FT7V39/</url>
            <location>Europium [3rd Floor]</location>
            
            <attendee>Aimilios Tsouvelekakis</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>AAY8KQ@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-AAY8KQ</pentabarf:event-slug>
            <pentabarf:title>Simplicity Scales: Rewriting to a Django Monolith and Monorepo</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T132000</dtstart>
            <dtend></dtend>
            <duration>003000</duration>
            <summary>Simplicity Scales: Rewriting to a Django Monolith and Monorepo</summary>
            <description>When working on older codebases most developers encounter the question of &quot;Should we fix this or rewrite it completely?&quot;. Weeklong releases, multi-day bug hunts and a very obvious impact of tech debt on developer happiness and development velocity led us to ask that question in a very general way. We were wondering how a better approach to a Python-based infrastructure could look when working in a multi-disciplinary startup environment. We believe that many developers have been in a similar situation and we would like to introduce our holistic take on a when and how to refactor older codebases.

Our solution to these problems consists of two main changes: moving the entire code of various teams into a uv-based monorepo and questioning a lot of past technical decisions under the paradigm of &quot;Can this be done simpler, or can we rephrase the problem to solve it with an existing technical solution?&quot;. We will share our insights into how a multi-language monorepo approach can work at a startup where full-stack and ML practitioners work on the same code base. This includes going over standardized procedures (e.g. code quality) that are shared between all teams.

Furthermore, we will discuss some of the high-level decisions and explain our reasoning within the available options of the Python ecosystem. This includes, for example, why a monolithic approach based on Django works better for us than a Flask-based microservice solution.

The goal of this talk is to give the listener an introduction to our solutions and make it easy to draw parallels to their own situation or problems. Attendees will leave with an insight into our decision framework and we will show what metrics we used to validate the success of our refactoring and technical choices.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/AAY8KQ/</url>
            <location>Europium [3rd Floor]</location>
            
            <attendee>Bruno Vollmer</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>LVJXK3@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-LVJXK3</pentabarf:event-slug>
            <pentabarf:title>Using Sensor Fusion and ML to Navigate Underground When GPS Fails</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T140000</dtstart>
            <dtend></dtend>
            <duration>003000</duration>
            <summary>Using Sensor Fusion and ML to Navigate Underground When GPS Fails</summary>
            <description>In the twisting vaults of a subway, metro, or U-Bahn, there’s often no reliable cell service, wifi, or GPS. Which means riders had no good way of keeping track of their stops or ETA when underground.
After collecting extensive ground truth data, we trained a motion classifier using the phone&#x27;s accelerometer to identify a moving train. This prediction is fed into a location model that combines it with the train schedule to estimate a location, even when GPS fails. We cover our unique data pipeline, feature engineering, and the optimization for high-scale, offline edge deployment to millions of users.

Attendees will benefit from the lessons we learned developing a sensor fusion ML system for offline use on smartphones.

##### Data Collection &amp; Annotation
Strategies for gathering high-quality, labeled &quot;ground truth&quot;, especially in cases where the labels can&#x27;t be inferred by human annotators after the fact

##### The ML Pipeline
Hyperparameter tuning of a convolutional neural network (CNN)
Building a multi-stage training regimen to leverage different datasets

##### UX
Presenting predictions to users in a way that expresses uncertainty when necessary, and inspires confidence when justified. We want users to forget GPS doesn&#x27;t work underground.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/LVJXK3/</url>
            <location>Europium [3rd Floor]</location>
            
            <attendee>Étienne Tremblay</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>3C9P9V@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-3C9P9V</pentabarf:event-slug>
            <pentabarf:title>Is my AI Recruiting biased? - How to evaluate these systems</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T150500</dtstart>
            <dtend></dtend>
            <duration>003000</duration>
            <summary>Is my AI Recruiting biased? - How to evaluate these systems</summary>
            <description>AI recruiting systems are rapidly reshaping talent acquisition by automating candidate filtering, ranking, and selection. However, their growing influence raises critical concerns around fairness, robustness, and decision transparency. This talk introduces a practical testing methodology for evaluating AI recruiting pipelines beyond traditional accuracy metrics.

We will examine how synthetic data and augmentation techniques can expose hidden weaknesses, improve coverage, and stress-test edge cases. The talk will address the role of proxy variables, why they matter, and how they can help uncover unintended model behavior. We will also explore fairness measurement strategies, including individual and group fairness metrics, and discuss how these approaches reveal structural bias in ranking and scoring outcomes.
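
As a small illustration of the group fairness metrics discussed here, a demographic parity gap can be computed in a few lines (group labels and decisions below are synthetic):

```python
def demographic_parity_gap(outcomes):
    """Largest difference in positive-decision rates between candidate groups.

    outcomes maps a group label to a list of 0/1 decisions (1 = shortlisted);
    a gap near 0 is necessary, though not sufficient, for group fairness.
    """
    rates = {group: sum(decisions) / len(decisions)
             for group, decisions in outcomes.items()}
    return max(rates.values()) - min(rates.values())
```

Ranking pipelines need position-aware variants of such metrics, but the same rate-comparison principle applies.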

Because parts of the evaluation process can be automated, the session will demonstrate how Python-based agents and LLM “referees” can assist in generating and augmenting CVs and certificates, validating predictions, and assessing explanation quality. This automation can accelerate workflows, increase reproducibility, and reduce human error.

Participants will walk through a complete testing pipeline, supported by insights from real-world projects that illustrate how different tools and strategies expose systemic risks and guide mitigation. Attendees will leave with practical techniques to make recruiting systems more reliable, transparent, and trustworthy in real deployment contexts.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/3C9P9V/</url>
            <location>Europium [3rd Floor]</location>
            
            <attendee>Sebastian Krauss</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>NN7CVP@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-NN7CVP</pentabarf:event-slug>
            <pentabarf:title>Post-Processing and Visualization of Astrophysical Data with PyPLUTO</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T154500</dtstart>
            <dtend>20260416T161500</dtend>
            <duration>003000</duration>
            <summary>Post-Processing and Visualization of Astrophysical Data with PyPLUTO</summary>
            <description>Numerical simulations often generate vast amounts of structured data; yet, extracting insights from these outputs remains a major challenge. Analysis is frequently performed through fragmented, ad-hoc scripts that are difficult to maintain, reuse, or reproduce. **PyPLUTO** is a Python package designed to address this gap by providing a clear and flexible interface for post-processing, analyzing, and visualizing simulation data produced by the **PLUTO** code for computational astrophysics.

This talk presents **PyPLUTO** as a case study in building lightweight, domain-specific scientific tools on top of the Python scientific ecosystem. The emphasis is on offline analysis and visualisation workflows that operate on completed simulation outputs, enabling efficient exploration, comparison, and communication of results. Rather than coupling visualisation to simulation runtime, **PyPLUTO** focuses on clarity, composability, and integration with established PyData libraries.

Through concrete examples, the session demonstrates how structured simulation data can be processed and visualised using tools such as NumPy and Matplotlib. Attendees will learn how Python-based workflows can replace scattered analysis scripts, how visualization supports rapid scientific insight, and how a clean separation between simulation and analysis enhances reproducibility and productivity.

## Outline

#### 1. **From Simulation Output to Insight**
- Common challenges in post-processing large numerical simulations
- The gap between raw data and scientific interpretation
- Why offline analysis and visualisation remain essential

#### 2. **PyPLUTO: Scope and Design**
- What PyPLUTO does and the problems it targets
- Design goals: simplicity, flexibility, and interoperability
- Clear separation between simulation execution and analysis

#### 3. **Working with Simulation Data**
- Loading and organising structured simulation outputs
- Handling scalar and vector fields across space and time
- Typical post-processing tasks and analysis patterns

#### 4. **Visualisation Workflows**
- Exploratory plots and diagnostic views
- Time evolution and comparison between simulations
- Producing publication-quality figures with Matplotlib

#### 5. **Interactive GUI for Post-Processing**
- Lightweight graphical interfaces for exploring simulation data
- Interactive selection of fields, slices, and time steps
- GUI as a complement to scripting, not a replacement

#### 6. **Integration with the Python Ecosystem**
- Efficient data handling with NumPy
- Interoperability with existing scientific Python tools
- Benefits of building on established libraries

#### 7. **Software Design Lessons**
- Building user-friendly scientific APIs
- Balancing usability, transparency, and performance

#### 8. **Broader Applicability and Outlook**
- Relevance to other simulation-heavy fields
- Reusable patterns for Python-based post-processing
- Future directions and potential extensions

The talk is aimed at scientists, data practitioners, and Python developers interested in scientific visualisation and simulation data analysis. No background in astrophysics or PLUTO is required; the focus is on workflows, tools, and design principles applicable across the PyData community.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/NN7CVP/</url>
            <location>Europium [3rd Floor]</location>
            
            <attendee>Giancarlo Mattia</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>Q9HMT3@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-Q9HMT3</pentabarf:event-slug>
            <pentabarf:title>Schema-Driven Lambdaliths in Python with AWS Lambda Powertools and Pydantic</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T105500</dtstart>
            <dtend></dtend>
            <duration>003000</duration>
            <summary>Schema-Driven Lambdaliths in Python with AWS Lambda Powertools and Pydantic</summary>
            <description>The rise of modern web frameworks such as Hono has brought increased attention to schema-driven development and the “Lambdalith” architecture, where an application is delivered through a single Lambda function. These approaches offer a highly streamlined developer experience, but many existing Python-based systems struggle to achieve the same level of consistency, validation, and maintainability.

In Python, closing this gap often means introducing additional execution layers outside the language itself. When frameworks designed around web servers and request lifecycles are deployed on AWS Lambda, they typically require ASGI adapters, web adapters, or container-based runtimes. While powerful, these layers can make it harder to focus on what many teams actually want: writing clear, minimal Python handlers with explicit data boundaries.

This talk explores how combining AWS Lambda Powertools and Pydantic can close that gap and enable a modern, predictable development workflow—even in established Python ecosystems. Drawing from real-world product use cases, we will examine how these tools can simplify handler-level logic, standardize request and response validation, and improve observability and error handling.

Lambda Powertools provides far more than logging and metrics: it includes utilities for structured tracing, data parsing, idempotency, typed configuration, and other features that bring Python serverless development closer to the ergonomics of newer frameworks. When paired with Pydantic, developers can enforce clear data contracts, reduce boilerplate, and achieve stronger guarantees around application behavior.

Attendees will learn practical patterns for improving quality and productivity in Lambda-based applications, including how to:

- Validate event payloads and responses using Pydantic models
- Implement consistent error handling strategies
- Structure a Lambdalith-style architecture in Python
- Leverage Powertools utilities to enhance reliability and developer experience
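
A minimal sketch of the payload-validation pattern, using plain Pydantic (the model and handler names are hypothetical; Powertools&#x27; parser utility wraps the same idea):

```python
from pydantic import BaseModel, ValidationError

class OrderEvent(BaseModel):
    order_id: str
    quantity: int

def handler(event: dict, context=None) -> dict:
    # The data contract is enforced at the boundary: invalid payloads
    # never reach the business logic.
    try:
        payload = OrderEvent(**event)
    except ValidationError as exc:
        return {"statusCode": 400, "body": str(exc)}
    return {"statusCode": 200, "body": f"order {payload.order_id} x {payload.quantity}"}
```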

This session will be valuable for Python developers who want to apply schema-driven design principles, modernize existing serverless codebases, or build more maintainable Lambda applications with confidence.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/Q9HMT3/</url>
            <location>Palladium [2nd Floor]</location>
            
            <attendee>Tanio Toranosuke</attendee>
            
            <attendee>Haruto Mori</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>GAUNKM@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-GAUNKM</pentabarf:event-slug>
            <pentabarf:title>When LLMs Are Too Big: Building Cost-Efficient High-Throughput ML Systems for E-Commerce Cataloging</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T113500</dtstart>
            <dtend></dtend>
            <duration>004500</duration>
            <summary>When LLMs Are Too Big: Building Cost-Efficient High-Throughput ML Systems for E-Commerce Cataloging</summary>
            <description>## When LLMs Are Too Big: Building Cost-Efficient High-Throughput Machine Learning for Cataloging in E-Commerce

 

idealo.de offers a price comparison service for over 5.7 million products across thousands of categories. It navigates a dynamic, constantly changing billion-scale landscape with **over 2 billion offers from 50,000+ shops in 6 countries**. Our central challenge is cataloging this huge number of offers automatically and at scale, with a peak throughput of **4.8 million offers processed per minute.**

 

While modern large language models (LLMs) excel at such tasks, they do not scale well to huge amounts of data. To fulfill business needs, we need to strike a balance between processing speed and offer cataloging quality. By employing modern machine learning techniques to extract specialist knowledge from downscaled state-of-the-art LLMs, combined with a multitude of performance-enhancing techniques, we speed up idealo’s processing while massively improving cataloging performance. This talk presents how these solutions strike the balance between cost and performance and how they integrate into idealo’s offer cataloging pipelines.

  

 

### What makes this approach unique?

We present our solution and practical experiences in the area of high-throughput classification. This includes the operational aspects of our system, in particular the design of a stable, high-performance MLOps lifecycle integrated into our CI/CD and continuous training pipelines, where we automate continuous data sampling, model training, model deployment, and monitoring.

Concrete solutions and best practices are discussed that demonstrate how the accuracy of our multilingual MiniLM transformer encoder is improved through knowledge distillation from a large e5 instruction transformer. Additionally, we show how deploying these models on specialized hardware via AWS Neuron enables strict runtime and latency requirements to be met in a cost-efficient manner.

In detail we will discuss the following topics: 

* Machine learning operations lifecycle for a high-throughput category classification system.
* Challenges in efficiently creating training and testing datasets from the huge amount of existing, massively unbalanced data.
* Selecting the right model in the presence of the current zoo of encoder language models.
* Using knowledge distillation via student-teacher models to balance required compute and classification performance.
* Integrating quantization techniques for speed improvements.
* Selecting ideal compute instances for our production environment.
* How to compile the model for custom-designed machine learning accelerators using the Neuron SDK.
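
The distillation objective in the list above can be sketched in a few lines of plain Python (framework-free for brevity; in the actual stack this would be a PyTorch loss). The student is trained to match the teacher&#x27;s temperature-softened output distribution:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities; higher temperature softens the distribution."""
    exps = [math.exp(x / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence from the softened teacher distribution to the student's."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return sum(p * math.log(p / q) for p, q in zip(teacher, student))
```

The loss is zero when the student reproduces the teacher exactly and grows as the distributions diverge, which is what lets a small encoder absorb a large model&#x27;s category knowledge.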

 

 

### Key takeaways for attendees:

* An overview of months of research and exploration for massive throughput environments including their practical integration in live systems. 
* Modern machine learning systems in production, especially with billion-scale data, need to carefully balance business needs in terms of cost and quality. 
* State-of-the-art LLMs are often not feasible for large-scale tasks. However, new machine learning techniques can extract their knowledge for specific applications. 
* How to transition research findings to production. 

 

 

The talk will be aligned along our tech stack, which includes PyTorch, PyTorch Lightning, Huggingface, AWS Sagemaker, AWS Neuron SDK, Grafana Loki, Docker and GitHub Actions.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (long)</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/GAUNKM/</url>
            <location>Palladium [2nd Floor]</location>
            
            <attendee>Tobias Senst</attendee>
            
            <attendee>Bastian Wandt</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>NDZSSB@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-NDZSSB</pentabarf:event-slug>
            <pentabarf:title>Free T(h)r(e)ading: A Trading Systems Journey Beyond the GIL</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T132000</dtstart>
            <dtend>20260416T135000</dtend>
            <duration>003000</duration>
            <summary>Free T(h)r(e)ading: A Trading Systems Journey Beyond the GIL</summary>
            <description>The release of Python 3.13 with experimental free-threaded mode (PEP 703) represents a fundamental shift in Python&#x27;s concurrency model. For decades, the Global Interpreter Lock has dictated how we write concurrent Python code, pushing developers toward async/await patterns for I/O-bound workloads and multiprocessing for CPU-bound tasks. But what happens when we remove that constraint?

We designed a research experiment to answer this question empirically: take a production trading algorithm built on asyncio, migrate it to free threading, and measure everything. Trading systems make ideal subjects for this research—they&#x27;re latency-sensitive, handle multiple concurrent data streams, perform both I/O and CPU-bound operations, and have clear, quantifiable performance metrics.

This talk presents our complete research journey, from initial hypothesis to validated conclusions, sharing both our methodology and findings.
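As a minimal sketch (illustrative, not from the talk materials): whether an interpreter is running free-threaded can be checked at runtime, but `sys._is_gil_enabled()` only exists on CPython 3.13+, so the lookup should be guarded:

```python
import sys

def gil_status():
    # sys._is_gil_enabled() is only present on CPython 3.13 and later;
    # on a free-threaded build it returns False once the GIL is disabled.
    checker = getattr(sys, "_is_gil_enabled", None)
    if checker is None:
        return "standard build (GIL, pre-3.13)"
    return "GIL enabled" if checker() else "free-threaded (GIL disabled)"

print(gil_status())
```

A guard like this lets the same benchmarking harness run on both interpreter builds and label its results accordingly.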

Detailed Outline:
1. Research Question &amp; Motivation (3 minutes)

The research question: can a trading algorithm benefit from true parallelism?
Why trading systems make ideal experimental subjects
Initial hypotheses about performance characteristics
Baseline system: async architecture and performance profile

2. Experimental Design (4 minutes)

Migration approach
Benchmarking framework
Workload simulation
Control variables and isolation of I/O vs. CPU-bound operations

3. Migration Journey (5 minutes)

Architectural transformation
Key refactoring patterns and synchronization strategies
Thread safety challenges
Library ecosystem compatibility findings

4. Results &amp; Discoveries (8 minutes)

Performance data: latency, throughput, and resource utilization
Workload analysis: where free threading won, where async remained competitive
Visual data presentation: charts and comparative analysis

5. Practical Implications (4 minutes)

Decision framework: when to choose free threading over async
Migration best practices and lessons learned
Production readiness assessment
What this means for Python&#x27;s concurrent future

6. Q&amp;A (5 minutes)

Prerequisites - This talk assumes attendees have:

- Strong understanding of Python&#x27;s concurrency models (asyncio, threading, multiprocessing)
- Familiarity with the GIL and its implications
- Basic understanding of systems programming concepts (thread safety, synchronization)</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/NDZSSB/</url>
            <location>Palladium [2nd Floor]</location>
            
            <attendee>Tim Kreitner</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>VUHSG9@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-VUHSG9</pentabarf:event-slug>
            <pentabarf:title>Letting AI Move: Robotics Demos Powered by Python</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T140000</dtstart>
            <dtend>20260416T143000</dtend>
            <duration>003000</duration>
            <summary>Letting AI Move: Robotics Demos Powered by Python</summary>
            <description>Artificial intelligence can be difficult to explain, especially to people outside of tech. We often rely on slides, diagrams, or on-screen demos, but the real impact does not always stick. For people encountering AI for the first time—such as students or non-technical audiences—AI terminology and concepts can remain abstract and disconnected from real-world experience.

Robots can help change that. When AI controls a physical system, its behavior becomes visible, tangible, and easier to reason about. In this talk, we explore how Python and playful robotics experiments can be used to make AI more concrete, interactive, and engaging. Using the Hugging Face Reachy Mini robot as a case study, we show how physical interaction can turn abstract AI concepts into intuitive, memorable experiences.

The perspective of this talk is intentionally non-traditional: we started with no prior knowledge of robotics or mechanics and approached the problem purely from a Python developer’s point of view. This journey strongly shapes the talk. Rather than focusing on advanced robotics engineering, the emphasis is on accessibility, experimentation, and learning by doing. The goal is to show that robotics can be an approachable medium for explaining AI, even for people without a hardware or engineering background.

During the talk, we walk through basic building blocks such as movement, gestures, and simple interaction patterns, and show how AI-driven behavior can be layered on top of them using familiar Python tools. We share examples from real experiments and demos, including what worked well, what failed, and what we learned from unexpected behavior in live settings.

Importantly, this is not a product demo or a hardware-specific tutorial. While Reachy Mini is used as a concrete example, the focus is on transferable ideas and design patterns:

How physical interaction changes the way people perceive AI

How Python lowers the barrier to experimenting with robotics

How to design demos that invite curiosity rather than intimidation

How to make AI systems easier to explain in educational and outreach contexts

Attendees do not need access to a robot to benefit from this talk. The lessons and patterns discussed can be applied to a wide range of settings, including classrooms, workshops, meetups, and public demonstrations.

This talk is aimed at Python beginners and intermediate developers, especially educators and anyone who regularly needs to explain or demonstrate AI to others. Attendees will leave with new ideas, inspiration, and practical approaches for making AI more tangible, engaging, and human-centered.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/VUHSG9/</url>
            <location>Palladium [2nd Floor]</location>
            
            <attendee>Larissa Haas</attendee>
            
            <attendee>Annika Herbert</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>QMBEZX@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-QMBEZX</pentabarf:event-slug>
            <pentabarf:title>Is digital sovereignty a new buzzword in AI development?</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T150500</dtstart>
            <dtend>20260416T153500</dtend>
            <duration>003000</duration>
            <summary>Is digital sovereignty a new buzzword in AI development?</summary>
            <description>AI development typically prioritises feasibility and implementation. While solutions should be efficient, high-performing and scalable, sovereignty and data security are often overlooked during solution design, even though we don&#x27;t use AI as an end in itself, but to benefit and support our customers. Customers operate within a regulatory framework and rely on responsible technology.
Rather than seeing regulation as a hindrance, we should view it as an opportunity to drive innovation and create sustainable, trustworthy solutions. However, this is only possible if we understand the full meaning of sovereignty.
This presentation will explore the various aspects of the term &#x27;sovereignty&#x27; and its potential impact on AI projects. We will discuss current examples from politics and development to identify best practices for secure data processing.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/QMBEZX/</url>
            <location>Palladium [2nd Floor]</location>
            
            <attendee>Dr. Maria Börner</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>BRRHGY@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-BRRHGY</pentabarf:event-slug>
            <pentabarf:title>On Interventional Generalisation</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T154500</dtstart>
            <dtend>20260416T161500</dtend>
            <duration>003000</duration>
            <summary>On Interventional Generalisation</summary>
            <description>If I do X instead of Y, will I get the outcome I want (in a novel situation)? Making predictions alone is pointless; one wants to act in the world. Furthermore, one must act in situations that are similar to, but different from, all past situations. The real underlying goal of all decision making is interventional generalisation: the ability to evaluate hypothetical choices in new, unseen situations.

This talk covers the history and problems of null hypothesis significance testing and the benefits (and limitations) of Bayesian reasoning. It introduces the basics of Pearlian causality theory and its treatment of interventions and counterfactuals (things that hypothetically could have happened, but didn&#x27;t). Finally, we discuss the next step, interventional generalisation: being able to compare the value of hypothetical interventions in new, unseen situations. Decisively improve your modelling, practically and conceptually, with the mental tools from this talk.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/BRRHGY/</url>
            <location>Palladium [2nd Floor]</location>
            
            <attendee>Andy Kitchen</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>GPJGH3@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-GPJGH3</pentabarf:event-slug>
            <pentabarf:title>Building reliable data pipelines with polars and dataframely</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T101500</dtstart>
            <dtend>20260416T114500</dtend>
            <duration>013000</duration>
            <summary>Building reliable data pipelines with polars and dataframely</summary>
            <description>**Note for attendees: Please check out the [git repository](https://github.com/Quantco/tutorial-pycon26-polars-dataframely) and follow the simple setup steps in the `README`, ideally before the tutorial.** 

In this tutorial, you will become familiar with `polars` basics by writing a simple pipeline: you will read data, transform it to make it ready for use, and you will learn how to do that fast. With `dataframely` schemas, you will upgrade your code from &quot;it works&quot; to &quot;it&#x27;s beautiful!&quot;, and along the way, `dataframely` will help you eliminate entire classes of bugs you will never have to think about again. After the tutorial, you will be all set to use these tools in your own work.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Tutorial</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/GPJGH3/</url>
            <location>Ferrum [2nd Floor]</location>
            
            <attendee>Oliver Borchert</attendee>
            
            <attendee>Andreas Albert</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>GCGLPN@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-GCGLPN</pentabarf:event-slug>
            <pentabarf:title>pytest tips and tricks for a better testsuite</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T131000</dtstart>
            <dtend>20260416T144000</dtend>
            <duration>013000</duration>
            <summary>pytest tips and tricks for a better testsuite</summary>
            <description>We&#x27;ll cover things like:

- Recommended pytest settings for more strictness
- What&#x27;s xfail and why is it useful?
- How to mark an entire test file or single parameters
- Ways to deal with parametrize IDs and syntax
- Useful built-in pytest fixtures
- Caching for fixtures
- Using fixtures implicitly
- Advanced fixture and parametrization topics
- How to customize fixtures behavior based on markers or custom CLI arguments
- If time permits: Short intro to writing a pytest plugin and to property-based testing with Hypothesis
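
To give a flavour of one of these points, here is a minimal, hypothetical example of a strict `xfail` marker (the test body and reason are illustrative, not from the tutorial materials):

```python
import pytest

# strict=True turns an unexpected pass (XPASS) into a failure,
# so the suite flags the marker when the known bug is eventually fixed.
@pytest.mark.xfail(strict=True, reason="known float-representation quirk")
def test_rounding():
    # 2.675 is stored as 2.67499... in binary, so this rounds down, not up.
    assert round(2.675, 2) == 2.68
```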

**To prepare, please clone the [GitHub repository](https://github.com/The-Compiler/pytest-tips-and-tricks) and follow the setup steps in the README.**</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Tutorial</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/GCGLPN/</url>
            <location>Ferrum [2nd Floor]</location>
            
            <attendee>Freya Bruhin</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>KQM8JJ@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-KQM8JJ</pentabarf:event-slug>
            <pentabarf:title>Foundation Models in Forecasting: Are We There Yet? Lessons from the Trenches</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T150500</dtstart>
            <dtend>20260416T153500</dtend>
            <duration>003000</duration>
            <summary>Foundation Models in Forecasting: Are We There Yet? Lessons from the Trenches</summary>
            <description>The landscape of time-series forecasting is undergoing a seismic shift. With the emergence of foundation models like Chronos 2 and TimesFM, the industry is at a crossroads: can a large-scale pre-trained model truly replace the specialized, &quot;local&quot; models that practitioners have spent years tuning?

In this talk, we move beyond theoretical benchmarks to provide a transparent look at testing time-series foundation models in production-like environments. We explore the transition from traditional statistical and machine learning methods to generative architectures, focusing on the practical challenges that arise when &quot;zero-shot&quot; capabilities meet the messy reality of business data.

### What you will learn:

* **The Foundation Model Landscape:** A high-level mapping of the current state-of-the-art and how these architectures differ from classical statistical and ML approaches.
* **Zero-Shot vs. Reality:** How pre-trained models handle domain-specific context and exogenous business drivers—such as promotions, seasonality, and market shocks—without explicit training.
* **The Operational Shift:** How moving toward foundation models changes the MLOps lifecycle, from data preparation to running inference at scale.
* **Predictive Stability &amp; Trust:** A framework for evaluating whether a model is &quot;production-ready,&quot; focusing on forecast stability and consistency of predictions over time.
* **A Decision Roadmap:** A practical checklist for teams looking to integrate these models into their stack without sacrificing reliability.

Whether you are a data scientist looking to upgrade your forecasting pipeline or a lead evaluating the impact of Foundation Models on time-series workflows, this session offers a grounded, hype-free perspective from the front lines of implementation.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/KQM8JJ/</url>
            <location>Ferrum [2nd Floor]</location>
            
            <attendee>Dr. Irena Bojarovska</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>VJPQCR@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-VJPQCR</pentabarf:event-slug>
            <pentabarf:title>Your Data Is Leaking: A Hands-On Introduction to Differential Privacy with OpenDP</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T101500</dtstart>
            <dtend></dtend>
            <duration>013000</duration>
            <summary>Your Data Is Leaking: A Hands-On Introduction to Differential Privacy with OpenDP</summary>
            <description>Aggregate statistics feel safe to release - just counts, means, and totals, no individual records. But a long history of privacy failures has shown otherwise. From the AOL search data leak to the Netflix Prize re-identification attack to LLM memorization, &quot;anonymized&quot; data has repeatedly revealed more than intended.

Differential privacy offers a different approach: a mathematical framework that quantifies and bounds the information any release reveals about any individual. It has moved from theory to practice in recent years, with deployments at the US Census, Wikimedia, Israel’s national birth registry, Google, Apple, LinkedIn and more.

In this tutorial, we provide a hands-on introduction to differential privacy. We&#x27;ll start by making the problem concrete - executing an attack on aggregate statistics - and then explore how differential privacy addresses it. The focus will be on practical implementation rather than underlying theory.

## What You&#x27;ll Learn
1. Why traditional anonymization and aggregation fail to protect privacy
1. The core ideas of differential privacy: what it guarantees, what epsilon means, and when DP is a suitable solution
1. How to use OpenDP&#x27;s building blocks
1. How to build differentially private data analyses using OpenDP&#x27;s Polars integration
1. Where to go next: resources for AI/ML with DP, synthetic data, and further learning

## Tutorial Outline

### Part 1 - The Privacy Problem (20 minutes)

- Real-world privacy failures (such as AOL search data, Netflix Prize, LLM memorization)
- Hands-on: execute a reconstruction attack on aggregate statistics
- Discussion: why traditional approaches fail

### Part 2 - Introduction to Differential Privacy (20 minutes)

- Core ideas: masking the contribution of a single individual through calibrated noise; protection against membership inference attack
- Learning by doing: exploring DP with OpenDP&#x27;s building blocks
- Tuning privacy protection with f-DP; the privacy-utility tradeoff
- Real-world deployments (such as US Census, Israel birth registry, LinkedIn API)
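
As a from-scratch sketch of the calibrated-noise idea (plain Python for illustration, not the OpenDP API covered in the tutorial):

```python
import math
import random

def laplace_noise(scale, rng):
    # Inverse-CDF sampling of a Laplace(0, scale) variate.
    # The distribution is symmetric, so the sign convention is immaterial.
    u = rng.random() - 0.5
    return -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

def dp_count(true_count, epsilon, sensitivity=1.0, rng=None):
    # Laplace mechanism: noise with scale sensitivity / epsilon yields a
    # pure epsilon-DP release of a count (sensitivity 1: one person can
    # change the count by at most 1).
    rng = rng or random.Random()
    return true_count + laplace_noise(sensitivity / epsilon, rng)
```

Averaged over many releases the noise cancels out, while any single release hides whether one individual was present.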

### Part 3 - Data Analysis with OpenDP (40 minutes)

- OpenDP fundamentals: domains, transformations, measurements, chaining
- Working with tabular data using OpenDP&#x27;s Polars integration
- Building a complete DP data analysis pipeline
- Revisiting the attack: does it still work?

### Part 4 - What&#x27;s Next (10 minutes)

- Beyond the basics: AI/ML with differential privacy, synthetic data generation
- Resources and community
- Q&amp;A

## Prerequisites
- Python: Comfortable writing functions and working with notebooks
- Statistics: Basic familiarity with mean, counts, histograms
- Differential privacy: No prior knowledge required

## Materials
Participants will have access to interactive Jupyter notebooks with all code and exercises. Materials will be publicly available after the tutorial.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Tutorial</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/VJPQCR/</url>
            <location>Dynamicum [Ground Floor]</location>
            
            <attendee>Shlomi Hod</attendee>
            
            <attendee>Marcel Neunhoeffer</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>QX8DDJ@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-QX8DDJ</pentabarf:event-slug>
            <pentabarf:title>Do you know how well your model is doing? Evaluate your LLMs</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T131000</dtstart>
            <dtend>20260416T144000</dtend>
            <duration>013000</duration>
            <summary>Do you know how well your model is doing? Evaluate your LLMs</summary>
            <description>We will begin with an essential revision of the Hugging Face Transformers library, covering basic LLM inference and fine-tuning. The core of the workshop will introduce and provide deep practice with Lighteval, an efficient and powerful LLM evaluation framework. Participants will learn how to leverage Lighteval to compare various LLMs available on the Hugging Face Hub using a range of pre-built tasks and metrics.

Finally, we will delve into advanced evaluation techniques, focusing on creating custom tasks and metrics tailored to unique, real-world application requirements. Participants will learn how to prepare custom datasets on the Hugging Face Hub and integrate them into Lighteval for precise, domain-specific evaluation. By the end of this workshop, you will possess the practical skills to rigorously evaluate, benchmark, and fine-tune your LLMs with confidence.

Prerequisites:

    - Experience coding in Python (with Python installed on the local machine)
    - Basic understanding of machine learning and LLMs
    - Experience with Hugging Face Transformers preferred but not necessary
    - A Hugging Face Hub account (sign up for free)
    - A modern computer that can fine-tune small LLMs locally

Preparation:

Clone and follow setup [here](https://github.com/Cheukting/lighteval-exercises/)</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Tutorial</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/QX8DDJ/</url>
            <location>Dynamicum [Ground Floor]</location>
            
            <attendee>Cheuk Ting Ho</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>TPNBRN@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-TPNBRN</pentabarf:event-slug>
            <pentabarf:title>Zero-Copy or Zero-Speed? The hidden overhead of PySpark, Arrow &amp; SynapseML for inference</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T150500</dtstart>
            <dtend>20260416T153500</dtend>
            <duration>003000</duration>
            <summary>Zero-Copy or Zero-Speed? The hidden overhead of PySpark, Arrow &amp; SynapseML for inference</summary>
            <description>This talk is a technical deep dive into the &quot;physics&quot; of distributed machine learning inference. While high-level APIs promise seamless integration between Spark (JVM) and Python, the underlying data transfer mechanisms often become the primary bottleneck for high-throughput systems. We start by reality-checking the &quot;Zero-Copy&quot; promise of Apache Arrow in a PySpark context, identifying exactly where the abstraction leaks and where &quot;Zero-Copy&quot; isn&#x27;t actually free.

The session concludes with a focus on tuning for throughput. We will explore the delicate balance of configuring `spark.sql.execution.arrow.maxRecordsPerBatch`, demonstrating how to find the &quot;Goldilocks&quot; zone that maximizes CPU saturation without causing JVM off-heap memory crashes. Attendees will gain a deep understanding of the memory hierarchy involved in distributed inference and practical strategies for profiling serialization overhead in production.
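
As a sketch of the knob in question (the value 5000 is purely illustrative; the right setting is workload-specific and must be benchmarked):

```properties
# spark-defaults.conf — Arrow batch size between the JVM and Python workers.
# Smaller batches reduce off-heap memory pressure; larger batches amortize
# per-batch serialization overhead. The Spark default is 10000.
spark.sql.execution.arrow.maxRecordsPerBatch  5000
```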

Key Takeaways:

- Internals knowledge: Understand exactly how data moves from JVM heap to Python worker memory.
- Decision guidance: Which transfer method to use depending on your use case.
- Tuning skills: Learn how to configure Apache Arrow batch sizes to optimize CPU saturation.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/TPNBRN/</url>
            <location>Dynamicum [Ground Floor]</location>
            
            <attendee>Petar Ilijevski</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>U7QDCH@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-U7QDCH</pentabarf:event-slug>
            <pentabarf:title>Problem Clinic: Python in Regulated Environments --- What Works, What Doesn&#x27;t  [no-video]</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20260416T101500</dtstart>
            <dtend>20260416T111500</dtend>
            <duration>010000</duration>
            <summary>Problem Clinic: Python in Regulated Environments --- What Works, What Doesn&#x27;t  [no-video]</summary>
            <description>**Topics that may come up:**

- Migrating away from SAS, MATLAB, or proprietary stacks
- AI/ML in environments where cloud is not an option
- Auditability and governance for Python-based models
- Bridging the gap between tech teams and C-level on AI investment decisions
- Open source strategy under regulatory constraints

**Format:** Open discussion, no slides, no projector, no recording (Chatham House Rule). Limited to approx. 20 participants. No registration required.

**Who should join:** Anyone using or introducing Python in a regulated environment --- regardless of industry.

**Moderation:** Alexander CS Hendorf</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Open Space</category>
            <url>https://pretalx.com/pyconde-pydata-2026/talk/U7QDCH/</url>
            <location>Lounge [1st Floor]</location>
            
            <attendee>Alexander CS Hendorf</attendee>
            
        </vevent>
        
    </vcalendar>
</iCalendar>
