{"$schema": "https://c3voc.de/schedule/schema.json", "generator": {"name": "pretalx", "version": "2025.2.0.dev0"}, "schedule": {"url": "https://pretalx.com/pyconde-pydata-2025/schedule/", "version": "1.5", "base_url": "https://pretalx.com", "conference": {"acronym": "pyconde-pydata-2025", "title": "PyCon DE & PyData 2025", "start": "2025-04-23", "end": "2025-04-25", "daysCount": 3, "timeslot_duration": "00:05", "time_zone_name": "Europe/Berlin", "colors": {"primary": "#b7bcbf"}, "rooms": [{"name": "Zeiss Plenary (Spectrum)", "slug": "3893-zeiss-plenary-spectrum", "guid": "cbef5dea-b209-5af2-a15b-df1d7d9a6581", "description": "Ground floor", "capacity": 1072}, {"name": "Titanium3", "slug": "3894-titanium3", "guid": "bfdfe5f6-faa1-5b37-a7cf-958ab3542af3", "description": "2.0234, 2nd floor", "capacity": 300}, {"name": "Helium3", "slug": "3895-helium3", "guid": "18b8ec3e-6a25-5c3d-b5b2-dd327763a43b", "description": "3.0789, 3rd floor", "capacity": 284}, {"name": "Platinum3", "slug": "3896-platinum3", "guid": "8654a106-8c74-51b7-8369-c13d7bd596ee", "description": "2.0678, 2nd floor", "capacity": 263}, {"name": "Europium2", "slug": "3897-europium2", "guid": "48c803b5-725c-580d-be34-e4881cebf708", "description": "3.034, 3rd floor", "capacity": 161}, {"name": "Hassium", "slug": "3898-hassium", "guid": "9a07052d-93e6-5dda-bc4f-064dcb37cea2", "description": "3.02, 3rd floor", "capacity": 96}, {"name": "Palladium", "slug": "3899-palladium", "guid": "2c9d96b3-ecd5-5f0b-ba21-51ddbde820ce", "description": "2.05, 2nd floor", "capacity": 63}, {"name": "Ferrum", "slug": "3900-ferrum", "guid": "85710256-32e2-5582-8be7-6f9d3c4c1d75", "description": "Ground floor", "capacity": 269}, {"name": "Dynamicum", "slug": "3901-dynamicum", "guid": "863735e7-b50b-5ca5-81c3-4e5be3c0c65c", "description": "0.04, ground floor", "capacity": 150}, {"name": "Carbonium", "slug": "3973-carbonium", "guid": "a8dc83a0-8ecc-5fa6-ac83-939783158320", "description": "2.01, 2nd floor", "capacity": null}, {"name": 
"OpenSpace", "slug": "4084-openspace", "guid": "c183bd0b-c8c3-5027-bd11-2d796f2ee872", "description": "3.11, 3rd floor, foyer", "capacity": 80}], "tracks": [{"name": "PyCon: MLOps & DevOps", "slug": "5205-pycon-mlops-devops", "color": "#000000"}, {"name": "PyCon: Programming & Software Engineering", "slug": "5206-pycon-programming-software-engineering", "color": "#000000"}, {"name": "PyCon: Python Language & Ecosystem", "slug": "5207-pycon-python-language-ecosystem", "color": "#000000"}, {"name": "PyCon: Security", "slug": "5208-pycon-security", "color": "#000000"}, {"name": "PyCon: Testing", "slug": "5209-pycon-testing", "color": "#000000"}, {"name": "PyCon: Django & Web", "slug": "5210-pycon-django-web", "color": "#000000"}, {"name": "PyData: Data Handling & Engineering", "slug": "5211-pydata-data-handling-engineering", "color": "#000000"}, {"name": "PyData: Machine Learning & Deep Learning & Statistics", "slug": "5212-pydata-machine-learning-deep-learning-statistics", "color": "#000000"}, {"name": "PyData: Natural Language Processing & Audio (incl. Generative AI NLP)", "slug": "5213-pydata-natural-language-processing-audio-incl-generative-ai-nlp", "color": "#000000"}, {"name": "PyData: Computer Vision (incl. 
Generative AI CV)", "slug": "5289-pydata-computer-vision-incl-generative-ai-cv", "color": "#000000"}, {"name": "PyData: Generative AI", "slug": "5214-pydata-generative-ai", "color": "#000000"}, {"name": "PyData: Embedded Systems & Robotics", "slug": "5290-pydata-embedded-systems-robotics", "color": "#000000"}, {"name": "PyData: PyData & Scientific Libraries Stack", "slug": "5215-pydata-pydata-scientific-libraries-stack", "color": "#000000"}, {"name": "PyData: Visualisation & Jupyter", "slug": "5216-pydata-visualisation-jupyter", "color": "#000000"}, {"name": "PyData: Research Software Engineering", "slug": "5291-pydata-research-software-engineering", "color": "#000000"}, {"name": "General: Community & Diversity", "slug": "5217-general-community-diversity", "color": "#000000"}, {"name": "General: Education, Career & Life", "slug": "5292-general-education-career-life", "color": "#000000"}, {"name": "General: Ethics & Privacy", "slug": "5218-general-ethics-privacy", "color": "#000000"}, {"name": "General: Infrastructure - Hardware & Cloud", "slug": "5219-general-infrastructure-hardware-cloud", "color": "#000000"}, {"name": "General: Others", "slug": "5221-general-others", "color": "#000000"}, {"name": "Sponsor", "slug": "5222-sponsor", "color": "#000000"}, {"name": "Keynote", "slug": "5223-keynote", "color": "#000000"}, {"name": "General: Rust", "slug": "5393-general-rust", "color": "#000000"}], "days": [{"index": 1, "date": "2025-04-23", "day_start": "2025-04-23T04:00:00+02:00", "day_end": "2025-04-24T03:59:00+02:00", "rooms": {"Zeiss Plenary (Spectrum)": [{"guid": "32ce63ca-96c6-58ac-b5d9-e6e4dfc3baaf", "code": "3MNGN8", "id": 64178, "logo": null, "date": "2025-04-23T10:30:00+02:00", "start": "10:30", "duration": "00:45", "room": "Zeiss Plenary (Spectrum)", "slug": "pyconde-pydata-2025-64178-reasonable-ai", "url": "https://pretalx.com/pyconde-pydata-2025/talk/3MNGN8/", "title": "Reasonable AI", "subtitle": "", "track": "Keynote", "type": "Keynote", "language": "en", 
"abstract": "The relationship between humans and machines, especially in the context of Artificial Intelligence (AI), is shaped by hopes, concerns, and moral questions. On the one hand, advances in AI offer great promise: it can help us solve complex problems, improve healthcare, streamline workflows, and much more. Yet, at the same time, there are legitimate concerns about the control over this technology, its potential impact on jobs and society, and ethical issues related to discrimination and the loss of human autonomy. In the talk I will explore and illustrate the complex tension between innovation and moral responsibility in AI research.", "description": "The relationship between humans and machines, especially in the context of Artificial Intelligence (AI), is shaped by hopes, concerns, and moral questions. On the one hand, advances in AI offer great promise: it can help us solve complex problems, improve healthcare, streamline workflows, and much more. Yet, at the same time, there are legitimate concerns about the control over this technology, its potential impact on jobs and society, and ethical issues related to discrimination and the loss of human autonomy. In the talk I will explore and illustrate the complex tension between innovation and moral responsibility in AI research.", "recording_license": "", "do_not_record": false, "persons": [{"code": "B37CUN", "name": "Kristian Kersting", "avatar": "https://pretalx.com/media/avatars/B37CUN_YYuqKpp.jpg", "biography": "Kristian Kersting is co-director of the Hessian Center for AI (hessian.AI), head of research at the German Research Center for AI / Darmstadt, and professor of AI and machine learning at TU Darmstadt. After his PhD at the University of Freiburg in 2006, he was with the MIT, Fraunhofer IAIS, the University of Bonn and the TU Dortmund. 
He is an AAAI, EurAI and ELLIS Fellow, coauthor of the popular science book \u201cWie Maschinen Lernen\u201d, winner of the \u201cGerman AI Prize\u201d, member of the Mainz Academy of Sciences and Literature and seed investor at Aleph Alpha, one of Europe's AI hopes. In collaboration with Aleph Alpha Research, he also runs the collaboration lab 1141 at TU Darmstadt on safe and transparent generative AI. He had a regular AI column in the Welt (am Sonntag).", "public_name": "Kristian Kersting", "guid": "e2fa18b3-b1a8-56e7-bdbb-5a6fe77c2173", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/B37CUN/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/3MNGN8/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/3MNGN8/", "attachments": []}, {"guid": "a928818e-3cd8-53f7-a7ee-5d4db0bfc4f9", "code": "AJDYRL", "id": 61317, "logo": null, "date": "2025-04-23T11:45:00+02:00", "start": "11:45", "duration": "00:30", "room": "Zeiss Plenary (Spectrum)", "slug": "pyconde-pydata-2025-61317-python-performance-unleashed-essential-optimization-techniques-beyond-libraries", "url": "https://pretalx.com/pyconde-pydata-2025/talk/AJDYRL/", "title": "Python Performance Unleashed: Essential Optimization Techniques Beyond Libraries", "subtitle": "", "track": "PyCon: Python Language & Ecosystem", "type": "Talk", "language": "en", "abstract": "Every Python developer faces performance challenges, from slow data processing to memory-intensive operations. While external libraries like Numba or Cython offer solutions, understanding core Python optimization techniques is crucial for writing efficient code. This talk explores practical optimization strategies using Python's built-in capabilities, demonstrating how to achieve significant performance improvements without external dependencies. Through real-world examples from machine learning pipelines and data processing applications, we'll examine common bottlenecks and their solutions. 
Whether you're building data pipelines, web applications, or ML systems, these techniques will help you write faster, more efficient Python code.", "description": "Performance optimization remains a critical challenge in Python development. While Python's simplicity and extensive ecosystem make it the language of choice for many applications, its interpreted nature can lead to significant performance bottlenecks. This is particularly evident in data-intensive applications, machine learning pipelines, and large-scale production systems where every millisecond counts.\r\n\r\nMany developers immediately reach for external libraries or complex solutions when facing performance issues. However, Python's standard library and built-in features offer powerful optimization opportunities that are often overlooked. Understanding these fundamental optimization techniques not only improves code performance but also helps developers write more efficient code from the start.\r\n\r\nThis talk addresses the core performance challenges faced by Python developers daily. From memory management to algorithmic efficiency, we'll explore how seemingly simple code changes can lead to substantial performance improvements. Through practical examples drawn from real-world applications, we'll demonstrate how to identify, measure, and optimize performance bottlenecks effectively.", "recording_license": "", "do_not_record": false, "persons": [{"code": "HRNBJX", "name": "Thomas Berger", "avatar": "https://pretalx.com/media/avatars/HRNBJX_ZtkyGZH.jpeg", "biography": "Hi, I\u2019m Thomas Berger! I work as a Machine Learning Engineer at a FinTech company and also teach part-time as a lecturer. I\u2019ve been working with Python for over six years, starting during my studies, where I focused on machine learning. For the last three years, I\u2019ve been applying these skills professionally in my full-time role as a Machine Learning Engineer. 
I\u2019ve been diving deep into Python for things like **machine learning**, **reinforcement learning**, and **high-performance computing**. I love finding ways to make Python run faster and more efficiently, especially when tackling big data or complex models.\r\n\r\nAt PyCon, I\u2019ll be talking about **high-performance Python** and sharing tips and tricks to help you optimize your code for demanding tasks. I\u2019m excited to share what I\u2019ve learned and connect with others in the Python community!", "public_name": "Thomas Berger", "guid": "feb694c9-78b3-535f-882b-7930146d15aa", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/HRNBJX/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/AJDYRL/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/AJDYRL/", "attachments": [{"title": "Slides -23.04.2025", "url": "/media/pyconde-pydata-2025/submissions/AJDYRL/resources/_pyth_snTmAvH.html", "type": "related"}]}, {"guid": "d17252fa-6f5e-5ed6-8086-9fec72694333", "code": "MRHNCV", "id": 60702, "logo": null, "date": "2025-04-23T12:25:00+02:00", "start": "12:25", "duration": "00:45", "room": "Zeiss Plenary (Spectrum)", "slug": "pyconde-pydata-2025-60702-open-table-formats-in-the-wild-from-parquet-to-delta-lake-and-back", "url": "https://pretalx.com/pyconde-pydata-2025/talk/MRHNCV/", "title": "Open Table Formats in the Wild: From Parquet to Delta Lake and Back", "subtitle": "", "track": "PyData: Data Handling & Engineering", "type": "Talk (long)", "language": "en", "abstract": "Open table formats have revolutionized analytical, columnar storage on cloud object stores with critical features like ACID compliance and enhanced metadata management, once exclusive to proprietary cloud data warehouses. 
Delta Lake, Iceberg, and Hudi have significantly advanced over traditional open file formats like Parquet and ORC.\r\n\r\nIn an effort to modernize our data architecture, we aimed to replace our Parquet-based bronze layer with Delta Lake, anticipating better query performance, reduced maintenance, native support for incremental processing, and more. While our initial pilot showed promise, we encountered unexpected pitfalls that ultimately brought us back to where we began.\r\n\r\nCurious? Join me as we shed light on the current state of table formats.", "description": "# Description\r\n\r\nOpen Table Formats (OTF) such as Hudi, Iceberg and Delta Lake have disruptively changed the data engineering landscape in recent years. While the Parquet file format has evolved as the de-facto standard for open, interoperable columnar storage for analytical workloads, it lacked first class support for critical features such as ACID compliance, incremental processing, flexible schema & partitioning evolution and scalable metadata management. This led to increased development and maintenance efforts while building idempotent and failure tolerant data pipelines that often resulted in custom frameworks. OTFs solve all of these issues via providing a sophisticated metadata layer and improved maintenance capabilities on top of Parquet.\r\n\r\nDriven by the promises of OTFs, we intended to replace our own bronze-read-only Parquet-based storage layer with Delta Lake. In theory, this should have improved performance, reduced maintenance and provided more flexibility. However, we've stumbled upon several issues:\r\n\r\n1. drastic performance issues with Liquid Clustering during incremental processing\r\n2. immature interoperability in the Python and cloud-based ecosystem (DuckDB, Pandas, Polars, Athena, Snowflake)\r\n3. 
maintaining logical session-boundaries during incremental processing\r\n\r\nWhile the first two issues are solvable in the foreseeable future, the last one is specific to our requirements and does not overlap with design decisions made for incremental processing in Delta Lake. Taken together, these points ultimately led us to go back to relying on Parquet again.\r\n\r\n## Targeted Audience\r\n\r\nThis talk is mainly intended for an intermediate data engineering audience but is well suited for interested beginners, too. The content of this talk is relevant for all architects and data engineers being responsible for storing and managing data for analytical workloads.\r\n\r\n# Key takeaways\r\n\r\n- What problems do OTFs solve?\r\n- How do OTFs contribute to an open, composable data stack?\r\n- Is there a predominant Open Table Format?\r\n- How does Delta Lake conceptually work?\r\n- What are concrete real-world advantages of Delta Lake in contrast to \"plain\" Parquet?\r\n- What is the \"small files\" problem and how does Liquid Clustering help?\r\n- What is the current state of interoperability with Delta Lake?\r\n\r\n# Talk Outline\r\n- Introduction (5 min)\r\n- OTFs in comparison (5 min)\r\n- Delta Lake Internals (10 min)\r\n- Use Case Requirements (5 min)\r\n- Benchmarks & Results (10 min)\r\n- Conclusion and Outlook (5 min)\r\n- Questions (5 min)", "recording_license": "", "do_not_record": false, "persons": [{"code": "3CXHP7", "name": "Franz W\u00f6llert", "avatar": "https://pretalx.com/media/avatars/3CXHP7_nqaEbER.jpg", "biography": "Hi my name is Franz and I\u2019m an open source and Python enthusiast:\r\n\r\n- father of 3 girls\r\n- major in psychology\r\n- chess hobbyist\r\n- competitive ultimate frisbee player\r\n- likes cooking and baking sourdough bread", "public_name": "Franz W\u00f6llert", "guid": "c4193aee-6e76-5d3a-9f99-018a827796e1", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/3CXHP7/"}], "links": [], "feedback_url": 
"https://pretalx.com/pyconde-pydata-2025/talk/MRHNCV/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/MRHNCV/", "attachments": [{"title": "Slides", "url": "/media/pyconde-pydata-2025/submissions/MRHNCV/resources/Open_T_Ylyv86n.pdf", "type": "related"}]}, {"guid": "e7f690bf-5b49-58b1-8720-95a527a07910", "code": "83QH37", "id": 61250, "logo": null, "date": "2025-04-23T14:30:00+02:00", "start": "14:30", "duration": "00:30", "room": "Zeiss Plenary (Spectrum)", "slug": "pyconde-pydata-2025-61250-from-trees-to-transformers-our-journey-towards-deep-learning-for-ranking", "url": "https://pretalx.com/pyconde-pydata-2025/talk/83QH37/", "title": "From Trees to Transformers: Our Journey Towards Deep Learning for Ranking", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Statistics", "type": "Talk", "language": "en", "abstract": "GetYourGuide, a global marketplace for travel experiences, reached diminishing returns with its XGBoost-based ranking system. We switched to a Deep Learning pipeline in just nine months, maintaining high throughput and low latency. We iterated on over 50 offline models and conducted more than 10 live A/B tests, ultimately deploying a PyTorch transformer that yielded significant gains. In this talk, we will share our phased approach\u2014from a simple baseline to a high-impact launch\u2014and discuss the key operational and modeling challenges we faced. Learn how to transition from tree-based methods to neural networks and unlock new possibilities for real-time ranking.", "description": "GetYourGuide is a global online marketplace that helps travelers discover and book the best experiences. One of our core challenges is ensuring users always see the most relevant activities first\u2014a task historically powered by an XGBoost-based ranking system. However, as we continued refining our tree-based models, returns on incremental improvements began to plateau. 
To spark our next step change in performance, we decided to adopt Deep Learning.\r\n\r\nIn this talk, we will share how, in just nine months, we migrated our ranking pipeline to a Deep Learning architecture while maintaining tight latency and high-throughput requirements. We will walk through our phased approach, starting with a minimal viable model to confirm our production setup and gradually increasing its complexity. Along the way, we tested over 50 iterations offline and ran more than 10 live A/B tests to validate the impact on our customers. Ultimately, we rolled out a PyTorch transformer-based model with significant business impact. We will also discuss the main challenges we faced on the operational and modeling sides, how we overcame them, and the lessons we learned.\r\n\r\nYou will leave with practical strategies for transitioning from traditional tree-based models to neural networks in production. Join us to learn how to advance your machine-learning capabilities and unlock new dimensions of relevance and personalization for real-time ranking.", "recording_license": "", "do_not_record": false, "persons": [{"code": "EZMJWT", "name": "Theodore Meynard", "avatar": "https://pretalx.com/media/avatars/EZMJWT_XWSm1xx.jpg", "biography": "Theodore Meynard is a data science manager at GetYourGuide. He leads the evolution of their ranking algorithm, helping customers to find the best activities to book and locations to explore. Beyond work, he is one of the co-organizers of the PyData Berlin meetup and the conference. 
\r\nWhen he is not programming, he loves riding his bike, looking for the best bakery-patisserie in town.", "public_name": "Theodore Meynard", "guid": "86973d97-e18a-5002-9b99-7690509f6220", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/EZMJWT/"}, {"code": "SVAMFE", "name": "Mihail Douhaniaris", "avatar": "https://pretalx.com/media/avatars/SVAMFE_cv3x4TF.png", "biography": "Mihail Douhaniaris is a Senior Data Scientist at GetYourGuide, where he specializes in improving the marketplace ranking algorithms to improve search relevance. His work helps travelers find experiences that match their preferences more effectively. Beyond his role, Mihail is deeply interested in responsible AI, ML observability, and the challenges of deploying machine learning at scale.", "public_name": "Mihail Douhaniaris", "guid": "42abed7f-8ee7-5b38-b0b9-1a771ce3f487", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/SVAMFE/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/83QH37/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/83QH37/", "attachments": []}, {"guid": "ef431219-59f8-5732-82dc-43d9ff2f4d1d", "code": "JA9NFW", "id": 60441, "logo": null, "date": "2025-04-23T15:10:00+02:00", "start": "15:10", "duration": "00:45", "room": "Zeiss Plenary (Spectrum)", "slug": "pyconde-pydata-2025-60441-beyond-agents-what-ai-strategy-really-needs-in-2025", "url": "https://pretalx.com/pyconde-pydata-2025/talk/JA9NFW/", "title": "Beyond Agents: What AI Strategy Really Needs in 2025", "subtitle": "", "track": "General: Others", "type": "Talk (long)", "language": "en", "abstract": "Artificial intelligence is no longer confined to models and APIs\u2014it now shapes systems, hardware, and real-world agents. In this talk, I reflect on strategic insights gained at NVIDIA\u2019s GTC 2025, where AI\u2019s convergence with simulation, synthetic data, and robotics signals a fundamental shift. 
Drawing from over 1,100 sessions and personal experiences at the heart of Silicon Valley, I explore emerging patterns that redefine what it means to build and deploy AI at scale. We\u2019ll look beyond the hype of large language models to examine autonomous systems, interdisciplinary development, and the infrastructure shifts enabling AI everywhere\u2014from cloud to desktop. This session is a call to technical leaders and practitioners to broaden their perspective, think beyond tools, and engage strategically. Whether you\u2019re developing agents, managing data pipelines, or scaling AI across teams, this talk will challenge assumptions and highlight what truly matters in 2025 and beyond.", "description": "Artificial intelligence is expanding beyond the boundaries of models and APIs\u2014into real-world agents, high-fidelity simulation, and strategic infrastructure. This talk offers a practical, forward-looking perspective on AI strategy, based on insights gathered at NVIDIA\u2019s GTC 2025, one of the most influential events in the global AI ecosystem.\r\n\r\nWe begin with a personal reflection: why attending GTC as an AI consultant helped reset my strategic thinking after experiencing the common challenges of fragmented data, isolated tools, and innovation fatigue. From there, we\u2019ll explore key emerging trends\u2014agentic AI, synthetic data generation, and real-time digital twins\u2014and discuss their broader implications for how we design, train, and deploy intelligent systems.\r\n\r\nThe second part of the talk focuses on convergence: how disciplines such as robotics, healthcare, simulation, and cloud infrastructure are blending, creating new demands for cross-functional collaboration. 
A brief clustering analysis of 500+ GTC sessions will illustrate this shift.\r\n\r\nWe\u2019ll conclude by examining strategic changes in AI infrastructure\u2014especially the rise of powerful, local AI systems\u2014and draw lessons from unexpected collaborations (such as Disney, DeepMind, and NVIDIA) that reveal how innovation often happens at the intersection of domains.\r\n\r\nThis talk is intended for developers, data scientists, and technical leads who want to broaden their understanding of where AI is headed and how to align today\u2019s decisions with tomorrow\u2019s possibilities.\r\n\r\nTalk Outline:\r\n\t\u2022\tIntroduction: personal motivation and strategic perspective on GTC 2025\r\n\t\u2022\tKey trends: agentic AI, synthetic data, and real-time simulation\r\n\t\u2022\tInterdisciplinary convergence: how domains like robotics, biology, and infrastructure intersect\r\n\t\u2022\tCase study: the Disney\u2013DeepMind\u2013NVIDIA collaboration and its broader lessons\r\n\t\u2022\tStrategic implications: shifts in AI infrastructure and a call for action-oriented, cross-domain thinking", "recording_license": "", "do_not_record": false, "persons": [{"code": "8F38DV", "name": "Alexander CS Hendorf", "avatar": "https://pretalx.com/media/avatars/8F38DV_QTtqqiS.jpg", "biography": "Alexander C. S. Hendorf has over 20 years of experience in digitalization, data, and artificial intelligence. As an independent consultant, he focuses on the practical implementation, adoption, and communication of data- and AI-driven strategies and decision-making processes.\r\n\r\nWhile still in law school, he worked as a DJ\u2014before dropping out to join a transatlantic music start-up. The venture evolved into a decent independent label group and, eventually, a small stock corporation, where Alexander became a partner and, at 28, took over as COO. He led the company\u2019s digital transformation and designed systems that could scale with growth. 
This entrepreneurial journey laid the foundation for his deep understanding of business strategy, technology, and innovation.\r\n\r\nAfter closing the chapter on digital music, Alexander turned his focus to data science and AI\u2014initially driven by curiosity, with weekends on Coursera and evenings on GPUs. That passion evolved into a career advising organizations on AI integration, data strategy, and building impact-driven teams.\r\n\r\nSome say he just picks the flashiest jobs\u2014record label owner, data scientist\u2014but really, he follows his passion: for what\u2019s new, what matters, and what connects people and technology.\r\n\r\nToday, he supports clients\u2014especially in regulated or legacy-heavy industries\u2014in aligning emerging technologies with real-world business goals. His work emphasizes cultural impact, sustainable change, and interdisciplinary thinking.\r\n\r\nAlexander is a recognized expert in data intelligence and a frequent speaker and chair at international conferences, including PyCon DE & PyData, Data2Day, and EuroPython. 
He\u2019s a Python Software Foundation Fellow, EuroPython Fellow, and board member of the Python Software Verband (Germany).\r\n\r\nSince 2024, he has been driving [Pioneers Hub](https://pioneershub.org), a non-profit supporting vibrant, inclusive tech communities\u2014and helping innovators keep pace in a rapidly changing world.", "public_name": "Alexander CS Hendorf", "guid": "e61ae96e-6f0d-5312-867d-6bf04eefb64f", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/8F38DV/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/JA9NFW/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/JA9NFW/", "attachments": [{"title": "Slides", "url": "/media/pyconde-pydata-2025/submissions/JA9NFW/resources/Beyond_ngR0Jv6.pdf", "type": "related"}]}, {"guid": "a6b76132-159a-5e3f-9df2-a1cf347219c8", "code": "NNGWGC", "id": 61232, "logo": null, "date": "2025-04-23T16:10:00+02:00", "start": "16:10", "duration": "00:30", "room": "Zeiss Plenary (Spectrum)", "slug": "pyconde-pydata-2025-61232-mastering-demand-forecasting-lessons-from-europe-s-largest-retailer", "url": "https://pretalx.com/pyconde-pydata-2025/talk/NNGWGC/", "title": "Mastering Demand Forecasting: Lessons from Europe's Largest Retailer", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Statistics", "type": "Talk", "language": "en", "abstract": "Ever craved your favorite dish, only to find its key ingredient missing from the store? You're not alone - stock outs can have significant consequences for businesses, resulting in frustrated customers and lost sales. On the other hand, overstocking can lead to wasted storage costs and potential write-offs. The replenishment system is responsible for striking the right balance between these opposing risks.\r\nThe key to successful replenishment is making accurate predictions about future demand. 
\r\n\r\nThis presentation takes a deep dive into the intricate world of demand forecasting at Europe's largest retailer. We will demonstrate how enhancing simple machine learning methods with domain knowledge allows us to generate hundreds of millions of high-quality forecasts every day.", "description": "This talk will provide an in-depth look at the forecasting engine, the heart of Lidl's replenishment system.\r\n\r\nEach day at Lidl, hundreds of millions of various products journey from suppliers to warehouses before reaching the shelves. Our so-called forecasting engine helps to automate the supply chain at every step along the way. \r\nEven with the vast amount of data at our disposal, the problem is still extraordinarily intricate. Each item, store or warehouse has unique demand patterns influenced heavily by a wide range of factors, such as holidays. While most of the effects are quantifiable, others remain unavailable and a certain degree of stochasticity is inherent to the process. The objective of our demand predictions may also vary based on their usage. Accuracy on the day level typically matters for short-term predictions, while it doesn't for long-term predictions. \r\n\r\nWe'll present our pragmatic modeling methodology on a simplified version of the problem at hand: The warehouse forecasting of single items. \r\n\r\nWe explain the rationale for training separate models for each item-warehouse combination and go into the reasons why we opted for using an LGBM model and why we believe it is best suited for our application. In addition to outlining our high-level modeling approach, we demonstrate how business and domain expertise are integrated into the modeling process through the use of sample and feature weighting and examine the impact of this integration on prediction quality. Following the base model, extensions are introduced that enable the incorporation of higher-level information at the finest level of granularity. 
This is achieved through decomposition and recomposition of the time-series at hand. In detail, we will present uplift decomposition for different use-cases, which include handling of promotions and holidays.\r\n\r\nTo conclude, we will give an overview of how all the presented methods synergize in delivering reliable forecasts for happy customers, so that you will hopefully never find yourself in front of an empty shelf!", "recording_license": "", "do_not_record": false, "persons": [{"code": "HHWHWK", "name": "Moreno Schlageter", "avatar": null, "biography": "Machine Learning Engineer at Schwarz IT, Germany, where I'm passionate about harnessing the power of AI to revolutionize the retail industry", "public_name": "Moreno Schlageter", "guid": "b10dd709-a018-53ca-a903-cefb3b419130", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/HHWHWK/"}, {"code": "UFVRPX", "name": "Yovli Duvshani", "avatar": null, "biography": "Versatile data scientist with 3+ years of experience building AI-products at the service of the industry. 
I believe that the key for success revolves around embracing shared best practices, upholding high quality standards for code development and having a team composed of complementary skill sets.", "public_name": "Yovli Duvshani", "guid": "772a08a3-1297-513a-b88b-29f8cb3cfa35", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/UFVRPX/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/NNGWGC/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/NNGWGC/", "attachments": []}, {"guid": "07b2df9d-5d9b-580a-8afd-8e82b72748d6", "code": "FUX3FR", "id": 59318, "logo": null, "date": "2025-04-23T17:10:00+02:00", "start": "17:10", "duration": "00:30", "room": "Zeiss Plenary (Spectrum)", "slug": "pyconde-pydata-2025-59318-conquering-pdfs-document-understanding-beyond-plain-text", "url": "https://pretalx.com/pyconde-pydata-2025/talk/FUX3FR/", "title": "Conquering PDFs: document understanding beyond plain text", "subtitle": "", "track": "PyData: Natural Language Processing & Audio (incl. Generative AI NLP)", "type": "Talk", "language": "en", "abstract": "NLP and data science could be so easy if all of our data came as clean and plain text. But in practice, a lot of it is hidden away in PDFs, Word documents, scans and other formats that have been a nightmare to work with. In this talk, I'll present a new and modular approach for building robust document understanding systems, using state-of-the-art models and the awesome Python ecosystem. I'll show you how you can go from PDFs to structured data and even build fully custom information extraction pipelines for your specific use case.", "description": "For the practical examples, I'll be using spaCy, and the new Docling library and layout analysis models. 
I'll also cover Optical Character Recognition (OCR) for image-based text, how to convert tabular data to pandas DataFrames, and strategies for creating training and evaluation data for information extraction tasks like text classification and entity recognition using PDFs and other documents as inputs.", "recording_license": "", "do_not_record": false, "persons": [{"code": "FZKG9N", "name": "Ines Montani", "avatar": "https://pretalx.com/media/avatars/FZKG9N_5iBQp5R.jpg", "biography": "Ines Montani is a developer specializing in tools for AI and NLP technology. She\u2019s the co-founder and CEO of Explosion and a core developer of spaCy, a popular open-source library for Natural Language Processing in Python, and Prodigy, a modern annotation tool for creating training data for machine learning models.", "public_name": "Ines Montani", "guid": "b60e58b3-bd41-534c-a286-22ae8481a00a", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/FZKG9N/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/FUX3FR/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/FUX3FR/", "attachments": []}, {"guid": "6e07b8ae-2785-515b-8457-a2151dbdbe20", "code": "GURXPK", "id": 61192, "logo": null, "date": "2025-04-23T17:50:00+02:00", "start": "17:50", "duration": "00:30", "room": "Zeiss Plenary (Spectrum)", "slug": "pyconde-pydata-2025-61192-is-prompt-engineering-dead-how-auto-optimization-is-changing-the-game", "url": "https://pretalx.com/pyconde-pydata-2025/talk/GURXPK/", "title": "Is Prompt Engineering Dead? How Auto-Optimization is Changing the Game", "subtitle": "", "track": "PyData: Natural Language Processing & Audio (incl. Generative AI NLP)", "type": "Talk", "language": "en", "abstract": "The rise of LLMs has elevated prompt engineering as a critical skill in the AI industry, but manual prompt tuning is often inefficient and model-specific. 
This talk explores various automatic prompt optimization approaches, ranging from simple ones like bootstrapped few-shot to more complex techniques such as MIPRO and TextGrad, and showcases their practical applications through frameworks like DSPy and AdalFlow. By exploring the benefits, challenges, and trade-offs of these approaches, the attendees will be able to answer the question: is prompt engineering dead, or has it just evolved?", "description": "With the rise of LLMs, prompt engineering has become a highly impactful skill in the AI industry. However, manual prompt tuning is challenging, time-consuming, and not always generalizable across different models. This raises a reasonable question: can prompts be automatically learned from data? The answer is yes, and in this talk, we will explore how.\r\n\r\nFirst, we will provide a high-level overview of various prompt optimization approaches, starting with a simple technique like bootstrapped few-shot, which automatically generates and selects an optimal set of demonstrations for each step in the LLM chain. 
Then, we will discuss more complex approaches, such as MIPRO and TextGrad, which directly optimize the instructions.\r\n\r\nAfterwards, we will move on to a more practical part by showcasing how these techniques can be used via popular frameworks such as DSPy and AdalFlow.\r\n\r\nFinally, we will discuss the benefits and trade-offs of these approaches and frameworks in terms of costs, complexity and performance, so the audience can decide whether prompt engineering is truly dead.\r\n\r\n**Outline:**\r\n* Introduction (2 min)\r\n* Discussion of problems with manual prompt engineering (2 min)\r\n* Overview of existing prompt optimization approaches (10 min):\r\n    * Bootstrapped few-shot (3 min)\r\n    * MIPRO (3 min)\r\n    * TextGrad (4 min)\r\n* Showcasing the prompt optimization frameworks (8 min):\r\n    * DSPy (4 min)\r\n    * AdalFlow (4 min)\r\n* Comparison of methods and concluding remarks (3 min)\r\n* Q&A (5 min)", "recording_license": "", "do_not_record": false, "persons": [{"code": "EACXYX", "name": "Iryna Kondrashchenko", "avatar": "https://pretalx.com/media/avatars/EACXYX_VKiz23C.png", "biography": "Iryna is a data scientist and co-founder of DataForce Solutions, a company specialized in delivering end-to-end data science and AI services. She contributes to several open-source libraries, and strongly believes that open-source products foster a more inclusive tech industry, equipping individuals and organizations with the necessary tools to innovate and compete.", "public_name": "Iryna Kondrashchenko", "guid": "70018355-f170-5858-9983-12cb340d312d", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/EACXYX/"}, {"code": "NWAQCX", "name": "Oleh Kostromin", "avatar": "https://pretalx.com/media/avatars/NWAQCX_aiMHOjX.png", "biography": "I am a Data Scientist primarily focused on Deep Learning and MLOps. 
In my spare time I contribute to several open-source python libraries.", "public_name": "Oleh Kostromin", "guid": "68c7801e-b9b7-5c17-bc44-d5e705e5c269", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/NWAQCX/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/GURXPK/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/GURXPK/", "attachments": []}, {"guid": "22a92cb8-9dcd-55e8-98cf-80d0d88bdca2", "code": "ESD7KF", "id": 68193, "logo": null, "date": "2025-04-23T18:30:00+02:00", "start": "18:30", "duration": "01:00", "room": "Zeiss Plenary (Spectrum)", "slug": "pyconde-pydata-2025-68193-lightning-talks-1-2", "url": "https://pretalx.com/pyconde-pydata-2025/talk/ESD7KF/", "title": "Lightning Talks (1/2)", "subtitle": "", "track": "General: Others", "type": "Lightning Talks", "language": "en", "abstract": "Lightning Talks at PyCon DE & PyData are short, 5-minute presentations open to all attendees. They\u2019re a fun and fast-paced way to share ideas, showcase projects, spark discussions, or raise awareness about topics you care about \u2014 whether technical, community-related, or just inspiring. No slides are required, and talks can be spontaneous or prepared. It\u2019s a great chance to speak up and connect with the community!\r\n\r\nPlease note: community conference and event announcements are limited to 1 minute only. All event announcements will be collected in a single slide deck.", "description": "### \u26a1 Lightning Talk Rules\r\n\r\n* No promotion for products or companies.\r\n* No call for 'we are hiring' (but you may name your employer).\r\n* One LT per person per conference policy.\r\n\r\n#### Community Event Announcements\r\n\r\n* \u23f1 You want to announce a community event? 
You have ONE minute.\r\n* All event announcements will be collected in a single slide deck; see instructions at the Lightning Talk desk in the Community Space\r\n  in the Lounge on Level 1.\r\n\r\n#### All other LTs:\r\n\r\n* \u23f1 You have exactly 5 minutes. The clock starts when you start \u2014 and ends when time\u2019s up. That\u2019s the thrill of Lightning Talks \u26a1\r\n* \ud83c\udfaf Be sharp, clear, and fun. Introduce your idea, make your point, give the audience something to remember. No pressure. (Okay, maybe a\r\n  little.)\r\n* \ud83c\udfb2 You must include at least **one entry from the [official Bingo Card list](/bingocard/)**. Every audience member will receive a Bingo\r\n  card \u2014 and they\u2019ll be\r\n  watching \ud83d\udc40 Your job? Choose at least one Bingo item from the [official Bingo Card list](/bingocard/) and drop it into your talk. Subtly or\r\n  dramatically \u2014 your style.\r\n* \ud83d\udc0d Keep it relevant to Python, PyData and the community. You can go broad \u2014 tools, workflows, stories, experiments \u2014 as long as there\u2019s\r\n  some connection to Python, PyData or the community.\r\n* \ud83d\udc4f Keep it respectful. Keep it awesome. Humor is welcome, but please be kind, inclusive, and professional.\r\n* \ud83c\udfa4 Be ready when your name is called. We\u2019re running a tight session \u2014 speakers go on stage rapid-fire. Stay close and stay hyped.\r\n* \ud83c\udfc6 Bonus prizes may be awarded. Best talk, best Bingo moment, most unexpected Hogwarts reference... 
who knows what could happen?\r\n\r\n#### How to Submit\r\n\r\nThe Lightning Talk desk is located in the Community Space in the Lounge on Level 1.", "recording_license": "", "do_not_record": false, "persons": [{"code": "S3GNBU", "name": "Valerio Maggio", "avatar": "https://pretalx.com/media/avatars/S3GNBU_KZhV6e4.jpg", "biography": null, "public_name": "Valerio Maggio", "guid": "78939915-227f-5f14-99fd-52e1eac75300", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/S3GNBU/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/ESD7KF/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/ESD7KF/", "attachments": []}], "Titanium3": [{"guid": "3b483832-2e84-5f59-8f7a-7be98a6a52c8", "code": "JM3G8S", "id": 60161, "logo": null, "date": "2025-04-23T11:45:00+02:00", "start": "11:45", "duration": "00:30", "room": "Titanium3", "slug": "pyconde-pydata-2025-60161-why-e-on-loves-python", "url": "https://pretalx.com/pyconde-pydata-2025/talk/JM3G8S/", "title": "Why E.ON Loves Python", "subtitle": "", "track": "PyCon: MLOps & DevOps", "type": "Talk", "language": "en", "abstract": "Join me as I share my 20-year journey with Python and its pivotal role at E.ON. Discover how we transitioned fully to Python, streamlined our development framework, and embraced MLOps principles. Learn about some of our AI projects, including image analysis and real-time inference, and our steps towards open-sourcing code to foster innovation in the energy sector. Explore why Python is our go-to language for data science and collaboration.", "description": "In this talk, I will share my journey with Python, spanning over 20 years, and how it has become an integral part of our work at E.ON. My experience with open source began over 30 years ago during my research as a Theoretical Particle Physicist, where sharing insights and code was a daily practice. 
Transitioning to a software developer role at a start-up, I initially used Perl for various tasks but soon realized the challenges of code readability and collaboration. Python, with its enforced indentation and readability, quickly became my language of choice.\r\n\r\nAt E.ON, Python is our go-to language for Data Science tasks. In our team we recently migrated another programming language codebase to Python to streamline our development framework and attract top talent. Python's straightforward modularization into packages and modules simplifies maintenance and lineage, especially in cloud-based pipelines, and helps prevent vendor lock-in. The robust toolchain for code quality checks, testing, and building packages makes Python a no-brainer for development and supports our MLOps principles.\r\n\r\nI will discuss how Python facilitates collaboration globally at E.ON and share examples of our MLOps principles in action. Highlights include image analysis projects like object detection with batch inferencing and instance segmentation with real-time inference endpoints. Additionally, I will detail E.ON's steps towards open-sourcing some of our codebases, enabling other energy companies to build on our projects.\r\n\r\nJoin me to explore why Python is not just a tool but a catalyst for innovation and collaboration at E.ON.", "recording_license": "", "do_not_record": false, "persons": [{"code": "BDNEEF", "name": "Christer Friberg", "avatar": "https://pretalx.com/media/avatars/BDNEEF_fQTvVdO.jpg", "biography": "Christer Friberg is a Senior Machine Learning Engineer at E.ON Energidistribution AB in Malm\u00f6, Sweden. With a background in theoretical particle physics, Christer transitioned into the field of machine learning and software development, leveraging open-source technologies like Python to drive innovation and collaboration.\r\n\r\nAt E.ON, Christer has been instrumental in several image analysis projects. 
Notably, Christer co-authored the \"STORM\" project, which aims to improve the asset documentation process and ensure compliance with standards through AI-driven automated checks", "public_name": "Christer Friberg", "guid": "47994ea9-f992-5663-8034-3338487fc4fd", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/BDNEEF/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/JM3G8S/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/JM3G8S/", "attachments": []}, {"guid": "654cd976-cd5e-5d63-a7f7-0b6593cfa4bf", "code": "S8MUBF", "id": 60094, "logo": null, "date": "2025-04-23T12:25:00+02:00", "start": "12:25", "duration": "00:45", "room": "Titanium3", "slug": "pyconde-pydata-2025-60094-why-exceptions-are-just-sophisticated-gotos-and-how-to-move-beyond", "url": "https://pretalx.com/pyconde-pydata-2025/talk/S8MUBF/", "title": "Why Exceptions Are Just Sophisticated Gotos - and How to Move Beyond", "subtitle": "", "track": "PyCon: Programming & Software Engineering", "type": "Talk (long)", "language": "en", "abstract": "\"Why Exceptions Are Just Sophisticated Gotos - and How to Move Beyond\" explores a common programming tool with a fresh perspective. While exceptions are a key feature in Python and other languages, they share surprising similarities with the notorious goto statement. This talk examines those parallels, the problems exceptions can create, and practical alternatives for better code. Attendees will gain a clear understanding of modern programming concepts and the evolution of programming.", "description": "Exceptions have long been seen as an improvement over error-handling approaches like goto. However, they can introduce complexity and obscure control flow when used without care. This talk will critically examine exceptions, outline the similarities to goto, and explore better ways to handle errors in programming.\r\n\r\n### Outline:\r\n\r\n1. 
Introduction (5 minutes)\r\n    - The historical role of goto in programming.\r\n    - Spaghetti code and the rise of structured programming.\r\n    - How exceptions emerged as an alternative.\r\n2. Why and What Are Exceptions (10 minutes)\r\n    - Why exceptions were introduced.\r\n    - How they became mainstream in languages like Java and C++.\r\n    - Common problems caused by exceptions: hidden control flow, debugging challenges, and performance impacts.\r\n3. The Evolution Toward Result Types (10 minutes)\r\n    - How result types address the shortcomings of exceptions.\r\n    - Implementations in Haskell, Rust, and Golang.\r\n    - Real-world benefits of using result types.\r\n4. Using Result Types in Python (10 minutes)\r\n    - Introducing the returns package.\r\n    - Practical examples of result types in Python.\r\n    - How this approach improves code clarity and reliability.\r\n5. Conclusion (5 minutes)\r\n    - Recap of the journey from goto to exceptions to result types.\r\n    - Key takeaways: thoughtful error handling and modern best practices.\r\n    - Encouragement to explore and adopt better patterns in Python.\r\n\r\nThis session is ideal for intermediate and advanced Python developers seeking actionable techniques to improve error handling and write cleaner, more predictable code.", "recording_license": "", "do_not_record": false, "persons": [{"code": "8LQU9C", "name": "Florian Wilhelm", "avatar": "https://pretalx.com/media/avatars/8LQU9C_vv210Xj.jpg", "biography": "Florian is Head of Data Science & Mathematical Modeling at inovex GmbH, an IT project center driven by innovation and quality, focusing its services on \u2018Digital Transformation\u2019. 
He holds a PhD in mathematics, has more than 10 years of experience in predictive & prescriptive analytics use-cases and likes everything math \ud83e\udd2f", "public_name": "Florian Wilhelm", "guid": "d8a2dd67-d397-54f5-88e9-b2c680fb4e5c", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/8LQU9C/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/S8MUBF/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/S8MUBF/", "attachments": [{"title": "Slides", "url": "/media/pyconde-pydata-2025/submissions/S8MUBF/resources/Evolut_cctxhcL.pdf", "type": "related"}]}, {"guid": "93375c74-1196-5bf6-9067-b59f83af1962", "code": "G3AT7E", "id": 61362, "logo": null, "date": "2025-04-23T14:30:00+02:00", "start": "14:30", "duration": "00:30", "room": "Titanium3", "slug": "pyconde-pydata-2025-61362-llm-inference-arithmetics-the-theory-behind-model-serving", "url": "https://pretalx.com/pyconde-pydata-2025/talk/G3AT7E/", "title": "LLM Inference Arithmetics: the Theory behind Model Serving", "subtitle": "", "track": "PyData: Generative AI", "type": "Talk", "language": "en", "abstract": "Have you ever asked yourself how parameters for an LLM are counted, or wondered why Gemma 2B is actually closer to a 3B model? You have no clue about what a KV-Cache is? (And, before you ask: no, it's not a Redis fork.) Do you want to find out how much GPU VRAM you need to run your model smoothly? \r\n\r\nIf your answer to any of these questions was \"yes\", or you have another doubt about inference with LLMs - such as batching, or time-to-first-token - this talk is for you. Well, except for the Redis part.", "description": "The talk will cover the theory necessary to understand how to serve LLMs. The talk covers the math behind transformer inference in an accessible and light way. By the end of the talk, attendees will learn:\r\n\r\n1. How to count the parameters in an LLM, especially the ones in the attention layers.\r\n2. 
The difference between compute and memory in the context of LLM inference.\r\n3. That LLM inference is made up of two parts: prefill and decoding.\r\n4. What an LLM server is, and what features such servers implement to optimise GPU memory usage and reduce latency.\r\n5. How batching affects your inference metrics, like time-to-first-token.\r\n\r\nThe talk will cover:\r\n\r\n**Did you pay attention?** (4 min). A short review of the attention mechanism and how to count parameters in a transformer-based model.\r\n\r\n**Get to know your params** (8 min). The math-y section of the talk, explaining how to translate parameter counts into memory and compute requirements.\r\n\r\n**Prefill and Decoding** (8 min). Explains that inference happens in two steps (prefill and decoding) and how KV-cache exploits this to make decoding faster. Common metrics to measure inference performance, like time-to-first-token and tokens-per-second.\r\n\r\n**Context and batch size** (5 min). Adds to the picture the sequence length, as well as the number of requests to process in parallel. Explains how LLM servers, like vLLM, use techniques like Paged Attention to optimise GPU usage.\r\n\r\n**Conclusion** (5 min). Wrap up, Q&A.", "recording_license": "", "do_not_record": false, "persons": [{"code": "LCEK33", "name": "Luca Baggi", "avatar": "https://pretalx.com/media/avatars/LCEK33_94a2JTa.jpg", "biography": "AI Engineer at xtream by day, and open source maintainer by night. I strive to be an active part of the Python and PyData communities - e.g. as an organiser of PyData Milan. 
Feel free to reach out!", "public_name": "Luca Baggi", "guid": "d255ece2-e1cc-51dc-9e2e-2620af57f6f1", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/LCEK33/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/G3AT7E/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/G3AT7E/", "attachments": []}, {"guid": "44f245d4-8027-51eb-9442-00aed233ea8f", "code": "GJ9MVT", "id": 61850, "logo": null, "date": "2025-04-23T15:10:00+02:00", "start": "15:10", "duration": "00:45", "room": "Titanium3", "slug": "pyconde-pydata-2025-61850-size-matters-inspecting-docker-images-for-efficiency-and-security", "url": "https://pretalx.com/pyconde-pydata-2025/talk/GJ9MVT/", "title": "Size matters: Inspecting Docker images for Efficiency and Security", "subtitle": "", "track": "PyCon: MLOps & DevOps", "type": "Talk (long)", "language": "en", "abstract": "Inspecting Docker images is crucial for building secure and efficient containers. In this session, we will analyze the structure of a Python-based Docker image using various tools, focusing on best practices for minimizing image size and reducing layers with multi-stage builds. We\u2019ll also address common security pitfalls, including proper handling of build and runtime secrets.\r\n\r\nWhile this talk offers valuable insights for anyone working with Docker, it is especially beneficial for Python developers seeking to master clean and secure containerization techniques.", "description": "1. **Introduction**\r\n    - We start with an example Dockerfile for a Python-based image.\r\n    - We will explore the role of OverlayFS, Docker\u2019s file system for combining layers, to understand how layers stack and how data (or even secrets) can be retrieved from individual layers.\r\n    \r\n2.  
**Layer Analysis**\r\n    - To gain better understanding of layering, we use simple command-line tools like `docker history` and `docker inspect` to examine image layers.\r\n    - We introduce `dive`, a tool for exploring the contents of each layer.\r\n    - We apply these insights to optimize the image by implementing multi-stage builds to create a smaller image with fewer layers, improving storage efficiency, build speed, and security.\r\n    - We discuss the benefits of Docker\u2019s caching mechanism in reducing build times.\r\n\r\n3. **Security Enhancements**\r\n    - Given our example image, we will use `trivy`, a comprehensive security scanner, to scan the example image for vulnerabilities and demonstrate how to address common issues.\r\n    - Finally, we introduce `hadolint`, an open-source linter for Dockerfiles.\r\n\r\nTo get the most out of this session, participants are encouraged to clone the session's [repository](https://github.com/pythonmonty/inspect-docker-images).", "recording_license": "", "do_not_record": false, "persons": [{"code": "BZY9AN", "name": "Irena Grgic", "avatar": "https://pretalx.com/media/avatars/BZY9AN_CLaF2al.jpeg", "biography": "As a clean code enthusiast, Women in Tech advocate, DevOps engineer, and mathematician, I have worked in multiple tech fields. My journey has taken me from roles as a data scientist and machine learning engineer to MLOps, culminating in my current position as the lead DevOps engineer of a computer vision platform with hundreds of active users. I possess a broad range of experience in multiple programming languages, creating fast and structured CI/CD pipelines, deploying entire platforms to Kubernetes, and working with various cloud providers. 
I am passionate about efficient, well-readable, and easily maintainable code and strongly believe that machine learning products should be developed with the same standards as good software.", "public_name": "Irena Grgic", "guid": "146d1329-0cf4-528e-9e36-9caba1e8e83e", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/BZY9AN/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/GJ9MVT/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/GJ9MVT/", "attachments": []}, {"guid": "d9539b59-0b0c-5ddd-b4c7-e5f547f4d805", "code": "TYXMZC", "id": 59827, "logo": null, "date": "2025-04-23T17:10:00+02:00", "start": "17:10", "duration": "00:30", "room": "Titanium3", "slug": "pyconde-pydata-2025-59827-guiding-data-minds-how-mentoring-transforms-careers-for-both-sides", "url": "https://pretalx.com/pyconde-pydata-2025/talk/TYXMZC/", "title": "Guiding data minds: how mentoring transforms careers for both sides", "subtitle": "", "track": "General: Community & Diversity", "type": "Talk", "language": "en", "abstract": "Mentorship is a powerful way to shape careers while building meaningful connections in the data field. In this talk, I\u2019ll share my journey as a professional mentor, what the role entails, and the impact it has on both mentees and mentors. Learn how mentorship drives growth, fosters innovation, and creates value for the data community\u2014and why you should consider stepping into this rewarding role.", "description": "Mentorship is a rewarding journey that allows experienced professionals to guide and empower the next generation of talent. As a mentor in the data field, I have had the privilege of helping individuals navigate their careers, refine their skills, and unlock their potential. 
In this talk, I will share my personal journey into becoming a professional mentor, how I approach mentorship in a structured and impactful way, and the unique value a mentorship brings to both mentees and mentors.\r\n\r\nI\u2019ll provide insights into the day-to-day activities of mentoring, from offering career guidance to solving technical challenges, while also discussing the importance of tailoring advice to individual goals. Beyond technical skills, mentorship fosters confidence, networking, and long-term growth for mentees while offering mentors opportunities for personal development, deep satisfaction, and a broader industry perspective.\r\n\r\nWith the rapid evolution of the data industry, mentorship has never been more critical. This talk will highlight how professionals at any stage of their career can engage in mentorship to create a ripple effect of positive change in the data community\u2014and why taking the step to become a mentor, paid or otherwise, is an investment in the future of data science and yourself.", "recording_license": "", "do_not_record": false, "persons": [{"code": "BJ3JTQ", "name": "Anastasia Karavdina", "avatar": null, "biography": "My background is particle physics, where I was completely spoiled by access to large amounts of data and the freedom to try out every hot ML algorithm on it. The experiments I participated in were so-called large scale experiments (e.g Large Hadron Collider) and had from 500+ up to 2.5k other people working on them. So in addition to physics, I was exposed to the best software development practices that helped us to avoid a complete mess and destroy the Universe. \r\n\r\nAfterwards I was working as Data Scientist in various fields and recently became \"Solution Architect ML/AI and BI\" at big enterprise company. \r\n\r\nDuring my free time, I like learning new tools and techniques and implementing them in end-to-end AI/ML and IoT projects. 
My experience has also been very helpful in guiding data analysts, data scientists, and machine learning engineers as a mentor and contributing to the growth of the next generation of data scientist elite.", "public_name": "Anastasia Karavdina", "guid": "486fc76c-9e80-5b7b-9522-aaa03e19ee45", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/BJ3JTQ/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/TYXMZC/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/TYXMZC/", "attachments": []}, {"guid": "e6278906-46e3-5654-b792-c0e0995c23cf", "code": "AWPYGE", "id": 61899, "logo": null, "date": "2025-04-23T17:50:00+02:00", "start": "17:50", "duration": "00:30", "room": "Titanium3", "slug": "pyconde-pydata-2025-61899-the-earth-is-no-longer-flat-introducing-support-for-spherical-geometries-in-spherely-and-geopandas", "url": "https://pretalx.com/pyconde-pydata-2025/talk/AWPYGE/", "title": "The earth is no longer flat - introducing support for spherical geometries in Spherely and GeoPandas", "subtitle": "", "track": "PyData: PyData & Scientific Libraries Stack", "type": "Talk", "language": "en", "abstract": "The geometries in GeoPandas, using the Shapely library, are assumed to be in projected coordinates on a flat plane. While this approximation is often just fine, for global data this runs into its limitations. This presentation introduces spherely, a Python library for working with vector geometries on the sphere, and its integration into GeoPandas.", "description": "Not all geospatial data are best represented using a projected coordinate system. Unfortunately, the Python geospatial ecosystem is almost fully based on planar geometries using Shapely, and is still lacking a general purpose library for efficient manipulation of geometric objects on the sphere. 
We introduce Spherely: a new Python library that fills this gap, aiming to provide a similar API to Shapely, but for geometries on the sphere.\r\n\r\nSpherely provides Python/Numpy vectorized bindings to S2Geometry, a mature and performant C++ library for spherical geometry that is widely used for indexing and processing geographic data, notably in popular database systems. This is done via S2Geography, a C++ library that has emerged from the R-spatial ecosystem and that provides a GEOS-like compatibility layer on top of S2Geometry. Unlike S2Geometry\u2019s SWIG wrappers or S2Sphere (pure-Python implementation), Spherely exposes its functionality via \u201cuniversal\u201d functions operating on n-dimensional Numpy arrays, therefore greatly reducing the overhead of the Python interpreter.\r\n\r\nComplementary to Shapely 2.0, Spherely may be used as a backend geometry engine for Python geospatial libraries like GeoPandas, hence extending their functionality to more robust and accurate manipulation of geographic data (i.e., using longitude and latitude coordinates).\r\n\r\nThis presentation introduces spherely and its capabilities to work with vector geometries on the sphere, and its integration into GeoPandas.\r\n\r\nCode repository: https://github.com/benbovy/spherely", "recording_license": "", "do_not_record": false, "persons": [{"code": "7VUXWM", "name": "Joris Van den Bossche", "avatar": "https://pretalx.com/media/avatars/7VUXWM_5SP7h9s.png", "biography": "I am a core contributor to Pandas and Apache Arrow, and one of the maintainers of GeoPandas and Shapely. I did a PhD at Ghent University and VITO in air quality research, worked at the Paris-Saclay Center for Data Science and at Voltron Data contributing to Apache Arrow. 
I am a freelance open source software developer and teacher.", "public_name": "Joris Van den Bossche", "guid": "7e876587-827f-57eb-8ec2-ba1bbb58a7f3", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/7VUXWM/"}], "links": [{"title": "Slides", "url": "https://jorisvandenbossche.github.io/talks/2025_PyConDE_spherely/#1", "type": "related"}], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/AWPYGE/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/AWPYGE/", "attachments": []}], "Helium3": [{"guid": "69c924c0-b02d-5f78-9601-72f1ace9e0e5", "code": "MQG9HN", "id": 66106, "logo": null, "date": "2025-04-23T11:45:00+02:00", "start": "11:45", "duration": "00:30", "room": "Helium3", "slug": "pyconde-pydata-2025-66106-introducing-the-synthetic-data-sdk-privacy-preserving-synthetic-data-for-ai-ml", "url": "https://pretalx.com/pyconde-pydata-2025/talk/MQG9HN/", "title": "Introducing the Synthetic Data SDK - Privacy Preserving Synthetic Data for AI/ML", "subtitle": "", "track": "PyData: Data Handling & Engineering", "type": "Sponsored Talk", "language": "en", "abstract": "AI-generated synthetic data is gaining traction as a privacy-safe solution for data access and sharing. This data is created from original datasets, maintaining privacy without compromising utility.\r\n\r\nIn this Session, we'll cover the fundamental concepts of AI-generated synthetic data and demonstrate how easy it is to generate synthetic data within your local compute environment using the open-source Synthetic Data SDK.", "description": "Privacy regulations are tightening globally, making it increasingly challenging for organizations to access and share data while ensuring compliance.\r\n\r\nAI-generated synthetic data is gaining traction as a privacy-safe solution for data access and sharing. 
This data is created from original datasets, maintaining privacy without compromising utility.\r\n\r\nMOSTLY AI has recently released an efficient and flexible Synthetic Data SDK under a fully permissive Apache v2 license, empowering anyone to generate high-quality synthetic data with top-tier performance. Powered by the TabularARGN model architecture, the SDK achieves training times 10x to 100x faster than existing models, while achieving a SOTA fidelity-privacy balance.\r\n\r\nIn this Session, we'll cover the fundamental concepts of synthetic data and demonstrate how easy it is to generate synthetic data directly from a Jupyter Notebook using the Synthetic Data SDK. Specifically, we will go through\r\n- Installing the Synthetic Data SDK\r\n- Loading original data into the SDK and locally creating a Generator\r\n- Using a Generator to create different versions of synthetic data\r\n- Uploading a Generator to the MOSTLY AI Platform and sharing it with the world\r\n\r\nThis will be a hands-on session - so come with your laptop and ideally a dataset that you'd like to synthesize!", "recording_license": "", "do_not_record": false, "persons": [{"code": "E7TWUS", "name": "Michael Platzer", "avatar": "https://pretalx.com/media/avatars/E7TWUS_Nt86mBw.jpeg", "biography": "Michael co-founded MOSTLY AI in 2017, led the company as CEO until 2020, and then transitioned to the CTO role. Michael is a world class data scientist who held leading positions at Microsoft and Nokia before founding MOSTLY AI. He was awarded with the Global Marketing Research Award by the American Marketing Association. He holds a PhD degree from the Vienna University of Economics and Business and a Master degree from the Vienna University of Technology. 
Michael is a proud dad of two daughters and passionate for all kinds of sports including running, biking and baseball.", "public_name": "Michael Platzer", "guid": "3f2d3b8c-d9cb-55ee-915d-99a19872439e", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/E7TWUS/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/MQG9HN/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/MQG9HN/", "attachments": []}, {"guid": "5aadf863-a24e-5dc6-98ac-893de2b90cc3", "code": "ZKNTGN", "id": 60861, "logo": null, "date": "2025-04-23T12:25:00+02:00", "start": "12:25", "duration": "00:45", "room": "Helium3", "slug": "pyconde-pydata-2025-60861-expectation-a-modern-take-on-statistical-a-b-testing-with-e-values-and-martingales", "url": "https://pretalx.com/pyconde-pydata-2025/talk/ZKNTGN/", "title": "expectation: A modern take on statistical A/B testing with e-values and martingales", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Statistics", "type": "Talk (long)", "language": "en", "abstract": "This talk introduces a novel Python library for statistical testing using e-values, offering a refreshing alternative to traditional p-values. We'll explore how this approach enables real-time sequential testing, allowing data scientists to monitor experiments continuously without the statistical penalties of repeated testing. Through practical examples, we'll demonstrate how e-values provide more intuitive evidence measures and enable flexible stopping rules in A/B testing, clinical trials, and anomaly detection. The library implements cutting-edge methods from game-theoretic probability, making advanced sequential testing accessible to Python practitioners. 
Whether you're conducting A/B tests, monitoring production models, or running clinical trials, this talk will equip you with powerful new tools for sequential data analysis.", "description": "Modern data science demands flexible statistical methods that can handle sequential data analysis and continuous monitoring. Traditional p-values, while widely used, have limitations when dealing with sequential testing scenarios. This talk introduces a Python library that implements e-values and e-processes, offering a more natural approach to measuring statistical evidence and enabling true sequential testing.\r\n\r\nOutline:\r\n1. Statistical toolkit\r\n- Current tools\r\n- Purpose and fundamental concepts\r\n- Challenges in modern statistics\r\n- Type 1 error concerns\r\n- Optional stopping problems\r\n\r\n2. Sequential testing\r\n- Origins\r\n- The concept of sequential testing\r\n- Peeking\r\n\r\n3. e-values\r\n- What are e-values?\r\n- Definitions and concepts\r\n- Betting interpretation\r\n- Wealth process\r\n- Ville's inequality\r\n- Anytime valid inference\r\n- p-value vs. e-value differences \r\n\r\n4. Python library\r\n- Architecture\r\n- Core components\r\n- Installation and basic setup\r\n\r\n5. Demo 1: A/B testing\r\n\r\n6. Beyond A/B testing\r\n- Broader applications\r\n- Conformal e-testing\r\n- Confidence sequences\r\n\r\n7. Demo 2: It is a versatile library\r\n\r\n8. Acknowledgments\r\n\r\nQ&A Session", "recording_license": "", "do_not_record": false, "persons": [{"code": "SJPXAQ", "name": "Jako Rostami", "avatar": "https://pretalx.com/media/avatars/SJPXAQ_5P4Zusw.jpeg", "biography": "I am a Machine Learning Engineer at H&M Group, former Data Scientist at Lidl Sweden, as a professional I am designing Machine Learning services, extracting insights and arranging meaningful stories for my clients by conducting high-quality modeling, engineering, data mining and analytics. 
\r\n\r\nI have a Bachelor's degree in Statistics and Probability Theory from Uppsala University in Sweden. Because I am a statistician at my core, I have good experience with Data Science, Python, R, time series modeling, simulations, machine learning algorithms, SQL, Excel, Spark and database technologies, as well as good communication skills. \r\n\r\nYou\u2019ll find two comprehensive Python libraries I have open-sourced. One is based on an emerging modern statistical hypothesis testing framework using e-values and martingales based on game-theoretic statistics. The other is for computational Supply Chain and Logistics. The first one is called \u2019expectation\u2019 and the second one is called \u2019supplyseer\u2019 and you can find both on my GitHub.", "public_name": "Jako Rostami", "guid": "5c03a26a-3999-531d-bc32-b6e67578bd2f", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/SJPXAQ/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/ZKNTGN/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/ZKNTGN/", "attachments": []}, {"guid": "8bb6e5e7-a8e3-5b76-9730-1003ead83714", "code": "GUKTNX", "id": 61277, "logo": null, "date": "2025-04-23T14:30:00+02:00", "start": "14:30", "duration": "00:30", "room": "Helium3", "slug": "pyconde-pydata-2025-61277-benchmarking-time-series-foundation-models-with-sktime", "url": "https://pretalx.com/pyconde-pydata-2025/talk/GUKTNX/", "title": "Benchmarking Time Series Foundation Models with sktime", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Statistics", "type": "Talk", "language": "en", "abstract": "Recent time series foundation models such as LagLlama, Chronos, Moirai, and TinyTimesMixer promise zero-shot forecasting for arbitrary time series. One central claim of foundation models is their ability to perform zero-shot forecasting, that is, to perform well with no training data. 
However, performance claims of foundation models are difficult to verify, as public benchmark datasets may have been a part of the training data, and only the already trained weights are available to the user.\r\n\r\nTherefore, performance in specific use cases must be verified based on the use case data itself to ensure a reliable assessment of forecasting performance. sktime allows users to easily produce a performance benchmark of any collection of forecasting models, foundation models, simple baselines, or custom methods on their internal use case data.", "description": "In the past years, time series foundation models emerged. They have the potential to change time series forecasting. For example, multiple time series models such as LagLlama, Chronos, Moirai, and TinyTimesMixer promise zero-shot forecasting for arbitrary time series. Furthermore, also sktime started to unify the interfaces of the various foundation models to make the usage of those models easy.\u00a0\r\nHowever, whether these time series foundation models provide added value to various forecasting applications is still unclear. Thus, benchmarking is necessary. In sktime, we have implemented a benchmarking module enabling easy comparison of those time series foundation models on custom datasets and with arbitrary metrics.\r\n\r\nOur talk will outline how sktime\u2019s benchmarking module works and how users can use it to evaluate time series foundation models.\u00a0\r\nWe will show how to combine the benchmarking module with the time series foundation models.\r\nWe will show the results of a small benchmarking study using time series foundation models and statistical time series models.\u00a0\r\nWe will outline our roadmap for time series foundation models.\u00a0\r\n\r\nsktime is developed by an open community with the aim of ecosystem integration in a commercially neutral, charitable space. 
We welcome contributions or donations and seek to provide opportunities for anyone worldwide.", "recording_license": "", "do_not_record": false, "persons": [{"code": "GB89CS", "name": "Benedikt Heidrich", "avatar": "https://pretalx.com/media/avatars/GB89CS_V0kWU9j.jpeg", "biography": "I completed my PhD in deep learning based time series forecasting in 2023 with the Karlsruhe Institute of Technology. In sktime, I am focusing on forecasting methods (mainly deep learning based ones) and implementing pipelines.", "public_name": "Benedikt Heidrich", "guid": "b2a8457c-9a55-5062-b968-09ec6f157270", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/GB89CS/"}, {"code": "JZJFAM", "name": "Franz Kiraly", "avatar": null, "biography": "Franz Kiraly is Director at the German Center for Open Source AI, the largest German non-profit for open source AI software by software footprint.\r\n\r\nHe is also the original founder and a core developer of sktime.", "public_name": "Franz Kiraly", "guid": "a31b00fd-1c5c-5332-ab55-6f868f5ab738", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/JZJFAM/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/GUKTNX/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/GUKTNX/", "attachments": []}, {"guid": "e416f6cc-3d29-525b-9848-300f1452711d", "code": "PRRPQ3", "id": 61785, "logo": null, "date": "2025-04-23T15:10:00+02:00", "start": "15:10", "duration": "00:45", "room": "Helium3", "slug": "pyconde-pydata-2025-61785-pydata-stack-pure-python-open-source-data-platforms", "url": "https://pretalx.com/pyconde-pydata-2025/talk/PRRPQ3/", "title": "PyData Stack: Pure Python open source data platforms", "subtitle": "", "track": "PyData: Data Handling & Engineering", "type": "Talk (long)", "language": "en", "abstract": "Modern open source Python data packages offer the opportunity to build and deploy pure Python, production-ready data platforms. 
Engineers can and do play a big role in helping companies become data-driven by centralising this data, cleaning and modelling it and presenting it back to the business. Now more than ever, it gives engineers and companies of any size the ability to build data products and insights at relatively low cost. In this talk we\u2019ll walk through the key components of this stack, the tooling options available, and demo a deployable containerised Python data stack.", "description": "Modern data platforms can be built and deployed using completely open source Python packages. In this talk, I\u2019ll cover what constitutes a modern data stack and what open source Python packages can be used to build a stack suitable for the needs of most developers and companies. Rather than a one-size-fits-all approach, I\u2019ll initially demonstrate the rich ecosystem of technologies available and the pros and cons of the technology choices.\r\n\r\nTo be concrete, we will demo an instance of this type of self-contained, deployable platform that is composed of specific technology choices for the key components: data pipelines, transformation engine, data warehouse, presentation layer and orchestration. This implementation will only use Python with a sprinkling of SQL. \r\n\r\nStructure\r\n1. What is a data stack?\r\n2. Data Stores\r\n3. Pipelines\r\n4. Transformation\r\n5. Orchestration\r\n6. Visualisation\r\n\r\nOutcomes\r\n\r\nThe aim of this talk is to equip attendees with an understanding of the available Python libraries and the knowledge to build their own data platforms. This would specifically be useful for attendees who may be software or backend engineers who may also be called upon to own the data stack to support business and analyst use cases. It may also help engineers who may be looking to re-platform legacy, expensive data platforms to a more modern data stack. 
\r\nFor research and personal projects, spinning up a modern platform could be useful for compute heavy analytics that have outgrown local development.", "recording_license": "", "do_not_record": false, "persons": [{"code": "A7QPAE", "name": "Eric Thanenthiran", "avatar": "https://pretalx.com/media/avatars/A7QPAE_pk93ftt.jpg", "biography": "I lead the Engineering function at Tasman Analytics, a boutique data consultancy. We act as an interim/fractional data team and have built many, many data stacks for our clients. We are passionate about helping clients leverage the power of their data. \r\n\r\nPersonally, I have a background of mechanical engineering and have worked across a range of sectors including sustainability, energy, property, construction and architecture. I am an engineer at heart and perennially look to hone the craft of engineering. \r\n\r\nWriting Python makes me incredibly happy.", "public_name": "Eric Thanenthiran", "guid": "da8af761-4ced-5a3a-81d6-d8058766ba54", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/A7QPAE/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/PRRPQ3/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/PRRPQ3/", "attachments": []}, {"guid": "abbd0630-28ff-548a-9fa9-c7afc63091de", "code": "RLTZTC", "id": 60453, "logo": null, "date": "2025-04-23T16:10:00+02:00", "start": "16:10", "duration": "00:30", "room": "Helium3", "slug": "pyconde-pydata-2025-60453-how-to-use-data-science-superpowers-in-real-life-a-bayesian-perspective", "url": "https://pretalx.com/pyconde-pydata-2025/talk/RLTZTC/", "title": "How to use Data Science Superpowers in real life, a Bayesian perspective", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Statistics", "type": "Talk", "language": "en", "abstract": "In the data science field, we use all these powerful methods to solve important problems. 
Most of the time, we do this very well because our data science and machine-learning toolbox fits the problems we tackle quite precisely. Yet, what about our everyday choices or even our most important life decisions? Can we use for our private lives what we advocate for in our jobs or are these choices inherently different?\r\nMany of these real-life decisions are a little different from textbook machine-learning problems. There is often less or hard-to-come-by data and the decisions are infrequent, but sometimes very consequential. This talk will dive into what makes everyday decisions difficult to handle with our data science toolbox. It will show how Bayesian thinking can help to reason in such cases, especially when there is not a lot of data to rely on.", "description": "In this talk, I want to take a look at decision making from a slightly different angle. In a world that produces an ever-growing amount of data in every domain, data scientists can shine with their tools to make data-driven decisions. Often there is even too much data and the most tedious part of the work is to remove the noise from the signal with clever feature engineering. Though the world gets covered more and more by big data, this development is not distributed evenly.\r\n\r\nLots of decisions we need to make in real life do not follow this pattern. In fact, there are often surprisingly few data points that help us here. Yet, are there fundamental differences between everyday decisions and the type of decisions we automate so well with machine learning in our jobs? In this part of the talk, I will attempt a characterisation of both types of decisions. We will have a closer look at what implicit assumptions we make to use our machine learning toolbox. 
After this we might get a first explanation why these tools might be unsuited to answer questions like \u2019how long should I study for an exam\u2019 or \u2019should I accept this new job or not\u2019.\r\n\r\nEnter Bayesian statistics: This part of the talk will introduce Bayesian statistics for beginners using simple examples and images. It will highlight the benefits of the method when we are short of data but have some additional experience not encoded in the data. I will show how in these circumstances prior distributions come in really handy.\r\n\r\nAfter laying the groundwork on Bayesian methods we will circle back to the everyday decisions and see how well both things fit together. On a higher level, this will show what makes problems in decision making a great fit for Bayesian methods. I will introduce this using a practical example. The example will deal with the decision of how long one should study for a test or exam. Taking a step-by-step approach, we explore how this decision can be informed with just a few data points. Aside from finding the key to successful exam preparation, the example is also helpful to see some of the basics for working with the pymc library.\r\n\r\nThe talk will end with some more general thoughts. This will answer where to go from here and for which decisions a thorough investigation like the presented one is worthwhile. Yet, once one is familiar with the basics of Bayesian thinking, there might be shortcuts. I will show that we can use the principles as a great tool to improve discussions about important decisions on a broader scale.", "recording_license": "", "do_not_record": false, "persons": [{"code": "VFC8XU", "name": "Tim Lenzen", "avatar": "https://pretalx.com/media/avatars/VFC8XU_kd3aCQ8.jpeg", "biography": "I am currently working as a Senior Data Scientist at Ailio. My focus is on helping improve organizations by better utilizing their data. 
I contribute to these transformation projects by bringing in my broad expertise in data-related topics ranging from data engineering and cloud development (AWS, Azure) over data science and machine learning to communication and leadership skills.\r\n\r\nAfter completing my master's in chemistry, I really started my journey in the data science and machine learning field during my PhD studies in theoretical chemistry. The next step for me was a role as a data scientist in a company developing software in the IT-Security field. For five years, I worked on a system to detect suspicious e-mail traffic using machine learning. Aside from the technical aspect of the job, I also built a small team. From this experience I learnt a lot about leadership and developing software products on a larger scale.\r\n\r\nI strongly believe that using the right data to inform important decisions helps organizations of all kinds improve. However, often this is easier said than done. I am always curious to discover and tackle these interesting challenges. 
Also, I am more than happy to share my knowledge and learnings.", "public_name": "Tim Lenzen", "guid": "e585bf54-c035-5c11-9189-b7bf7819a286", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/VFC8XU/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/RLTZTC/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/RLTZTC/", "attachments": [{"title": "Slides of the talk", "url": "/media/pyconde-pydata-2025/submissions/RLTZTC/resources/PyData_sgInN3r.pdf", "type": "related"}]}, {"guid": "5a2138f2-13be-5dda-9d19-629f4b49c091", "code": "ZHT9HW", "id": 61878, "logo": null, "date": "2025-04-23T17:10:00+02:00", "start": "17:10", "duration": "00:30", "room": "Helium3", "slug": "pyconde-pydata-2025-61878-information-retrieval-without-feeling-lucky-the-art-and-science-of-search", "url": "https://pretalx.com/pyconde-pydata-2025/talk/ZHT9HW/", "title": "Information Retrieval Without Feeling Lucky: The Art and Science of Search", "subtitle": "", "track": "General: Others", "type": "Talk", "language": "en", "abstract": "Search is everywhere, yet effective Information Retrieval remains one of the most underestimated challenges in modern technology. While Retrieval-Augmented Generation has captured significant attention, the foundational element - Information Retrieval - often remains underexplored. \r\n\r\nIn this talk, we put Information Retrieval center stage by asking: \r\nHow do we know that user queries and data 'speak' the same language?\r\nHow do we evaluate the relevance and completeness of search results? And how do we prioritize what gets displayed? Or do we even want to hide specific content?\r\n\r\nWe try to answer these questions by introducing the audience to the art and science of Information Retrieval, exploring metrics such as precision, recall, and desirability. We\u2019ll examine key challenges, including ambiguity, query relaxation, and the interplay between sparse and dense search techniques. 
Through a live demo using public content from Sendung mit der Maus, we show how hybrid search improves upon vector and keyword based search in isolation.", "description": "Information Retrieval goes beyond keyword matching - it\u2019s about intent, context, and delivering relevant and accurate results. As RAG applications gain traction, understanding the retrieval process becomes more crucial for developers, data scientists, and search engineers.\r\n\r\nWe start with the Why. People have different needs for search - lookup, research, and inspiration. Each of these needs can be influenced and affected by the key IR metrics of search engines: precision, recall, and desirability. Having introduced these fundamentals, we go into common retrieval challenges, such as ambiguity, mismatched vocabularies, and the impact of context.\r\n\r\nAiming to solve these challenges, we then go into advanced search techniques, comparing sparse (keyword-based) and dense (vector-based) retrieval, highlighting their strengths and limitations. We\u2019ll explore hybrid search as a powerful approach that blends these techniques. In a live demo, using crawled data from the Sendung mit der Maus, we\u2019ll showcase a hybrid search setup leveraging tools like Mistral, Elasticsearch, and Streamlit. While the dataset language is German, the core concepts and search dynamics should hopefully be easily understandable also for non native speakers.\r\n\r\nThe talk concludes with key takeaways on building effective search systems and a look ahead at future developments in contextualized search.\r\n\r\nTentative Outline:\r\n1. Introduction to Information Retrieval (~ 5 min)\r\n  * Why do we search? Lookup, research, inspiration\r\n  * Core metrics: precision, recall, desirability\r\n\r\n2. Challenges in Search and Retrieval (~ 5 min)\r\n  * Ambiguity\r\n  * Discrepancy in query and content \r\n  * The impact of context \r\n\r\n3. 
Search Techniques (~ 5 min)\r\n  * Sparse vs dense retrieval: comparing keyword and vector search (semantic search, embeddings, synsets, decompounders)\r\n  * Hybrid search: Combining sparse and dense approaches\r\n\r\n4. Hybrid Search in Action (~ 10 min)\r\n  * Setting up a hybrid search with Mistral, Elasticsearch, and Streamlit\r\n  * Live Demo: exploring search in Lach- & Sachgeschichten from Sendung mit der Maus\r\n\r\n5. Takeaways & Outlook (< 5 min)\r\n* hybrid search systems combine semantics, precision and explainability\r\n* contextualized search\r\n\r\nThe talk is directed at anyone interested in building or improving search systems. Attendees will gain a deeper understanding of the tools, methodologies, and metrics essential for building robust and explainable search systems.", "recording_license": "", "do_not_record": true, "persons": [{"code": "L7SDGZ", "name": "Anja Pilz", "avatar": "https://pretalx.com/media/avatars/L7SDGZ_RCss5jt.jpeg", "biography": "I received my PhD in Machine Learning (ML) and Natural Language Processing (NLP) from the University of Bonn and Fraunhofer IAIS where I was member of the Text Mining group. Now I work on AI and data driven products, mostly focused on applications in the medical and healthcare domain.\r\nMy main passion is in NLP, especially for the German language, and Information Retrieval (IR). 
Sometimes I build Recommender Systems.", "public_name": "Anja Pilz", "guid": "639f4213-754e-5858-8199-3580acbc81f3", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/L7SDGZ/"}], "links": [{"title": "slides", "url": "https://de.slideshare.net/slideshow/information-retrieval-without-feeling-lucky-the-art-and-science-of-search/280267238", "type": "related"}], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/ZHT9HW/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/ZHT9HW/", "attachments": []}, {"guid": "315f44ee-7eae-5f15-88a7-3f4f431bad96", "code": "FGFFEE", "id": 61842, "logo": null, "date": "2025-04-23T17:50:00+02:00", "start": "17:50", "duration": "00:30", "room": "Helium3", "slug": "pyconde-pydata-2025-61842-rustzeit-asynchronous-concurrency-in-python-rust", "url": "https://pretalx.com/pyconde-pydata-2025/talk/FGFFEE/", "title": "\ud83e\udd80 R\u00fcstzeit: Asynchronous Concurrency in Python & Rust", "subtitle": "", "track": "General: Rust", "type": "Talk", "language": "en", "abstract": "Many Python developers are enhancing their Rust knowledge and want to take the next step in translating their understanding of advanced concepts like asynchronous programming. \r\n\r\nIn this talk, I'll help you take that step by juxtaposing Python's asyncio with Rust's async ecosystems, tokio and async-std. Through real-world examples and insights from conversations with graingert, co-author of Python's Anyio, we'll explore how each language approaches asynchronous execution, highlighting similarities and differences in syntax, performance, and ecosystem support. \r\n\r\nThis talk aims to persuade you that by leveraging Rust's powerful type system and compiler guarantees, we can build fast, reliable async code that's less prone to race conditions and concurrency bugs. 
Whether you're a Pythonista venturing into Rust or a Rustacean curious about Python's concurrency model, this session will provide practical insights to help you navigate async programming across both languages.\r\n\r\nWelcome to R\u00fcstzeit: Prepare to navigate async programming across both ecosystems.", "description": "Talk Timings (30 minutes):\r\n\r\nIntroduction and Hybrid Programming in Python and Rust [5 mins]\r\nAsynchronous Programming in Python [5 mins]\r\nAsynchronous Programming in Rust [5 mins]\r\nPerformance Comparison: Python vs. Rust [1 min]\r\nLeveraging Rust's Type System and Compiler Guarantees [5 mins]\r\nCase Study: \"A Million Large Language Monkeys at a Million Typewriters\" \u2013 Building Scalable Microservices with Tokio [7 mins]\r\n(Optional) Tom's Library: AnyIO and Unified Async in Python [3 mins]\r\nConclusion and Takeaways [3 mins]\r\n\r\n---\r\n\r\nMany Python developers are enhancing their Rust knowledge and want to take the next step in translating their understanding of advanced concepts like asynchronous programming. \r\n\r\nIn this talk, I'll help you take that step by juxtaposing Python's asyncio with Rust's async ecosystems, tokio and async-std. Through real-world examples and insights from conversations with graingert, co-author of Python's Anyio, we'll explore how each language approaches asynchronous execution, highlighting similarities and differences in syntax, performance, and ecosystem support. \r\n\r\nThis talk aims to persuade you that by leveraging Rust's powerful type system and compiler guarantees, we can build fast, reliable async code that's less prone to race conditions and concurrency bugs. 
Whether you're a Pythonista venturing into Rust or a Rustacean curious about Python's concurrency model, this session will provide practical insights to help you navigate async programming across both languages.\r\n\r\nWelcome to R\u00fcstzeit: It's time to prepare for async programming in Python and Rust.\r\n\r\nFurther Resources:\r\n\r\nhttps://rust-lang.github.io/async-book/\r\nhttps://anyio.readthedocs.io/en/stable/\r\nhttps://github.com/graingert", "recording_license": "", "do_not_record": false, "persons": [{"code": "UD3JDR", "name": "Jamie Coombes", "avatar": "https://pretalx.com/media/avatars/UD3JDR_pQElCy4.jpg", "biography": "I am a Machine Learning Engineer with 4 years of Python and PyTorch development experience. I've provided ML expertise to startups and the UK government, and I'm particularly interested in beneficial AI applications. My background is in Physics and Atmospheric Physics, where I interpreted large tropical cyclone datasets at Imperial College London.\r\n\r\nMy previous talks are: \r\nEuroPython Prague 2022 - \ud83d\udc0d Large Language Model Zen\r\nPyCon/PyData DE Berlin 2023 - Mojo \ud83d\udd25 - Is it Python's faster cousin or just hype? 
\r\n\r\nCompleting my language trilogy: I recently began exploring Rust \ud83e\udd80.", "public_name": "Jamie Coombes", "guid": "d49bf8cb-0ea4-582a-899b-00b89d9993ab", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/UD3JDR/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/FGFFEE/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/FGFFEE/", "attachments": []}], "Platinum3": [{"guid": "0adcdc0b-0548-5376-8c25-f17ae64d4e9c", "code": "AGLBMF", "id": 66135, "logo": null, "date": "2025-04-23T11:45:00+02:00", "start": "11:45", "duration": "00:30", "room": "Platinum3", "slug": "pyconde-pydata-2025-66135-interactive-end-to-end-root-cause-analysis-with-explainable-ai-in-a-python-shiny-app", "url": "https://pretalx.com/pyconde-pydata-2025/talk/AGLBMF/", "title": "Interactive end-to-end root-cause analysis with explainable AI in a Python Shiny App", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Statistics", "type": "Sponsored Talk", "language": "en", "abstract": "We demonstrate a pure Python solution for exploring and understanding datasets using state-of-the-art machine learning and explainable AI techniques. Our application features a reactive dashboard built with Shiny, specifically designed for the daily work of data scientists.\r\n\r\nThe tool provides insights into data rapidly and effortlessly through an interactive dashboard. It facilitates data preprocessing, interactive exploratory data analysis, on-demand model training, evaluation, and interpretation. It further renders dynamic, annotated, and interactive visualizations. 
This allows to pinpoint critical elements and relations as root causes in a haystack of features, compressing a full day's work into under an hour.\r\n\r\nUtilizing Plotly for dynamic visualizations, along with Scikit-learn, CatBoost, SHAP values, and MLflow for experiment tracking, married with shiny reactive dashboard, we facilitate quick and easy data preprocessing and exploration, model training and evaluation, together with explainable AI.", "description": "Problem Statement\r\nData scientists' daily work is characterized by a repetitive and time-consuming cycle of exploratory data analysis, preprocessing, model training, and feature identification. This ultimately means missing key insights into the data. Time spent on repetitive tasks detracts from critical work. We enable data scientists to focus on what matters.\r\n\r\nSolution\r\nWe streamline the data analysis process to facilitate efficient dataset exploration and uncovering critical insights without time spent on coding. We empower users to seamlessly conduct data preprocessing, interactive exploratory analysis, on-demand model training, evaluation, and interpretation, reducing the time to understand a dataset to under an hour.\r\n\r\nDemonstrator\r\nOur pure Python application features a reactive dashboard. It allows users to engage with data\u2014uploading, manipulating, creating interactive visualizations, performing on-demand model training and interpretation, while tracking results in MLflow. We demonstrate how to quickly deliver insights and identify root causes.\r\n\r\nArchitecture/Technical Implementation\r\nOur application is built entirely in Python, utilizing the Shiny framework for a reactive dashboard. The backend uses Plotly, Scikit-learn, CatBoost, SHAP values, and MLflow. 
We highlight the core functionalities and development choices, emphasizing data preprocessing, model training, evaluation, and explainable AI features.", "recording_license": "", "do_not_record": false, "persons": [{"code": "M3WRCP", "name": "Julius M\u00f6ller", "avatar": null, "biography": null, "public_name": "Julius M\u00f6ller", "guid": "74a5c883-3dcd-5586-87fc-23fc6e18a382", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/M3WRCP/"}, {"code": "KY3BZH", "name": "Simone Lederer", "avatar": "https://pretalx.com/media/avatars/KY3BZH_saRF83j.jpeg", "biography": "Trained as a mathematician, I quickly delved into the world of machine learning and computational statistics to learn more about cancer dynamics in molecular biology and patient data.\r\nI currently work as a Machine Learning Engineer in the domains of Med-Tech, optics, and semi-conductors at Carl Zeiss AG.", "public_name": "Simone Lederer", "guid": "07c94714-bc2b-5586-967b-f7179984bbbd", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/KY3BZH/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/AGLBMF/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/AGLBMF/", "attachments": []}, {"guid": "9fda372a-8b84-59ad-9762-797c3f4d7f94", "code": "GS9QWQ", "id": 66967, "logo": null, "date": "2025-04-23T12:25:00+02:00", "start": "12:25", "duration": "00:45", "room": "Platinum3", "slug": "pyconde-pydata-2025-66967-generative-ai-monitoring-with-pydanticai-and-logfire", "url": "https://pretalx.com/pyconde-pydata-2025/talk/GS9QWQ/", "title": "Generative AI Monitoring with PydanticAI and Logfire", "subtitle": "", "track": "PyData: Generative AI", "type": "Sponsored Talk (long)", "language": "en", "abstract": "In this talk, we will explore how the integration of PydanticAI and Logfire creates a powerful foundation for generative AI applications. 
We'll demonstrate how these tools combine to form sophisticated AI workflows and give you comprehensive monitoring.\r\n\r\nThe session illustrates how PydanticAI enables more reliable agent responses while Logfire provides real-time insights for efficient troubleshooting.\r\n\r\nThrough practical examples, you'll learn implementation techniques that will help your team build AI systems with observability, transforming how you develop and maintain generative AI projects. \ud83d\ude80", "description": "In this talk, we'll explore the essential techniques for developing generative AI applications that are not only powerful but also reliable and transparent. By leveraging the combined capabilities of PydanticAI and Logfire, developers can create systems that deliver consistent results while maintaining full visibility into their operations.\r\n\r\nWe'll begin by examining how to create and configure PydanticAI agents, demonstrating how these structured components can form the backbone of sophisticated AI workflows. This foundation will be enhanced through a detailed exploration of Logfire monitoring implementation using MCP servers, providing a robust observability layer for your applications.\r\n\r\nThe discussion will then shift to evaluation methodologies, offering practical approaches to assess and validate your AI applications' performance and accuracy. We'll delve into the advantages of structured outputs, showing how they enable more predictable and testable agent responses across various scenarios.\r\n\r\nFinally, we'll investigate how real-time insights can transform your troubleshooting process, allowing teams to quickly identify bottlenecks and resolve issues before they impact users. 
By the end of this session, you'll have a comprehensive understanding of how these tools and techniques can elevate your generative AI projects to new levels of reliability and observability.", "recording_license": "", "do_not_record": false, "persons": [{"code": "BGPPXA", "name": "Marcelo Trylesinski", "avatar": "https://pretalx.com/media/avatars/BGPPXA_OrIJlid.JPEG", "biography": "Marcelo Trylesinski, known as \"The FastAPI Expert\", is a passionate software engineer from Brazil \ud83c\udde7\ud83c\uddf7 (half \ud83c\uddfa\ud83c\uddfe, half \ud83c\uddee\ud83c\uddf9).\r\n\r\nCurrently based in Utrecht, Netherlands \ud83c\uddf3\ud83c\uddf1, he actively maintains Starlette \ud83c\udf1f and Uvicorn \ud83e\udd84, contributing significantly as a senior engineer at Pydantic \ud83e\udd13. Marcelo also shares insights about Python and FastAPI via his YouTube channel \ud83c\udfa5.", "public_name": "Marcelo Trylesinski", "guid": "ecf9d37a-4c89-533b-b2fa-a6212e0ec60d", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/BGPPXA/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/GS9QWQ/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/GS9QWQ/", "attachments": []}, {"guid": "a83cc850-bd19-5ac1-bb84-3fbdf8d0c217", "code": "UWTH7C", "id": 65732, "logo": null, "date": "2025-04-23T14:30:00+02:00", "start": "14:30", "duration": "00:30", "room": "Platinum3", "slug": "pyconde-pydata-2025-65732-ai-coding-agent-what-it-is-how-it-works-and-is-it-good-for-developers", "url": "https://pretalx.com/pyconde-pydata-2025/talk/UWTH7C/", "title": "AI coding agent - what it is, how it works and is it good for developers", "subtitle": "", "track": "PyData: Generative AI", "type": "Sponsored Talk", "language": "en", "abstract": "In this talk, we will have a deeper technical look at AI coding agents, their design, and how they can carry out coding tasks with the support of large language models. 
We will look at the journey from the user entering a prompt to how it is converted into actions that complete the task.\r\n\r\nAfter that, we will look at the impact it could make in the industry, whether or not you, as a developer, should use an AI coding agent, and what a user should be cautious of when using such an agent.", "description": "## Goal\r\n\r\nTo educate developers, especially those already using one, about what an AI coding agent is. Explore the potential benefits and potential harms of using such tools.\r\n\r\n## Target audience\r\n\r\nAnyone who is interested in AI agents, especially AI coding agents, and wants to learn more about them and maybe try using them in their work or hobby coding projects.\r\n\r\n## Outline\r\n\r\nWhat are AI agents\r\n    - Examples of AI agents\r\n    - What are AI coding agents\r\nHow do AI coding agents work\r\n    - Components in AI coding agents\r\n    - How your prompts get processed\r\n    - How to convert scripts into actions\r\nPros and cons of using AI coding agents\r\n    - benefits of using AI coding agents\r\n    - what to be aware of when using AI coding agents\r\nConclusions and Q&A", "recording_license": "", "do_not_record": false, "persons": [{"code": "8EGVC9", "name": "Cheuk Ting Ho", "avatar": "https://pretalx.com/media/avatars/8EGVC9_vBWTGiF.jpg", "biography": "After having a career as a Data Scientist and Developer Advocate, Cheuk dedicated her work to the open-source community. Currently, she is working as AI developer advocate for JetBrains. She has co-founded Humble Data, a beginner Python workshop that has been happening around the world. 
She has served the EuroPython Society board for two years and is now a fellow and director of the Python Software Foundation.", "public_name": "Cheuk Ting Ho", "guid": "716d26c2-170b-5a5e-86e5-9d4cecf3bbdd", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/8EGVC9/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/UWTH7C/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/UWTH7C/", "attachments": []}, {"guid": "0f9f8f31-9274-5c57-abc9-e5b33628363e", "code": "LNW3KE", "id": 61303, "logo": null, "date": "2025-04-23T15:10:00+02:00", "start": "15:10", "duration": "00:45", "room": "Platinum3", "slug": "pyconde-pydata-2025-61303-inclusive-data-for-1-3-billion-designing-accessible-visualizations", "url": "https://pretalx.com/pyconde-pydata-2025/talk/LNW3KE/", "title": "Inclusive Data for 1.3 Billion: Designing Accessible Visualizations", "subtitle": "", "track": "PyData: Visualisation & Jupyter", "type": "Talk (long)", "language": "en", "abstract": "According to the World Health Organization (WHO), an estimated 1.3 billion people (1 in 6 individuals) experience a disability, and nearly 2.2 billion people (1 in 5 individuals) have vision impairment. Improving the accessibility of visualizations will enable more people to participate in and engage with our data analyses.\r\n\r\nIn this talk, we\u2019ll discuss some principles and best practices for creating more accessible data visualizations. It will include tips for individuals who create visualizations, as well as guidelines for the developers of visualization software to help ensure your tools can help downstream designers and developers create more accessible visualizations.", "description": "Specifically, we will cover:\r\n\r\n- What makes data visualizations inaccessible? 
We will cover accessibility fundamentals like color contrast, alternative text descriptions, keyboard navigation support, screen reader compatibility, and more, with specific examples and demonstrations.\r\n- Are Python data visualization tools accessible? We will teach how to analyze the visualization landscape and discuss how tool developers can begin and prioritize improvements. \r\n- How accessible is my visualization? We will demonstrate how to conduct accessibility audits for data visualization tools by performing and documenting two accessibility evaluation tests live.\r\n\r\nThis talk will include specific examples from our ongoing work to improve the accessibility of Bokeh, a Python library for creating interactive data visualizations for web browsers. We hope this talk enables you to take the first few steps in making your next data visualization and your visualization tools, more accessible.", "recording_license": "", "do_not_record": false, "persons": [{"code": "LLJSBE", "name": "Pavithra Eswaramoorthy", "avatar": "https://pretalx.com/media/avatars/LLJSBE_ORGbB9T.jpg", "biography": "Pavithra Eswaramoorthy is a Developer Advocate at Quansight, where she works to improve the developer experience and community engagement for several open source projects in the PyData community. Currently, she maintains the Bokeh visualization library, and contributes to the Nebari (adjacent to the Jupyter community), and conda-store (part of the conda ecosystem).\r\n\r\nPavithra has been involved in the open source community for over 5 years, notable as an emeritus contributor to the Dask library and Wikimedia Foundation projects. In her spare time, she enjoys a good book and hot coffee. 
:)", "public_name": "Pavithra Eswaramoorthy", "guid": "1e289f7b-bd99-5631-92fb-f28eb817cdc1", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/LLJSBE/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/LNW3KE/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/LNW3KE/", "attachments": []}, {"guid": "5aa7dfb1-c772-5066-b41c-7ce8f69d1bd4", "code": "7CAVX7", "id": 66440, "logo": null, "date": "2025-04-23T16:10:00+02:00", "start": "16:10", "duration": "00:30", "room": "Platinum3", "slug": "pyconde-pydata-2025-66440-jeannie-an-agentic-field-worker-assistant", "url": "https://pretalx.com/pyconde-pydata-2025/talk/7CAVX7/", "title": "Jeannie: An Agentic Field Worker Assistant", "subtitle": "", "track": "PyData: Generative AI", "type": "Sponsored Talk", "language": "en", "abstract": "Jeannie is an LLM-based agentic workflow implemented in Python to automate task management for field workers in the energy sector. This system addresses inefficiencies and safety risks in tasks like PV panel installation and powerline repair.\r\n\r\nUsing open-source tools (LangChain family, OpenStreetMap and OpenWeatherMap APIs), Jeannie retrieves tasks, fetches weather and directions, identifies past incidents via RAG, and emails tailored reports with safety warnings.\r\n\r\nThis presentation offers a case study of Jeannie\u2019s implementation for E.ON in Germany, demonstrating how daily task automation enhances worker safety and efficiency. 
Attendees will discover how to create agentic systems with Python, integrate APIs, and apply RAG for safety applications, with access to open-source code and data for replicating the workflow.", "description": "This talk showcases Jeannie, an Agentic LLM workflow which we designed and implemented to automate task management for field workers in the energy sector with a focus on E.ON\u2019s daily routines in Germany.\r\nField workers at E.ON are meant to manage many ongoing and urgent daily tasks, such as installing Photovoltaic panels, repairing powerlines, and revising smart meters, often under tight schedules and varying environmental conditions. Thorough preparation is key to efficient task accomplishment. Preparation steps may include weather assessments at the incident location, navigation guidelines, and knowledge of past incidents to ensure safety. However, manual coordination of these elements is time-consuming and error-prone, leading to inefficiencies and safety risks. \r\nJeannie addresses this problem by automating the entire task management lifecycle.\r\nThe talk will focus on the practical aspects of the system design and implementation using Python and state-of-the-art LLM and an open-source Agentic Workflow stack.\r\nThe core system drives an agent fleet through the following steps: Agents in parallel\r\n\u2022\tretrieve upcoming tasks from a storage facility,\r\n\u2022\tgather critical information for the task location (weather, driving directions),\r\n\u2022\tassess historical accidents at the given location and for similar tasks in the past,\r\n\u2022\tgenerate tailored reports,\r\n\u2022\tsend the reports to workers assigned to the task,\r\n\u2022\tfollow up on task completion,\r\n\u2022\tand log incidents.\r\nThe workflow is orchestrated with LangGraph, leveraging libraries such as SQLAlchemy for database management, requests for API calls to fetch weather and directions (e.g., OpenWeatherMap and OpenStreetMap APIs with Reverse GeoCoding), 
smtplib for email automation, and an Azure OpenAI 4o endpoint as the LLM powering the Agents. The RAG component uses a vector store (built with the PGVector extension) to identify past incidents, ensuring workers are warned of potential risks specific to their task and locations.\r\nIn the talk, we critically evaluate the system's current state and outline the directions for its further development.", "recording_license": "", "do_not_record": false, "persons": [{"code": "9Q7SVE", "name": "Andrei Beliankou", "avatar": "https://pretalx.com/media/avatars/9Q7SVE_9r7E9jx.png", "biography": "Technical Lead Data & AI working on GenAI topics for E.ON Digital Technology GmbH. Happy to present our work publicly.", "public_name": "Andrei Beliankou", "guid": "b7b724b3-14f9-52f0-bbcc-344af1623eed", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/9Q7SVE/"}, {"code": "K7FQ7S", "name": "Jose Moreno Ortega", "avatar": "https://pretalx.com/media/avatars/K7FQ7S_4v2ef8u.png", "biography": "Jose Moreno Ortega (aka Pepe) is a GenAI Lead at E.ON Digital Technology, shaping AI strategy and driving enterprise adoption. 
With extensive experience in NLP and GenAI, he has worked as both a consultant and developer, building scalable AI solutions and fostering innovation in the field.", "public_name": "Jose Moreno Ortega", "guid": "04394d70-00f2-57ac-858a-2382487cfdd6", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/K7FQ7S/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/7CAVX7/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/7CAVX7/", "attachments": []}, {"guid": "2ce79815-7bdc-59ec-a3c5-70a31c87c125", "code": "GGJDTW", "id": 66036, "logo": null, "date": "2025-04-23T17:10:00+02:00", "start": "17:10", "duration": "00:30", "room": "Platinum3", "slug": "pyconde-pydata-2025-66036-generative-ai-usecase-specific-evaluation-of-llm-powered-applications", "url": "https://pretalx.com/pyconde-pydata-2025/talk/GGJDTW/", "title": "Generative-AI: Usecase-Specific Evaluation of LLM-powered Applications", "subtitle": "", "track": "PyData: Natural Language Processing & Audio (incl. Generative AI NLP)", "type": "Sponsored Talk", "language": "en", "abstract": "This talk addresses the critical need for usecase-specific evaluation of Large Language Model (LLM)-powered applications, highlighting the limitations of generic evaluation benchmarks in capturing domain-specific requirements. It proposes a workflow for designing more reliable evaluations to optimize LLM-based applications, consisting of three key activities: human-expert evaluation and benchmark dataset curation, creation of evaluation agents, and alignment of these agents with human evaluations using the curated datasets. The workflow produces two key outcomes: a curated benchmark dataset for testing LLM applications and an evaluation agent that scores their responses. 
The presentation further addresses the limitations, and best practices to enhance the reliability of evaluations, ensuring LLM applications are better tailored to specific use cases.", "description": "Large Language Models (LLMs) are transformative technology, enabling a wide array of applications, from content generation to interactive chatbots. This technology is leveraged in creating LLM-powered applications. A wide variety of LLMs are offered, followed by independent and generic evaluation of their performance by the LLM community. The requirements and domain-specificity of the usecases behind the LLM-applications, renders this generic evaluation of the LLMs insufficient in revealing their performance issues. Furthermore, the usecase-specific performance evaluation of LLM-applications becomes a necessary component in the design and continuous development of the LLM-applications. \r\nIn this talk, we address the need for usecase-specific evaluation of LLM-applications by proposing a workflow for creating evaluation models that support the selection and optimization of the design of LLM-applications. The workflow is comprised of three main activities: \r\n1)\tHuman-expert evaluation of LLM-applications & benchmark dataset curation \r\n2)\tCreating evaluation agents\r\n3)\tAligning evaluation agents with human evaluation based on the curated dataset\r\nAnd it leads to two concrete outcomes:\r\n1)\tCurated benchmark dataset: against which the LLM-applications will be tested. \r\n2)\tEvaluation Agent: this is the scoring model which automatically evaluates the responses of the LLM-applications. \r\nThe talk will elaborate on the workflow, the limitations, and best practices to increase the reliability of the evaluations considering the limitations.", "recording_license": "", "do_not_record": false, "persons": [{"code": "AU3UXW", "name": "Dr. 
Homa Ansari", "avatar": "https://pretalx.com/media/avatars/AU3UXW_OcIs0b7.jpg", "biography": "Lead AI/ML scientist at ZEISS Meditec with 10+ years of experience in algorithm design for multimodal unstructured data (image, time series, geospatial data). Expert in developing innovative algorithms with statistical methods, shallow and deep\r\nmachine learning, and pre-trained Large Language Models (LLMs); specifically for satellite data and niche medical sensors. Recipient of innovation awards from the German Aerospace Center (DLR) and IEEE for novel algorithms and data products for satellite missions. Previous work experience at German Aerospace Center (DLR) and DataRobot Inc.", "public_name": "Dr. Homa Ansari", "guid": "fe45738a-7621-50c4-8588-40eb9c3155db", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/AU3UXW/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/GGJDTW/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/GGJDTW/", "attachments": []}, {"guid": "3059eabd-d520-5e7f-8fbe-6117aa8db264", "code": "J7YKEE", "id": 67583, "logo": null, "date": "2025-04-23T17:50:00+02:00", "start": "17:50", "duration": "00:30", "room": "Platinum3", "slug": "pyconde-pydata-2025-67583-from-idea-to-integration-an-intro-to-the-model-context-protocol-mcp", "url": "https://pretalx.com/pyconde-pydata-2025/talk/J7YKEE/", "title": "From Idea to Integration: An Intro to the Model Context Protocol (MCP)", "subtitle": "", "track": "PyData: Generative AI", "type": "Talk", "language": "en", "abstract": "The Model Context Protocol (MCP) has emerged as a standard for connecting Large Language Models with diverse data sources and enabling interactions with other systems. In this talk, we\u2019ll introduce the MCP standard and demonstrate how to build an MCP Server using real-world examples. We\u2019ll then explore its applications, showing how it empowers developers and makes data from complex systems accessible to non-technical users. 
Finally, we\u2019ll dive into recent protocol updates, including improvements to Streamable HTTP transport and security enhancements, and share practical strategies for deploying MCP servers as well as clients.", "description": "### From Idea to Integration: An Intro to the Model Context Protocol (MCP)\r\n\r\nThe Model Context Protocol (MCP) has emerged as a standard for connecting Large Language Models with diverse data sources and enabling interactions with other systems. In this talk, we\u2019ll introduce the MCP standard and demonstrate how to build an MCP Server using real-world examples. We\u2019ll then explore its applications, showing how it empowers developers and makes data from complex systems accessible to non-technical users. Finally, we\u2019ll dive into recent protocol updates, including improvements to Streamable HTTP transport and security enhancements, and share practical strategies for deploying MCP servers as well as clients.\r\n\r\n **Talk Outline:**\r\n\r\n**Introduction to MCP**\r\n  - What is the Model Context Protocol?\r\n  - Core concepts: context exposure, streaming, and stateless interaction\r\n\r\n**MCP Architecture**\r\n  - Overview of MCP Servers and Clients\r\n\r\n**Building an MCP Server**\r\n  - Creating an MCP Server for Home Assistant\r\n  - Connecting to a SQLite Database\r\n\r\n**Real-World Use Cases**\r\n  - Demo: How MCP empowers developers with contextual tooling\r\n  - Demo: How MCP enables non-technical users to access complex data\r\n\r\n**Recent Protocol Updates**\r\n  - Streamable HTTP transport improvements\r\n  - Security and authentication updates for MCP servers\r\n\r\n**Deployment Best Practices**\r\n  - Deploying MCP servers and clients", "recording_license": "", "do_not_record": false, "persons": [{"code": "PTZEZA", "name": "Julian Beck", "avatar": "https://pretalx.com/media/avatars/PTZEZA_P3ccTBL.png", "biography": "Cloud Platform Engineer @\u00a0inovex, I specialize in designing scalable cloud infrastructure 
solutions. My expertise spans cloud architecture, container orchestration, and infrastructure automation. Beyond my core work, I maintain active interests in web technologies and mobile app development, exploring solutions that bridge the gap between platforms", "public_name": "Julian Beck", "guid": "7e4eab87-dc03-5155-b67c-5f3cde836256", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/PTZEZA/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/J7YKEE/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/J7YKEE/", "attachments": []}], "Europium2": [{"guid": "ba8b3688-6c80-5e36-a5ce-26cc5da27751", "code": "NF8UPF", "id": 61124, "logo": null, "date": "2025-04-23T12:25:00+02:00", "start": "12:25", "duration": "00:30", "room": "Europium2", "slug": "pyconde-pydata-2025-61124-building-an-open-source-rag-system-for-the-united-nations-negotiations-on-global-plastic-pollution", "url": "https://pretalx.com/pyconde-pydata-2025/talk/NF8UPF/", "title": "Building an Open Source RAG System for the United Nations Negotiations on Global Plastic Pollution", "subtitle": "", "track": "PyData: Natural Language Processing & Audio (incl. Generative AI NLP)", "type": "Talk", "language": "en", "abstract": "Plastic pollution is a significant global challenge. Every year, millions of tons of plastic enter the oceans, impacting marine ecosystems and human health. To address this issue, the United Nations is negotiating a legally binding treaty with representatives from 180 countries, aiming to reduce plastic pollution and promote sustainable practices. \r\n\r\nWe have developed NegotiateAI, an open-source chat application that supports delegations during the UN negotiations on a legally binding agreement to combat plastic pollution. The tool demonstrates how generative AI and Retrieval Augmented Systems (RAG) can address complex global challenges. 
Built with Haystack 2.0, Qdrant, HuggingFace Spaces, and Streamlit, it showcases the potential of open-source technologies in tackling issues of global relevance. \r\n\r\nAs a beginner or advanced developer, this talk will give you valuable insights into developing impactful AI applications with open source tools in the public sector.", "description": "Plastic pollution is a global crisis that requires urgent action. An estimated 4.8 to 12.7 million tons of plastic end up in the oceans every year. Forecasts show that global plastic waste will triple by 2060. In response, the United Nations is currently negotiating a legally binding agreement to end plastic pollution, involving representatives from 180 countries in a multi-year process. Tools that can streamline these complex negotiations can help support the negotiations.\r\n\r\nThis talk introduces NegotiateAI, an open-source application developed to support delegations during the UN negotiations on a legally binding treaty to end plastic pollution. Developed with Haystack 2.0, Qdrant Vector Storage, HuggingFace Spaces, and Streamlit, NegotiateAI is a concrete example of how generative AI can be harnessed to address global challenges. While RAG is no longer a new concept, its variety continues to make it an essential approach for tackling real-world problems with LLMs, as demonstrated in this application.\r\n\r\nWe will take you on a journey through the development of NegotiateAI from choosing the right tools, to overcoming technical challenges, to using the app in live UN negotiations. Along the way, we will explore the development of a robust RAG system. We\u2019ll also discuss how we leveraged Streamlit to build a user-friendly interface, showcasing features such as multi-tab navigation and custom layouts that make the app intuitive and accessible to end users.\r\n\r\nDuring the session, we will highlight key challenges and present best practices for the coding structure. 
We will also show how we designed the app to be extensible and allow for the integration of additional data.\r\n\r\nBeginners will gain practical knowledge about building RAG systems and their real-world applications, while advanced developers will be inspired by the technical innovations, tool integration, and the potential of generative AI in the public sector. The talk will also provide insights into how organizations like the GIZ (German International Cooperation Society) are using AI to tackle pressing global issues and offer inspiration for anyone interested in the intersection of technology and sustainable development.", "recording_license": "", "do_not_record": false, "persons": [{"code": "YPZZ3S", "name": "Rahkakavee Baskaran", "avatar": "https://pretalx.com/media/avatars/YPZZ3S_2ARmOJL.jpg", "biography": "Rahkakavee Baskaran studied Political Science and Social and Economic Data Science at the University of Konstanz. At &effect, she works as a Data Scientist, Machine Learning Engineer, and Backend Developer, with over four years of experience in Natural Language Processing (NLP).\r\nHer work focuses on leveraging data science and software development to create social impact, particularly in projects related to social sciences and the public sector.", "public_name": "Rahkakavee Baskaran", "guid": "eeeb7f19-7698-50b4-9a58-d2349a250dd5", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/YPZZ3S/"}, {"code": "E3MAF3", "name": "Teresa Kroesen", "avatar": null, "biography": null, "public_name": "Teresa Kroesen", "guid": "947ad6bc-474e-50c2-84dc-1ed9964711aa", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/E3MAF3/"}, {"code": "833C9P", "name": "Anna-Lisa Wirth", "avatar": null, "biography": null, "public_name": "Anna-Lisa Wirth", "guid": "a801851b-e49f-5889-82ff-70d90dae172a", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/833C9P/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/NF8UPF/feedback/", 
"origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/NF8UPF/", "attachments": []}, {"guid": "29bbaa20-d358-5af7-ad36-0b6238731035", "code": "VDG9YG", "id": 61119, "logo": null, "date": "2025-04-23T14:30:00+02:00", "start": "14:30", "duration": "00:30", "room": "Europium2", "slug": "pyconde-pydata-2025-61119-taking-control-of-llm-outputs-an-introductory-journey-into-logits", "url": "https://pretalx.com/pyconde-pydata-2025/talk/VDG9YG/", "title": "Taking Control of LLM Outputs: An Introductory Journey into Logits", "subtitle": "", "track": "PyData: Natural Language Processing & Audio (incl. Generative AI NLP)", "type": "Talk", "language": "en", "abstract": "This talk explores logits - the raw confidence scores that language models generate before selecting each token. Understanding and manipulating these scores gives you practical control over how models generate text.\r\n\r\nIn this introductory session, we'll explore the token-by-token generation process, examining how tokenizers work and why vocabulary matters. You'll learn about the relationship between logits, probabilities, and tokens. Then we will cover constrained decoding approaches and talk about structured generation.", "description": "Logits are the raw numerical scores that language models compute for each token in their vocabulary before making a selection. These scores are converted to probabilities and used internally for token selection. Accessing and analyzing them directly opens up possibilities for controlling and understanding model behavior.\r\n\r\nWe'll cover common sampling techniques like temperature adjustment, top-k, and top-p filtering, and beam search. \r\n\r\nThen we will see how logits can be used to evaluate model uncertainty, causing hallucinations.\r\n\r\nAnd we will talk about structured generation to use language models in deterministic projects. We will see how the logit values can be used to guide the generation process. 
Lastly we will explore the libraries like outlines and guidance by showcasing some example snippets about how to use them.\r\n\r\nIf \"token by token\" is your only answer when someone asks how LLMs generate text, come join us and let's dig deeper together!", "recording_license": "", "do_not_record": false, "persons": [{"code": "VMEMAF", "name": "Emek G\u00f6zl\u00fckl\u00fc", "avatar": "https://pretalx.com/media/avatars/VMEMAF_jOyaomI.jpeg", "biography": "techie, software engineer & researcher building ai/ml tools with keen interest in edtech. co-founder and builder of Quipu. \r\n\r\nalso working as a part-time engineer at MICE Portal, where he supports transformation of the company processes with agentic ai-backed approaches.", "public_name": "Emek G\u00f6zl\u00fckl\u00fc", "guid": "f6c8371e-c078-5335-a3d6-c1b3400d8181", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/VMEMAF/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/VDG9YG/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/VDG9YG/", "attachments": []}, {"guid": "a6548c73-2405-5d3b-8c0d-b8f1c95328b5", "code": "DQTMJB", "id": 61309, "logo": null, "date": "2025-04-23T15:10:00+02:00", "start": "15:10", "duration": "00:45", "room": "Europium2", "slug": "pyconde-pydata-2025-61309-beyond-basic-prompting-supercharging-open-source-llms-with-lmql-s-structured-generation", "url": "https://pretalx.com/pyconde-pydata-2025/talk/DQTMJB/", "title": "Beyond Basic Prompting: Supercharging Open Source LLMs with LMQL's Structured Generation", "subtitle": "", "track": "PyData: Natural Language Processing & Audio (incl. Generative AI NLP)", "type": "Talk (long)", "language": "en", "abstract": "This intermediate-level talk demonstrates how to leverage Language Model Query Language (LMQL) for structured generation and tool usage with open-source models like Llama. 
You will learn how to build a RAG system that enforces output constraints, handles tool calls, and maintains response structure - all while using open-source components. The presentation includes hands-on examples where audience members can experiment with LMQL prompts, showcasing real-world applications of constrained generation in production environments.", "description": "1. Introduction to structured generation with LMQL and open-source LLMs\r\n   - Key differences between constrained and free-form generation\r\n   - Why structure matters for production applications\r\n   - Setting up LMQL with Llama\r\n\r\n2. Building a RAG system with structured outputs\r\n   - Implementing context retrieval with constraints\r\n   - Enforcing response formats through LMQL decorators\r\n   - Handling edge cases and error states\r\n\r\n3. Tool usage and function calling\r\n   - Implementing tool calls through LMQL\r\n   - Managing tool execution flow\r\n   - Error handling and fallbacks\r\n\r\n4. Interactive segment\r\n   - Audience members will write and test their own LMQL prompts through a live demo environment\r\n\r\n5. 
Production considerations\r\n   - Scaling structured generation\r\n   - Monitoring and logging strategies\r\n\r\nAttendees will leave with practical knowledge of how to implement structured generation in their own projects using LMQL, understanding both the technical implementation and best practices for production deployment.", "recording_license": "", "do_not_record": false, "persons": [{"code": "QWNPUD", "name": "Christiaan Swart", "avatar": "https://pretalx.com/media/avatars/QWNPUD_kG5KD2H.jpeg", "biography": "On a mission to structure unstructured text with NLP\r\n\r\nEx-cofounder with 8 years experience in NLP\r\n\r\nI come from a mixed Hungarian-Dutch background and live in Nuremberg at the moment\r\n\r\nIn my free time I enjoy improv theatre and swimming", "public_name": "Christiaan Swart", "guid": "8458078b-c79b-5f38-b036-bfe5efc49c0b", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/QWNPUD/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/DQTMJB/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/DQTMJB/", "attachments": []}, {"guid": "01fd0ced-954f-5aea-be84-798bca8ff56c", "code": "MSUCAS", "id": 61339, "logo": null, "date": "2025-04-23T16:10:00+02:00", "start": "16:10", "duration": "00:30", "room": "Europium2", "slug": "pyconde-pydata-2025-61339-beyond-fomo-keeping-up-to-date-in-ai", "url": "https://pretalx.com/pyconde-pydata-2025/talk/MSUCAS/", "title": "Beyond FOMO \u2014 Keeping Up-to-Date in AI", "subtitle": "", "track": "General: Education, Career & Life", "type": "Talk", "language": "en", "abstract": "The rapid evolution of AI technologies, particularly since the emergence of Large Language Models, has transformed the data science landscape from a field of steady progress to one of constant breakthroughs. This acceleration creates unique challenges for practitioners, from managing FOMO to battling imposter syndrome. 
Drawing from personal experience transitioning from mathematical modeling to modern AI development, this talk explores practical strategies for staying current while maintaining sanity. We'll discuss building effective learning structures, creating collaborative knowledge-sharing environments, and finding the right balance between innovation and implementation. Attendees will leave with actionable insights on navigating technological change while fostering sustainable growth in their teams and careers.", "description": "The landscape of data science and AI is evolving at an unprecedented rate. What started as a relatively stable field of mathematical modeling and time-series analysis has transformed into a whirlwind of weekly breakthroughs, especially since the emergence of Large Language Models. How do we stay current without succumbing to FOMO or imposter syndrome?\r\n\r\nIn this talk, I'll share my personal journey from traditional mathematical modeling to modern AI development, exploring how the field's pace has shifted dramatically. Drawing from real-world experiences as a consultant and team lead, I'll discuss practical strategies for maintaining technical excellence while managing the psychological challenges of rapid technological change. We'll examine how to build effective learning structures within teams, the importance of creating safe spaces for knowledge sharing, and why sometimes it's okay to not be at the cutting edge of every new development.\r\n\r\nThrough concrete examples and lessons learned, I'll offer insights on balancing client expectations, team growth, and personal development in an era where the technological landscape shifts weekly. 
Whether you're a seasoned data scientist or just entering the field, this talk will provide practical frameworks for navigating the exciting yet overwhelming world of modern AI development.", "recording_license": "", "do_not_record": false, "persons": [{"code": "CFQECP", "name": "Carsten Frommhold", "avatar": "https://pretalx.com/media/avatars/CFQECP_nkjg8Fh.jpg", "biography": "Hi! I am Carsten. I have been working in the data and analytics environment for seven years.  As a data scientist, I am excited by the challenge of translating the business into optimizable algorithms and creating real impact. As a self-taught programmer, I am just as absorbed in the technical challenges, preferably in the cloud.", "public_name": "Carsten Frommhold", "guid": "5d7c3bd8-4938-5908-ae06-bc58cc19e435", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/CFQECP/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/MSUCAS/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/MSUCAS/", "attachments": []}, {"guid": "2fb1bbfb-1244-5851-a343-aa30142b50c9", "code": "7RLYSQ", "id": 66254, "logo": null, "date": "2025-04-23T17:10:00+02:00", "start": "17:10", "duration": "00:30", "room": "Europium2", "slug": "pyconde-pydata-2025-66254-secure-human-in-the-loop-interactions-for-ai-agents", "url": "https://pretalx.com/pyconde-pydata-2025/talk/7RLYSQ/", "title": "Secure \u201cHuman in the Loop\u201d Interactions for AI Agents", "subtitle": "", "track": "PyData: Generative AI", "type": "Sponsored Talk", "language": "en", "abstract": "Explore the power of Human-in-the-Loop (HITL) for GenAI agents! Learn how to build AI systems that augment your abilities, not replace your judgment, especially when high-stakes actions are involved. 
This session will focus on practical implementation using Python and LangChain to stay in control.", "description": "Imagine a world where AI agents handle complex tasks on your behalf \u2013 managing your finances, optimizing energy consumption in your home, or even coordinating logistics for a global supply chain. The potential benefits are enormous, but what happens when these agents need to perform critical actions?\r\n\r\nMost of us would probably prefer to have a say in those decisions. We want AI to augment our abilities, not replace our judgment, especially when high-stakes actions are involved. In this session we explore how to add Human in the Loop (HITL) capabilities to your GenAI agents using Python and LangChain.", "recording_license": "", "do_not_record": false, "persons": [{"code": "7HCKTC", "name": "Juan Cruz Martinez", "avatar": "https://pretalx.com/media/avatars/7HCKTC_xveYGF7.jpeg", "biography": "Juan Cruz Martinez is a curious software engineer who loves building things. From web apps to AI integrations, he enjoys crafting solutions that make a difference. 
He's always learning and experimenting with new tech.", "public_name": "Juan Cruz Martinez", "guid": "70936512-9ce4-58ed-9eff-c66d122a5ad2", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/7HCKTC/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/7RLYSQ/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/7RLYSQ/", "attachments": []}, {"guid": "3e381b69-2ff8-5d00-abee-5344adc6c19b", "code": "BLKYGU", "id": 60719, "logo": null, "date": "2025-04-23T17:50:00+02:00", "start": "17:50", "duration": "00:30", "room": "Europium2", "slug": "pyconde-pydata-2025-60719-streamlining-python-deployment-with-pixi-a-perspective-from-production", "url": "https://pretalx.com/pyconde-pydata-2025/talk/BLKYGU/", "title": "Streamlining Python deployment with Pixi:  A Perspective from production", "subtitle": "", "track": "PyCon: MLOps & DevOps", "type": "Talk", "language": "en", "abstract": "In our quest to improve Python deployments, we explored Pixi, a tool designed to enhance dependency management within the Conda ecosystem. This talk recounts our experience integrating Pixi into a setup used in production. We leveraged Pixi to create lockfiles, ensuring consistent builds, and to automate deployments via CI/CD pipelines. This integration led to greater reliability and efficiency, minimizing deployment errors and allowing us to concentrate more on development. Join us as we share how Pixi transformed our deployment process and offer insights into optimizing your own workflows.", "description": "In modern software development, managing dependencies effectively is crucial for ensuring that applications run smoothly across various environments. This talk explores our journey to optimize Python deployments by integrating Pixi into our workflow. As a tool that enhances the Conda ecosystem, Pixi offers a reliable and efficient solution to the common challenges in dependency management. 
While concepts such as consistent builds, reproducibility, and automated deployments are well-established, Pixi simplifies their implementation within a Conda-based environment, making these practices more accessible and manageable.\r\n\r\nThe talk will cover\r\n- DevOps Concept\r\n  Introducing concepts like lockfiles, reproducible environments and CI/CD pipelines to set out a good\r\n  baseline for deploying Python code productively\r\n- Conda vs PyPI comparison\r\n  Considering the tradeoffs between isolation and development comfort\r\n- Pixi introduction\r\n  An introduction to the philosophy of Pixi and how it compares to other Conda tooling.\r\n  This also covers how Pixi streamlines the implementation of DevOps concepts\r\n- Implementing DevOps concepts using Pixi\r\n  \r\nThis talk is designed for professional software developers who prioritize a robust setup for deploying Python code as services into production. While familiarity with the Conda ecosystem is beneficial, it is not a prerequisite for this session.", "recording_license": "", "do_not_record": false, "persons": [{"code": "CSHKRY", "name": "Dennis Weyland", "avatar": "https://pretalx.com/media/avatars/CSHKRY_hDrOCF9.jpg", "biography": "Hey there! I'm Dennis Weyland, and I've been part of the Blue Yonder team for the last five years. I kicked off my career as a Data Scientist but soon found my groove in Data Engineering. Python has been my go-to language for the past seven years, and I love diving into project setups to make everything run smoothly. 
Before diving into my professional career, I studied Physics at KIT, where I completed my master's thesis and discovered my passion for Python software development and machine learning.\r\n\r\nOutside of work I'm passionate about running, diving, and climbing.", "public_name": "Dennis Weyland", "guid": "dbe02752-0362-5170-9c22-253b1e834fd7", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/CSHKRY/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/BLKYGU/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/BLKYGU/", "attachments": [{"title": "Slides", "url": "/media/pyconde-pydata-2025/submissions/BLKYGU/resources/Stream_KNWw6We.pdf", "type": "related"}]}], "Hassium": [{"guid": "e218fd95-97c7-5159-9543-e1b3b6cba088", "code": "EN3QPQ", "id": 60405, "logo": null, "date": "2025-04-23T11:45:00+02:00", "start": "11:45", "duration": "00:30", "room": "Hassium", "slug": "pyconde-pydata-2025-60405-are-llms-the-answer-to-all-our-problems", "url": "https://pretalx.com/pyconde-pydata-2025/talk/EN3QPQ/", "title": "Are LLMs the answer to all our problems?", "subtitle": "", "track": "General: Ethics & Privacy", "type": "Talk", "language": "en", "abstract": "Generative AI models have shaken up the German market. Since the release of ChatGPT, AI is available and usable for everyone. The number of ChatGPT-based agents is growing rapidly, but concerns about privacy, copyright and ethics remain. Regulation and ethical AI go hand in hand, but are often seen as barriers. The presentation will cover the different aspects of ethics and how they are addressed by regulation. It will give an overview of how to use large language models in a safe and practical way. This won't only address the various ethical issues, but also convince your next customer to invest in your AI-based product.", "description": "The talk will delve into the complexities of large language models (LLMs), exploring their capabilities and challenges. 
We'll look at bias in face recognition and word2vec, highlighting cases such as the COMPAS system and Amazon's recruitment tool, which have raised concerns about fairness and accuracy. The intersection of LLM and copyright will also be discussed, including the use of copyrighted material in training data and potential infringement issues.  When talking about data, regulations such as the EU AI Act, the CLOUD Act and data privacy will be examined, raising important questions about data sovereignty and cross-border data transfers. \r\n\r\nThe environmental impact of LLMs will be addressed, focusing on their significant carbon footprint and the need for sustainable solutions. An overview of the LLM landscape will be provided, including English models and European alternatives. By exploring these topics, participants will gain a deeper understanding of the opportunities and challenges presented by LLMs, as well as the regulatory frameworks and best practices that can help mitigate their risks.", "recording_license": "", "do_not_record": false, "persons": [{"code": "LT7LHM", "name": "Dr. Maria B\u00f6rner", "avatar": "https://pretalx.com/media/avatars/LT7LHM_GglaO1L.jpg", "biography": "Dr Maria B\u00f6rner is a legal tech expert in the use of AI and heads the AI Competence Centre at Westernacher Solutions. In her role, she is responsible for the development of AI tools in the government, legal and church sectors. She has been working in AI for more than 8 years and bridges the gap between AI development and customers. She volunteers to support the Women in AI network by organising partnerships and visibility.", "public_name": "Dr. 
Maria B\u00f6rner", "guid": "71e27123-f3ab-5a5a-b328-ca90ab90adff", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/LT7LHM/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/EN3QPQ/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/EN3QPQ/", "attachments": []}, {"guid": "24638e24-6b4b-5643-a577-583317a41eea", "code": "933YXH", "id": 61350, "logo": null, "date": "2025-04-23T12:25:00+02:00", "start": "12:25", "duration": "00:30", "room": "Hassium", "slug": "pyconde-pydata-2025-61350-the-aesthetics-of-ai-from-cyberpunk-to-fascism", "url": "https://pretalx.com/pyconde-pydata-2025/talk/933YXH/", "title": "The aesthetics of AI: from cyberpunk to fascism", "subtitle": "", "track": "General: Others", "type": "Talk", "language": "en", "abstract": "Let\u2019s explore the visual grammars, references and cultural norms at play in the field of AI; from Kismet to Spot\u00ae, from Clippy to Claude. As a sector we can be hyper-focused on technical process and function, to the extent that it blinkers our understanding of the cultural and political impacts of our work. Aesthetics infuse every aspect of technology. Aesthetic interpretations are manifold and mutable, constructed in-congress with the observer and not fully defined by the original designer. 
AI technologies add additional layers of subtext: character, consciousness, agency, intent.\r\n\r\nDespite this murkiness, or perhaps because of it, this talk makes a passionate argument for engaging with historical aesthetic movements, for building our shared professional knowledge of fads and fashions\u23afnot just from the past 40 years of internet culture\u23afbut also the past 140 years of ideology, technology, and thought.", "description": "**Talk Outline** \r\n- Define aesthetics\r\n- Differentiate aesthetics from visual design\r\n- Aesthetics of contemporary AI\r\n- History of aesthetics in technology\r\n- Link current technologies to historical aesthetic movements\r\n\r\n**Detailed description**\r\nThe field of artificial intelligence has long been dominated by discussions of technical capabilities, algorithmic improvements, and functional benchmarks. Beneath this technical layer lies a rich but often unexamined tapestry of visual and cultural decisions that profoundly shape how we perceive, interact with, and ultimately integrate AI systems into our society. \r\n\r\nThis talk moves beyond simple visual design to explore how aesthetics \u2013 the study of the principles of beauty and artistic taste \u2013 shapes both the creation and interpretation of AI technologies. The aesthetic choices woven into today\u2019s AI interfaces, both for end-users and industry practitioners, reveal our deep-seated assumptions about the world. \r\n\r\nPhilosophers of art ask us to introspect: What is goodness? What is beauty? Which endeavours are most worthy of our attention? We can use these same questions to explore the ideas framing AI.\r\n\r\n\u201cAll watched over by machines of loving grace\u201d, a poem by Richard Brautigan, imagines a utopian future where the natural and technological worlds achieve balance and harmony, and where humans are free to pursue creative, embodied pursuits, freed of menial labour. 
Is this the utopia imagined by OpenAI or DeepMind when they describe the imminent arrival of AGI? Does the world as described by the big brands of AI actually align with our own imaginings of progress, of utopia? \r\n\r\nA brief historical overview will trace how technological aesthetics have evolved, examining how different eras have visualized and presented technological innovations. This context sets the stage for drawing direct connections between current AI aesthetics and historical movements \u2013 revealing how contemporary design choices often unconsciously echo past ideological and artistic approaches.\r\n\r\nThrough these connections, I\u2019ll demonstrate why developing a broader aesthetic literacy is crucial for AI practitioners. Understanding these historical and cultural reference points can lead to more thoughtful and effective uses of AI. As our field continues to shape the future of human-machine interaction, this aesthetic awareness becomes not just an academic exercise, but a practical necessity: providing both the groundwork for nuanced critique, and the capacity to clearly define how we expect technologies to fit into and improve our lives.", "recording_license": "", "do_not_record": false, "persons": [{"code": "NYBL9E", "name": "Laura Summers", "avatar": "https://pretalx.com/media/avatars/NYBL9E_5PTdV1E.jpg", "biography": "Laura is a very technical designer\u2122\ufe0f, working at  Pydantic as Lead Design Engineer. Her side projects include Sweet Summer Child Score (summerchild.dev) and Ethics Litmus Tests (ethical-litmus.site). Laura is passionate about feminism, digital rights and designing for privacy. 
She speaks, writes and runs workshops at the intersection of design and technology.", "public_name": "Laura Summers", "guid": "df9768b0-6085-5a20-942c-ec81c2c91343", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/NYBL9E/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/933YXH/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/933YXH/", "attachments": []}, {"guid": "10794e4e-61e9-5c73-83dd-0bde7e9eb991", "code": "HYE8EX", "id": 61092, "logo": null, "date": "2025-04-23T14:30:00+02:00", "start": "14:30", "duration": "00:30", "room": "Hassium", "slug": "pyconde-pydata-2025-61092-autonomous-browsing-using-large-action-models", "url": "https://pretalx.com/pyconde-pydata-2025/talk/HYE8EX/", "title": "Autonomous Browsing using Large Action Models", "subtitle": "", "track": "PyData: Natural Language Processing & Audio (incl. Generative AI NLP)", "type": "Talk", "language": "en", "abstract": "The browser serves as our gateway to the internet\u2014the largest repository of knowledge in human history. Proficiency in its use is a core skill across nearly all professions and is becoming increasingly important for Artificial Intelligence. But can Large Action Models (LAMs) autonomously operate a browser? What exactly are LAMs that promise to translate human intentions into actions? We report on a project that fully automates the job application process using AI: from navigating unfamiliar website structures and filling out forms to handling document uploads and cookie banners.", "description": "Large Action Models (LAMs) were first introduced by Rabbit with the launch of their R1 device, aiming to create end-to-end trained models that automatically translate human instructions into actions. Since then, the definition of LAMs has evolved to encompass Large Language Models (LLMs) utilized in multi-agent settings. 
Notable examples include Anthropic's \"Computer Use\" feature in their Claude model and Google's Project Mariner. These projects allow LLMs to operate a web browser or computer in a human-like manner by viewing the screen, moving the cursor, clicking buttons, and typing text, thereby fulfilling the original promise of LAMs by effectively translating human instructions into automated actions.\r\n\r\nWe present an innovative application of LAMs that automates the job application process using AI. Our system autonomously navigates unfamiliar website structures, fills out forms, handles document uploads, and manages cookie banners without human intervention. This level of automation streamlines the application process for job seekers while ensuring accurate and timely submissions.\r\n\r\nTo achieve this, we leveraged the LaVague framework, which employs a modular, agent-based approach:\r\n\r\n1.\t**Coordinator Agent:** A central agent powered by a multimodal model coordinates the entire process. It has access to website visuals, user data (e.g., personal details, CV information), previous instructions, and the overall objective. Based on this information, it delegates tasks to specialized agents.\r\n2.\t**Navigation Control Agent:** For simple website navigation, this agent utilizes a browser driver such as Selenium to directly interact (e.g., scroll) with the webpage.\r\n3.\t**Knowledge Agent:** When additional information is required, this agent performs knowledge-intensive tasks using an LLM. Examples include researching specific details or restructuring CV data.\r\n4.\t**Navigation Engine Agent:** For complex website interactions like inputting values or uploading files, this agent generates custom code for the browser driver. 
Using an LLM with access to the HTML code, it creates the necessary commands.\r\n\r\nThese agents work iteratively, performing tasks step by step until either the objective is achieved, or a maximum number of steps is reached.\r\n\r\nBy building a custom solution around the LaVague framework tailored specifically for the job application process, we successfully automated the entire workflow. In our presentation, we discuss our overall architecture, the challenges encountered during development and share valuable lessons learned for practical adoption.\r\n\r\nLarge Action Models like these highlight the transformative potential of AI in automating intricate tasks, bridging the gap between understanding human intentions and executing them in dynamic, real-world scenarios.", "recording_license": "", "do_not_record": false, "persons": [{"code": "LDZFAV", "name": "Arne Grobr\u00fcgge", "avatar": "https://pretalx.com/media/avatars/LDZFAV_9t8ZISl.jpeg", "biography": "Arne Grobr\u00fcgge holds an M.Sc. in Business Informatics (Wirtschaftsinformatik) with a focus on machine learning and information security and works as a Data Scientist at scieneers GmbH. 
In various client projects, he develops and oversees the use of language models and multi-agent systems in companies to create innovative, value-adding solutions.", "public_name": "Arne Grobr\u00fcgge", "guid": "b4f611aa-0c9b-5eb4-8d58-76b33f1e8f83", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/LDZFAV/"}, {"code": "HWZVSQ", "name": "Nico Kreiling", "avatar": "https://pretalx.com/media/avatars/HWZVSQ_AYYtX1X.jpg", "biography": null, "public_name": "Nico Kreiling", "guid": "33a762b5-efbe-555e-a8c8-7298323b38c9", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/HWZVSQ/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/HYE8EX/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/HYE8EX/", "attachments": []}, {"guid": "9d3c4a9c-b246-5c57-a179-eeb9a094406b", "code": "UVPALT", "id": 60158, "logo": null, "date": "2025-04-23T15:10:00+02:00", "start": "15:10", "duration": "00:30", "room": "Hassium", "slug": "pyconde-pydata-2025-60158-pdfs-when-a-thousand-words-are-worth-more-than-a-picture-or-table", "url": "https://pretalx.com/pyconde-pydata-2025/talk/UVPALT/", "title": "PDFs - When a thousand words are worth more than a picture (or table).", "subtitle": "", "track": "PyData: Generative AI", "type": "Talk", "language": "en", "abstract": "PDF, a must-have in RAG systems, ensures visual fidelity across platforms and devices, at the expense of compromising what would be the core condition for computers to properly process and interpret text: semantics. That means any logical arrangement of text, upon rendering, explodes into dummy visual shards of data that literally portray the bigger picture for the human eye to perceive, but no longer convey the information computers should grasp. 
Such a bottleneck already makes proper ingestion of text-only documents a big challenge, let alone when tables or figures come into play, the ultimate nightmare for PDF parsers, not to mention developers. The rest you must have already foreseen: a RAG system barfing unreliable knowledge from bad chunks (based on regular PDF parsing), if those ever get to be retrieved from a vector database. In this talk you can gather some vision-driven insights on how to leverage the strengths of PDF and language models towards good chunks to be ingested. Or, in other words, how multimodal models can go beyond trivial reverse engineering by decomposing tables into their building blocks, in plain language, as how those would be explained to another human; or better yet, as how humans would ask questions about such pieces of knowledge. And from such a strategy, we transfer the same rationale to figures. Come along, gather some insights, and get inspired to break down tables and figures from your own PDFs, and to improve retrieval in your RAG systems.", "description": "PDF, a must-have in RAG systems, ensures visual fidelity across platforms and devices, at the expense of compromising what would be the core condition for computers to properly process and interpret text: semantics. That means any logical arrangement of text, upon rendering, explodes into dummy visual shards of data that literally portray the bigger picture for the human eye to perceive, but no longer convey the information computers should grasp. Such a bottleneck already makes proper ingestion of text-only documents a big challenge, let alone when tables or figures come into play, the ultimate nightmare for PDF parsers, not to mention developers. 
The rest you must have already foreseen: a RAG system barfing unreliable knowledge from bad chunks (based on regular PDF parsing), if those ever get to be retrieved from a vector database.\r\n\r\nIn this talk you can gather some vision-driven insights on how to leverage the strengths of PDFs and language models towards good chunks to be ingested in a vector database. Or, in other words, how multimodal models can go beyond trivial reverse engineering by decomposing tables into their building blocks, in plain language, as how those would be explained to another human; or better yet, as how humans would ask questions about such pieces of knowledge. Consequently, it brings robustness to retrieval, the backbone of RAG. And from such a strategy, we can transfer the same rationale to figures.\r\n\r\nGet ready to boost your retrieval skills, as we:\r\n- Analyze the semantic bottlenecks, from the anatomy of a PDF stream, to how parsers traverse it;\r\n- (Briefly) approach the never-ending debate on the ideal chunk format for ingestion in vector databases;\r\n- Build some chunks using multimodal models to decompose tables into their building blocks, preserving plain language;\r\n- Conduct an experiment on measuring quality of retrieval and compare the decomposition strategy against PDF parsers and reverse engineering techniques;\r\n- And last, but not least, transfer the same rationale to figures.\r\n\r\nBy then, you'll have enough food for thought to get your hands dirty, clone the repo, and tweak the experiment yourself. 
Come along, gather some insights, and get inspired to break down tables and figures from your own PDF files, and to improve retrieval in your RAG systems.", "recording_license": "", "do_not_record": false, "persons": [{"code": "9B8PQS", "name": "Caio Benatti Moretti", "avatar": "https://pretalx.com/media/avatars/9B8PQS_N2vIbm0.jpg", "biography": "Caio holds a PhD in Computer Science and has been working with\u00a0data\u00a0and AI both in academia and industry since 2014. Currently working as a DS/MLE Consultant at Xebia Data, he is particularly keen on neural networks in its many forms and applications. His enthusiasm even led him to make a neural network fit inside a business card. With experience designing and taking applications into production, Caio has been recently focusing on how (Generative)AI can augment human productivity.", "public_name": "Caio Benatti Moretti", "guid": "ce45fb0b-7968-5aa6-96bb-8746be43678f", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/9B8PQS/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/UVPALT/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/UVPALT/", "attachments": []}, {"guid": "40b07fc9-ca4d-5665-bf30-f7f0247e0a69", "code": "NPMNCE", "id": 61882, "logo": null, "date": "2025-04-23T16:10:00+02:00", "start": "16:10", "duration": "00:30", "room": "Hassium", "slug": "pyconde-pydata-2025-61882-driving-trust-and-addressing-ethical-challenges-in-transportation-through-explainable-ai", "url": "https://pretalx.com/pyconde-pydata-2025/talk/NPMNCE/", "title": "Driving Trust and Addressing Ethical Challenges in Transportation through Explainable AI", "subtitle": "", "track": "General: Ethics & Privacy", "type": "Talk", "language": "en", "abstract": "Machine Learning can transform transportation\u2014improving safety, optimizing routes, and reducing delays\u2014yet it also presents ethical concerns. 
In this talk, I will show how Explainable AI (XAI) can offer practical solutions to ethical dilemmas such as a lack of trust in AI solutions. Instead of focusing on the technical underpinnings, we will discuss how transparency can be enhanced in AI-supported transportation systems. Using a real-world example, I will demonstrate how XAI provides the groundwork for building ethical, trustworthy, and socially responsible AI solutions in public transportation systems.", "description": "AI systems in transportation make decisions that directly impact people's lives, such as route optimization, safety measures, and resource allocation. These decisions often rely on complex algorithms, which can be opaque to stakeholders, including operators, regulators, and passengers. \r\n\r\n**One possible solution: Explainable AI (XAI)**\r\n\r\nExplainable AI (XAI) refers to methods and tools that make AI systems more transparent by providing interpretable insights into their decision-making processes. By integrating XAI, stakeholders can understand, validate, and trust the outputs of AI systems. \r\n\r\n**KARL: A Case Study in XAI for Public Transportation**\r\n\r\nThe *KARL* (KI in Arbeit und Lernen in der Region Karlsruhe) project is an exemplary initiative showcasing how XAI can address ethical challenges in AI-supported public transportation. \r\n\r\n**Technical Implementation**\r\n\r\nWhile the presentation will not delve deeply into technical specifics, it will touch upon key elements such as:\r\n* The use of open-source libraries like *SHAP* (SHapley Additive exPlanations) to provide interpretability.\r\n* Integration of XAI tools into the operational dashboard used by tram operators.\r\n* Collaboration with domain experts to ensure the explanations are meaningful and actionable.\r\n\r\n**Takeaways for the Audience**\r\n\r\nAt the end of this talk, attendees will:\r\n1. Understand the ethical challenges posed by AI in transportation and how they can undermine trust.\r\n2. 
Learn how XAI tools can address these challenges by enhancing transparency.\r\n3. Gain insights into the practical implementation of XAI in a real-world setting through the KARL project.\r\n4. Be inspired to incorporate XAI principles into their own AI projects to build ethical and socially responsible solutions.", "recording_license": "", "do_not_record": false, "persons": [{"code": "LMCW3N", "name": "Natalie Beyer", "avatar": "https://pretalx.com/media/avatars/LMCW3N_EBxGvLE.jpg", "biography": "Natalie co-founded  Lavrio.solutions, a company specializing in AI implementation. Since then, she has helped numerous organizations integrate AI into their processes and optimize their workflows. She has also conducted AI training sessions for businesses and professionals, bridging the gap between technical innovation and real-world usability.", "public_name": "Natalie Beyer", "guid": "da4a2053-27cd-5060-bec5-5e7a3d91ab43", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/LMCW3N/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/NPMNCE/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/NPMNCE/", "attachments": [{"title": "Slides to my talk", "url": "/media/pyconde-pydata-2025/submissions/NPMNCE/resources/Drivin_K3Kjuwe.pdf", "type": "related"}]}, {"guid": "8ea25412-fe2d-5160-9138-f0cbf2891749", "code": "M98YBR", "id": 61189, "logo": null, "date": "2025-04-23T17:10:00+02:00", "start": "17:10", "duration": "00:30", "room": "Hassium", "slug": "pyconde-pydata-2025-61189-enhancing-software-supply-chain-security-with-open-source-python-tools", "url": "https://pretalx.com/pyconde-pydata-2025/talk/M98YBR/", "title": "Enhancing Software Supply Chain Security with Open Source Python Tools", "subtitle": "", "track": "PyCon: Security", "type": "Talk", "language": "en", "abstract": "The Cyber Resilience Act (CRA) is focused on improving the security and resilience of digital products. 
But businesses that want to continue delivering digital products to the EU market once the CRA is in force will need to start preparing the evidence needed to demonstrate compliance. \r\n\r\nKey requirements within the CRA include implementing robust security measures throughout the product life-cycle, adopting secure development practices and implementing proactive vulnerability management processes.\r\n\r\nThis session will show how several of the CRA's requirements can be met using a number of open-source Python tools.", "description": "The Cyber Resilience Act (CRA) is aimed at improving the security and resilience of the software components within a digital product. This session will provide a high-level overview of the CRA and demonstrate how to enhance software supply chain transparency, manage risks effectively throughout the Software Development Lifecycle (SDLC), and achieve the necessary compliance by leveraging a suite of open-source Python tools. 
\r\n\r\nKey areas to be addressed will include:\r\n\r\n- Learn how to create comprehensive and high quality SBOMs to gain a clear understanding of all components within your software.\r\n- Discover how to identify and mitigate potential risks and threats within the software supply chain throughout the entire SDLC.\r\n- Explore effective strategies for identifying, assessing, prioritising and remediating software vulnerabilities.\r\n- Understand how to adopt best practices to ensure compliance with relevant regulations and industry standards.\r\n\r\nThe Python tools/applications to be referenced will include sbom4python, lib4sbom, lib4vex, lib4package, distro2sbom, sbomdiff, sbomaudit and cve-bin-tool.", "recording_license": "", "do_not_record": false, "persons": [{"code": "ERZXBC", "name": "Anthony Harrison", "avatar": "https://pretalx.com/media/avatars/ERZXBC_CxPQkBx.jpg", "biography": "Anthony Harrison has been developing and delivering mission-critical applications for over 40 years working on various complex programs where he held various roles in software, systems and cyber engineering, as well as providing technical leadership for a number of programmes.\r\n\r\nHe is the Founder and Director of APH10, and co-founder of SBOM Europe, and is a leading source of expertise in Software Bill of Materials (SBOM). 
He has been developing open source software actively for a number of years; most recently, the applications have been related to supporting the software supply chain through utilities to generate and analyse software bills of materials (SBOMs).\r\n\r\nHe has been a mentor for the Google Summer of Code for the past four years via the Python Software Foundation and is a mentor for his local CoderDojo in Manchester teaching students Python.", "public_name": "Anthony Harrison", "guid": "dd14a003-1d4d-5554-990e-97218b1c7a28", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/ERZXBC/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/M98YBR/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/M98YBR/", "attachments": [{"title": "Slides", "url": "/media/pyconde-pydata-2025/submissions/M98YBR/resources/Securi_6PRdTe6.pdf", "type": "related"}]}, {"guid": "bd15c847-1018-57fa-8dc3-c23022fa1c94", "code": "F9EFXA", "id": 61184, "logo": null, "date": "2025-04-23T17:50:00+02:00", "start": "17:50", "duration": "00:30", "room": "Hassium", "slug": "pyconde-pydata-2025-61184-modern-nlp-for-proactive-harmful-content-moderation", "url": "https://pretalx.com/pyconde-pydata-2025/talk/F9EFXA/", "title": "Modern NLP for Proactive Harmful Content Moderation", "subtitle": "", "track": "PyData: Natural Language Processing & Audio (incl. Generative AI NLP)", "type": "Talk", "language": "en", "abstract": "Despite an array of regulations implemented by governments and social media platforms worldwide (e.g. the well-known DSA), the problem of abusive speech online persists. At the same time, rapid advances in NLP and large language models (LLMs) are opening up new possibilities\u2014and responsibilities\u2014for using this technology to make a positive social impact. Can LLMs streamline content moderation efforts? 
Are they effective at spotting and countering hate speech, and can they help produce more proactive solutions like text detoxification and counter-speech generation?\r\n\r\nIn this talk, we will dive into the cutting-edge research and best practices of automatic textual content moderation today. From clarifying core definitions to detailing actionable methods for leveraging multilingual NLP models, we will provide a practical roadmap for researchers, developers, and policymakers aiming to tackle the challenges of harmful online content. Join us to discover how modern NLP can foster safer, more inclusive digital communities.", "description": "The rise of large language models (LLMs) has revolutionized natural language processing (NLP), creating opportunities to address complex societal challenges, including the pervasive issue of harmful online content. Despite global regulations and platform-specific policies, abusive speech and toxic content continue to plague digital spaces, highlighting the need for smarter, scalable, and multilingual solutions.\r\n\r\nThis talk explores how modern NLP technologies can play a transformative role in content moderation, moving beyond traditional detection methods to proactive measures that promote healthier online interactions. We will cover key topics, including:\r\n\r\n* Understanding the Landscape: Definitions and nuances of harmful content categories, including hate speech, misinformation, and harassment. We will bring practices not only from the CS field but also from communication with social scientists and NGOs.\r\n* Hate Speech Detection: Can LLMs detect hate speech? 
How can the models be adapted to new languages?\r\n* Text Detoxification: Diving into the nuances of toxicity in 9 languages (from our recent shared task) and sharing best practices for prompting LLMs for text detoxification.\r\n* Counter-Speech Generation: Our recent research results on how to make LLMs generate not a very general \"Please, it is not ok to talk like this report\" but a response that actually addresses the targeted group.\r\n* Ethical Considerations: Who, in the end, is responsible for content moderation? How can the community help to establish best practices? How do we measure the \"effectiveness\" of LLMs for content moderation?", "recording_license": "", "do_not_record": false, "persons": [{"code": "937CJZ", "name": "Daryna Dementieva", "avatar": "https://pretalx.com/media/avatars/937CJZ_R4XBJui.jpg", "biography": "Hello, I\u2019m Dr. Daryna Dementieva. Driven by both personal experiences and a deep passion, I am a dedicated advocate and researcher focused on leveraging AI and NLP for Positive Social Impact. Currently (as a technical person) I am exploring collaborations with NGOs and social scientists to bridge the gap between cutting-edge AI technology and societal needs. 
My goal is to share insights on responsible AI and Data Science, inspiring and enabling projects in these fields to transition from concept to impactful reality.", "public_name": "Daryna Dementieva", "guid": "0abf737a-3e36-5b8a-92d5-6f65ac92d786", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/937CJZ/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/F9EFXA/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/F9EFXA/", "attachments": []}], "Palladium": [{"guid": "884e71e8-f85a-5839-9d8d-0a51c21f63d7", "code": "ABWHSD", "id": 61385, "logo": null, "date": "2025-04-23T12:25:00+02:00", "start": "12:25", "duration": "00:30", "room": "Palladium", "slug": "pyconde-pydata-2025-61385-from-tensors-to-clouds-a-practical-guide-to-zarr-v3-and-zarr-python-3", "url": "https://pretalx.com/pyconde-pydata-2025/talk/ABWHSD/", "title": "From Tensors to Clouds \u2014 A Practical Guide to Zarr V3 and Zarr-Python 3", "subtitle": "", "track": "PyData: Data Handling & Engineering", "type": "Talk", "language": "en", "abstract": "A key feature of the Python data ecosystem is the reliance on simple but efficient primitives that follow well-defined interfaces to make tools work seamlessly together (cf. http://data-apis.org/). NumPy provides an in-memory representation for tensors. Dask provides parallelisation of tensor access. Xarray provides metadata linking tensor dimensions. **Zarr** provides a missing feature, namely scalable, persistent storage for annotated hierarchies of tensors. Defined through a community process, the Zarr specification enables the storage of large out-of-memory datasets locally and in the cloud. Implementations exist in C++, C, Java, JavaScript, Julia, and Python.\r\n\r\nThis talk presents a systematic approach to understanding and implementing the newer version of [Zarr-Python](https://github.com/zarr-developers/zarr-python), i.e. 
Zarr-Python 3 by explaining the new API, deprecations, new storage backend, improved codec pipeline, etc.", "description": "Zarr is a data format for storing chunked, compressed N-dimensional arrays and is sponsored by [NumFOCUS](https://numfocus.org/project/zarr) under their umbrella.\r\n\r\nIt is based on an open-source technical specification and has implementations in several languages, with [Zarr-Python](https://github.com/zarr-developers/zarr-python) being the most used.\r\n\r\nAfter the successful adoption of Specification V3, our team has worked tirelessly over the last year to ensure the Python library's compliance with the latest spec.\r\n\r\n## Outline\r\n\r\nFirst, I\u2019ll talk about:\r\n\r\n### Understanding Zarr basics (5 mins.)\r\n\r\n- What is Zarr, and how does it work?\r\n    - The inner workings of Zarr using illustrated graphics\r\n- What is the Zarr Specification?\r\n    - What's new in Zarr Spec V3?\r\n\r\nThen, I'll be talking about the new Zarr-Python 3 and its significant features:\r\n\r\n### What's new in Zarr-Python 3? 
(15 mins.)\r\n\r\n- Major design updates\r\n    - New storage backend\r\n    - Creating Zarr arrays and groups asynchronously\r\n    - New and improved codec pipeline\r\n    - Native GPU support for creating and writing arrays\r\n- Changes and deprecations\r\n    - Overview of the new API\r\n    - Optimising performance for large arrays\r\n    - Deprecation of several stores like LMDBStore, SQLStore, MongoDBStore, etc.\r\n- 3.0 Migration guide\r\n    - Steps to migrate from Zarr-Python 2 to Zarr-Python 3\r\n- Extensions\r\n    - How can Zarr-Python 3 be extended to add new custom data types, stores, chunking strategies, etc.?\r\n\r\nThen, I\u2019ll run a hands-on session covering the following:\r\n\r\n### Hands-on (5 mins.)\r\n\r\n- Creating Zarr arrays and groups using Zarr-Python 3\r\n    - Plus walkthrough of the new features (mentioned above)\r\n- Looking under the hood\r\n    - Use store and info functions to explain how your Zarr data is stored and display important information\r\n\r\n### Conclusion (5 mins.)\r\n \r\n- Key takeaways\r\n- How can you get involved?\r\n- QnA\r\n\r\nThis talk aims to address an audience that works with large amounts of data and is looking for a transparent, open-source, reliable, cloud-optimised, and environmentally friendly format.\r\n\r\nThe tone of the talk is set to be informative, story-telling and fun.\r\n\r\nAttendees need intermediate knowledge of Python and NumPy arrays.\r\n\r\n### After this talk, you\u2019ll:\r\n\r\n- understand the basics of Zarr and what's new in V3,\r\n- leverage the new functionalities of Zarr-Python 3 with improved performance,\r\n- make an informed decision on what data format to use for your data", "recording_license": "", "do_not_record": false, "persons": [{"code": "A7ACFE", "name": "Sanket Verma", "avatar": "https://pretalx.com/media/avatars/A7ACFE_CWefmUa.jpg", "biography": "Sanket is a data scientist based out of New Delhi, India. 
He likes to build data science tools and products and has worked with startups, governments, and organisations. He loves building community and bringing everyone together and is Chair of PyData Delhi and PyData Global.\r\n\r\nCurrently, he's taking care of the community and OSS at Zarr as their Community Manager.\r\n\r\nWhen he\u2019s not working, he likes to play the violin and computer games and sometimes thinks of saving the world!", "public_name": "Sanket Verma", "guid": "b66fda83-603e-5800-9d21-04088119b753", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/A7ACFE/"}], "links": [{"title": "Presentation Slides", "url": "https://docs.google.com/presentation/d/1OSQtJECLh6_KSV22BeOOnAfUZU46Fp71Ofnvxl-wtGQ/edit?usp=sharing", "type": "related"}], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/ABWHSD/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/ABWHSD/", "attachments": []}, {"guid": "64838170-131f-52eb-a8ac-8ae45680bc2a", "code": "TQLGA8", "id": 61178, "logo": null, "date": "2025-04-23T14:30:00+02:00", "start": "14:30", "duration": "00:30", "room": "Palladium", "slug": "pyconde-pydata-2025-61178-reinforcement-learning-without-a-phd-a-python-developer-s-journey", "url": "https://pretalx.com/pyconde-pydata-2025/talk/TQLGA8/", "title": "Reinforcement Learning Without a PhD: A Python Developer\u2019s Journey", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Statistics", "type": "Talk", "language": "en", "abstract": "Reinforcement Learning (RL) has shown superhuman performance in games and is already delivering value in Big Tech. But despite its potential, RL remains largely inaccessible to most developers. Why? Because real-world RL is hard\u2014it demands data, infrastructure, and tools that are often built for researchers, not practitioners.\r\n\r\nThis talk shares the journey of applying RL to a real-world use case without having a PhD. 
It\u2019s a story of figuring things out through hands-on experimentation, trial and error, and building what didn\u2019t exist. We\u2019ll explore what makes RL powerful, why it\u2019s still rare in practice, and how you can get started. Along the way, you\u2019ll learn about the key challenges of production RL, how to work around them, and how the open-source toolkit pi_optimal can help bridge the gap. Whether you're just RL-curious or ready to dive in, this talk offers practical insights and a demo to help you take your first steps.", "description": "Reinforcement Learning (RL) has made headlines for beating humans at Go and StarCraft, and it\u2019s already being used by companies like Google, Amazon, and Lyft to optimize real-world systems. But outside of big tech and research labs, RL is still rarely applied. Why? Because even though RL is powerful, it's also complex, resource-intensive, and hard to implement without the right tools.\r\n\r\nIn this talk, we explore what it really takes to bring RL into production\u2014without a PhD, a research team, or unlimited infrastructure. I\u2019ll share the story of how we applied RL to a real-world business problem: optimizing digital campaign management in a fast-changing environment. We faced all the classic challenges\u2014limited data, no simulator, and no out-of-the-box tools that actually worked for our use case.\r\n\r\nWe\u2019ll look at how we built a training environment from historical data, dealt with uncertainty using ensemble models, and iterated through a long cycle of trial, error, and learning. 
That experience eventually led us to create pi_optimal, an open-source toolkit designed to make RL more accessible to Python developers and data scientists.\r\n\r\nYou\u2019ll walk away with a clear understanding of:\r\n- Why RL is powerful, but rarely applied in practice\r\n- What makes real-world RL so challenging\r\n- How we got a working RL system off the ground without a PhD in RL\r\n- How pi_optimal helps lower the barrier to entry\r\n- How you can get started with RL, either through theory or hands-on practice\r\n\r\nWhether you're RL-curious or looking to apply it in your own projects, this talk offers practical insights and a live demo to help you take your first steps.", "recording_license": "", "do_not_record": false, "persons": [{"code": "R3BXQU", "name": "Jochen Luithardt", "avatar": "https://pretalx.com/media/avatars/R3BXQU_Ziq66wJ.png", "biography": "I'm the Co-Founder of pi_optimal, where we're working to democratize reinforcement learning and make it usable for real-world decision-making. My passion lies in building AI systems that don't just work in theory, but actually solve meaningful problems in practice.\r\n\r\nBefore that, I was Lead Data Scientist at Stellwerk3 GmbH, where I led the development of a model-based reinforcement learning project for campaign control. I also had the chance to represent the company at Cyber Valley Incubator events and build a strong, collaborative data team.\r\n\r\nMy academic journey brought me to the Max Planck Institute for Intelligent Systems, where I focused on challenges in autonomous learning \u2014 from sparse rewards in model-free RL to structured world models and graph networks in model-based approaches. 
Earlier on, I also worked in digital advertising technology at Gruner + Jahr, developing deep learning models for ad click prediction.\r\n\r\nAcross all these experiences, one thing has stayed the same: I love taking complex machine learning concepts and turning them into impactful, real-world applications.", "public_name": "Jochen Luithardt", "guid": "34879f4e-f9bc-5ee9-9dae-b2a059805f27", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/R3BXQU/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/TQLGA8/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/TQLGA8/", "attachments": []}, {"guid": "24389b62-4b3f-5647-9259-c058ad2e1fba", "code": "F7RDPT", "id": 61812, "logo": null, "date": "2025-04-23T15:10:00+02:00", "start": "15:10", "duration": "00:30", "room": "Palladium", "slug": "pyconde-pydata-2025-61812-building-reliable-ai-agents-for-publishing-a-dspy-based-quality-assurance-framework", "url": "https://pretalx.com/pyconde-pydata-2025/talk/F7RDPT/", "title": "Building Reliable AI Agents for Publishing: A DSPy-Based Quality Assurance Framework", "subtitle": "", "track": "PyData: Natural Language Processing & Audio (incl. Generative AI NLP)", "type": "Talk", "language": "en", "abstract": "As publishers increasingly adopt AI agents for content generation and analysis, ensuring output quality and reliability becomes critical. This talk introduces a novel quality assurance framework built with DSPy that addresses the unique challenges of evaluating AI agents in publishing workflows. Using real-world examples from newsroom implementations, I will demonstrate how to design and implement systematic testing pipelines that verify factual accuracy, content consistency, and compliance with editorial standards. 
Attendees will learn practical techniques for building reliable agent evaluation systems that go beyond simple metrics to ensure AI-generated content meets professional publishing standards.", "description": "This presentation addresses one of the most pressing challenges in professional publishing today: ensuring quality and reliability when deploying AI agents in editorial environments. We'll take a deep dive into how DSPy's programmatic approach to language model development can be leveraged to create robust testing and validation pipelines that meet the demanding standards of modern newsrooms.\r\nThe discussion begins by exploring the current landscape of AI evaluation in publishing workflows, examining why traditional testing approaches fall short when dealing with language models, and identifying the specific quality requirements unique to journalistic and editorial content. We'll then move into a detailed technical exploration of solutions built with DSPy, demonstrating how to design modular evaluation pipelines, implement publishing-specific metrics, and create automated systems for fact-checking and consistency validation. Special attention will be given to the integration of knowledge graphs for reference-based evaluation and the incorporation of these systems into broader MLOps workflows.\r\nTo ground these concepts in reality, we'll examine a detailed case study of implementing this framework in an actual newsroom environment. This will include practical discussions of handling various content types, along with strategies for managing test data and evaluation criteria. 
We'll share real-world performance monitoring approaches and concrete improvement strategies that have proven successful in production environments.\r\nThe presentation concludes with hard-won insights and best practices, including practical strategies for finding the right balance between automated testing and human review, effective approaches to handling edge cases, and methods for scaling quality assurance processes across diverse content teams. Throughout the talk, we'll share code examples and practical implementations that attendees can adapt for their own projects.\r\nThis session is specifically designed for technical leads and machine learning engineers, though the principles and approaches discussed will be valuable for anyone involved in AI quality assurance. Attendees will leave with a comprehensive understanding of how to design and implement QA processes for AI agents, practical knowledge of DSPy implementation for automated testing, and concrete strategies for maintaining high quality standards in AI-assisted workflows.", "recording_license": "", "do_not_record": false, "persons": [{"code": "AZ7FNH", "name": "Simonas \u010cerniauskas", "avatar": "https://pretalx.com/media/avatars/AZ7FNH_bf7aRPo.jpg", "biography": "Dr.-Ing. Simonas \u010cerniauskas is the founder and CTO of tisix.io, specializing in developing practical LLM solutions for media and publishers. With a doctorate from RWTH Aachen and experience as a principal researcher at Research Center J\u00fclich, he combines deep technical expertise with hands-on implementation experience. His work focuses on multi-modal content generation and media processing. Drawing from his background in mechanical engineering, quality assurance and machine learning engineering, Simonas develops scalable AI solutions while maintaining a strong focus on quality assurance and risk management. 
He regularly shares insights through speaking engagements and technical publications, helping organizations navigate the complexities of AI implementation with practical, business-focused approaches.", "public_name": "Simonas \u010cerniauskas", "guid": "80386181-7e74-5611-bb46-c16ceeb97330", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/AZ7FNH/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/F7RDPT/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/F7RDPT/", "attachments": []}, {"guid": "bbf75747-31b2-5e06-b4f7-096dd92842f6", "code": "CMTKZS", "id": 59426, "logo": null, "date": "2025-04-23T16:10:00+02:00", "start": "16:10", "duration": "00:30", "room": "Palladium", "slug": "pyconde-pydata-2025-59426-deploying-synchronous-and-asynchronous-django-applications-for-hobby-projects", "url": "https://pretalx.com/pyconde-pydata-2025/talk/CMTKZS/", "title": "Deploying Synchronous and Asynchronous Django Applications for Hobby Projects", "subtitle": "", "track": "PyCon: Django & Web", "type": "Talk", "language": "en", "abstract": "Simplify deploying hybrid Django applications with synchronous views and asynchronous apps. This session covers ASGI support, Docker containerization, and Kamal for seamless, zero-downtime deployments on single-server setups, ideal for hobbyists and small-scale projects.", "description": "Hobby projects often start small but can quickly grow in complexity, especially when incorporating Django\u2019s support for asynchronous applications alongside traditional synchronous views. Deploying such hybrid projects on a single server\u2014whether in the cloud or on-premise\u2014can be daunting without the right tools and workflows.  \r\n\r\nThis talk focuses on simplifying the deployment process for hobbyists and developers who want to create and manage robust Django applications without requiring extensive infrastructure or expertise. 
We\u2019ll cover:  \r\n- Deploying Django projects that combine synchronous views and asynchronous apps using Django\u2019s ASGI support.  \r\n- Containerizing the application with Docker for consistent and manageable environments.  \r\n- Utilizing Kamal, an open-source deployment tool, to enable zero-downtime deployments, rolling updates, and seamless app management.  \r\n- Demonstrating the workflow on a single cloud server, with insights on adapting it to on-premise servers.  \r\n\r\nWhether you're building a passion project or experimenting with modern Django features, this session will provide you with practical tools and approaches to deploy hybrid Django applications effortlessly, keeping the process accessible and scalable for hobby-level development.", "recording_license": "", "do_not_record": false, "persons": [{"code": "XDSGYW", "name": "melhin", "avatar": "https://pretalx.com/media/avatars/XDSGYW_fVP9Ya2.jpeg", "biography": "Software tinkerer, stumbling through software creation, with a deep enthusiasm for history and understanding human migration.", "public_name": "melhin", "guid": "8f589a71-cc4f-5a47-9e9a-7688c11f86e8", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/XDSGYW/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/CMTKZS/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/CMTKZS/", "attachments": []}, {"guid": "d9ee05a3-230c-5a2d-ac30-7bddda060df7", "code": "U9KHNA", "id": 61204, "logo": null, "date": "2025-04-23T17:10:00+02:00", "start": "17:10", "duration": "00:30", "room": "Palladium", "slug": "pyconde-pydata-2025-61204-getting-started-with-bayes-in-engineering-implementing-kalman-filters-with-rxinfer-jl", "url": "https://pretalx.com/pyconde-pydata-2025/talk/U9KHNA/", "title": "Getting Started with Bayes in Engineering: Implementing Kalman Filters with RxInfer.jl", "subtitle": "", "track": "PyData: Research Software Engineering", "type": "Talk", "language": "en", "abstract": 
"Bayesian methods are not commonly seen in Civil Engineering and Structural Dynamics. In this talk we explore how RxInfer.jl and the Julia Programming Language can simplify Bayesian modeling by implementing a Kalman filter for tracking the dynamics of a structural system. Perfect for engineers, researchers, and data scientists eager to apply probabilistic modelling and Bayesian methods to real-world engineering challenges.", "description": "Bayesian methods are renowned for their ability to incorporate domain knowledge and quantify uncertainty, making them valuable across various engineering and data science fields. However, finding practical examples of these methods in civil engineering, especially within structural dynamics, can be challenging.\r\n\r\nThis talk aims to make Bayesian inference accessible to engineering practitioners by demonstrating how RxInfer.jl, a Julia package for probabilistic programming, can be used to implement a Kalman filter for tracking the dynamics of a structural system. The session covers:\r\n\r\n1. Bayesian Modelling in Python and Julia: A brief comparison of probabilistic programming languages, highlighting Python and Julia\r\n2. State Space Modelling of Structural Dynamical Systems: A brief introduction to state space models and their use in structural dynamics\r\n3. Linking State Space Modelling to Finite Element Modelling: Making the connection between FEM and SSM\r\n4. A Simplified Overview of Bayesian Filtering and Kalman Filters for Dynamical Systems\r\n5. Bayesian Filtering Made Simple with RxInfer.jl: a step-by-step guide to setting up a user-friendly and readable Bayesian filter using Rxinfer.jl\r\n6. Full Workflow Example\r\n7. Interpreting the Results and Next Steps\r\n8. 
Connections to Julia, Python and Open-Source Ecosystems: exploring integrations with tools like FreeCAD and other open-source platforms\r\n\r\nBy the end of the talk, attendees will have a clear understanding of how to start using Bayesian methods in their engineering projects, supported by reproducible and open-source code.", "recording_license": "", "do_not_record": false, "persons": [{"code": "N7S8VF", "name": "Victor Flores Terrazas", "avatar": null, "biography": "I\u2019m a data scientist and consultant specializing in Bayesian modeling in Civil Engineering, currently focusing on applying machine learning to anomaly detection in mechanical systems. My work also explores how machine learning and Bayesian methods can provide clearer insights in engineering applications. When I\u2019m not working with data, you\u2019ll probably find me swimming, cooking, or listening to anything from black metal to Japanese jazz.", "public_name": "Victor Flores Terrazas", "guid": "8e8a1834-4006-5470-9e62-a96568921a15", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/N7S8VF/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/U9KHNA/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/U9KHNA/", "attachments": []}, {"guid": "bd2b6a97-433b-55e0-bd92-28c33c31f2e1", "code": "NBFH7G", "id": 60536, "logo": null, "date": "2025-04-23T17:50:00+02:00", "start": "17:50", "duration": "00:30", "room": "Palladium", "slug": "pyconde-pydata-2025-60536-streamlining-the-cosmos-pythonic-workflow-management-for-astronomical-analysis", "url": "https://pretalx.com/pyconde-pydata-2025/talk/NBFH7G/", "title": "Streamlining the Cosmos: Pythonic Workflow Management for Astronomical Analysis", "subtitle": "", "track": "PyData: PyData & Scientific Libraries Stack", "type": "Talk", "language": "en", "abstract": "Astronomical surveys are growing rapidly in complexity and scale, necessitating accurate, efficient, and reproducible reduction and analysis 
pipelines. In this talk we explore Pythonic workflow managers to streamline processing large datasets on distributed computing environments.\r\n\r\nModern astronomy generates vast datasets across the electromagnetic spectrum. NASA's flagship James Webb Space Telescope (JWST) provides unprecedented observations that enable deep studies of distant galaxies, cosmic structures, and other astrophysical phenomena. However, these datasets are complex and require intricate calibration and analysis pipelines to transform raw data into meaningful scientific insights.\r\n\r\nWe will discuss the development and deployment of Pythonic tools, including Snakemake and Pixi, to construct modular, parallelized workflows for data reduction and analysis. Attendees will learn how these tools automate complex processing steps, optimize performance in distributed computing environments, and ensure reproducibility. Using real-world examples, we will illustrate how these workflows simplify the journey from raw data to actionable scientific insights.", "description": "As astronomical surveys continue to grow in size and sophistication, researchers face mounting challenges in building efficient, scalable, and reproducible data processing pipelines. Modern observatories, like NASA's James Webb Space Telescope (JWST), are delivering unprecedented volumes of complex and specialized data, requiring innovative approaches to transform raw observations into meaningful, scientifically valid results. This talk focuses on leveraging Pythonic workflow management tools to address the unique challenges of processing large-scale astronomical datasets efficiently and reproducibly.\r\n\r\nI will provide a brief overview of JWST including its capabilities and the groundbreaking science it has enabled. In particular we will focus on the Pure Parallel mode, which collects serendipitous observations from regions of the sky adjacent to primary science targets. 
These opportunistic datasets are a powerful resource for blind extragalactic surveys, offering unique opportunities to uncover faint galaxies, cosmic structures, and rare astrophysical phenomena. However, their \u201cunscheduled\u201d and heterogeneous nature presents significant challenges: the data arrive in raw, uncalibrated formats and require intricate, multi-step workflows\u2014such as artifact masking, background subtraction, and galaxy spectral analysis\u2014before becoming scientifically usable.\r\n\r\nIn this talk, I will demonstrate how tools like Snakemake and Pixi offer powerful, Pythonic solutions for these challenges. I\u2019ll show how these tools allow scientists to design modular, scalable, and highly parallel workflows that automate the reduction and analysis process while efficiently distributing computation across high-performance computing (HPC) clusters and cloud environments. By breaking workflows into smaller, reusable components, we can improve computational performance and maintain flexibility to adapt pipelines to new datasets, instruments, or evolving scientific goals.\r\n\r\nReproducibility remains a critical pillar of modern science, and I will highlight how combining workflow managers with environment management tools ensures version-controlled pipelines, transparent data lineage tracking, and reliable replication of results. This enables consistent analyses across diverse systems, fostering collaboration and long-term usability of scientific products.\r\n\r\nThis talk is designed for (data) scientists and researchers working with large-scale or complex datasets with multi-step reduction/analysis pipelines. This talk will provide a GitHub repository containing resources for building modular workflows, including examples of existing infrastructures, to provide actionable takeaways. 
Attendees will leave with a clear understanding of how to apply modern workflow management techniques to streamline their processing pipelines, improve reproducibility, and scale their analyses to meet the demands of their datasets.", "recording_license": "", "do_not_record": false, "persons": [{"code": "QHTKP7", "name": "Raphael Hviding", "avatar": "https://pretalx.com/media/avatars/QHTKP7_8cyRoLq.png", "biography": "Hello! I am Raphael Hviding, a Postdoctoral Researcher in the Data Science Department at the Max-Planck Institute for Astronomy. \r\nScientifically I am interested in studying the lives of galaxies beyond our own Milky Way, how they formed, and the role that supermassive black holes play in governing galaxy evolution. \r\nComputationally my skills are in workflow management for complex data processing pipelines and in data-driven Bayesian modelling of astronomical data.", "public_name": "Raphael Hviding", "guid": "10617cd9-a4f5-5195-83de-0a2370be323f", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/QHTKP7/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/NBFH7G/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/NBFH7G/", "attachments": []}], "Ferrum": [{"guid": "b97e6b35-f881-5e09-977f-778989ebac52", "code": "UH7FXA", "id": 61846, "logo": null, "date": "2025-04-23T14:30:00+02:00", "start": "14:30", "duration": "01:30", "room": "Ferrum", "slug": "pyconde-pydata-2025-61846-instrumenting-python-applications-with-opentelemetry", "url": "https://pretalx.com/pyconde-pydata-2025/talk/UH7FXA/", "title": "Instrumenting Python Applications with OpenTelemetry", "subtitle": "", "track": "PyCon: MLOps & DevOps", "type": "Tutorial", "language": "en", "abstract": "Observability is challenging and often requires vendor-specific instrumentation. Enter OpenTelemetry: a vendor-agnostic standard for logs, metrics, and traces. 
Learn how to instrument Python applications with OpenTelemetry and send telemetry to your preferred observability backends.", "description": "Understanding the behaviour and performance characteristics of the software we deploy, especially distributed software, is quite tricky. While observability tooling helps, implementing vendor-specific instrumentation creates tight coupling and technical debt. \r\n\r\nEnter OpenTelemetry: A one-stop-shop for observability instrumentation, collection and routing. It aims to solve the above problem by providing SDKs, libraries and a unified semantic model for describing telemetry signals like logs, metrics and traces. These signals can be collected, transformed and then routed to many observability backends that support the OpenTelemetry protocol - avoiding vendor lock-in and platform specific observability code.\r\n\r\nIn this workshop, we'll guide you through what OpenTelemetry is, how it works, how to instrument your Python applications to emit telemetry data, and how to ingest this data into observability backends - enabling you to make better decisions about your application's performance.\r\n\r\n***Note*: We will be using docker & docker compose during this workshop, so please make sure it is installed! Familiarity with Flask is also a plus!**\r\n\r\nWe'll be working from this repository: https://github.com/autophagy/pycon-2025-otel-workshop \r\n\r\nYou're welcome to clone the repository in advance and pull the images we'll use for the workshop. You can pull these images by running following command from the root of the repo: `docker compose pull`.", "recording_license": "", "do_not_record": false, "persons": [{"code": "JJ8BUD", "name": "Mika Naylor", "avatar": "https://pretalx.com/media/avatars/JJ8BUD_6aWJpf4.png", "biography": "Mika is a Berlin-based lifeform mostly working with devops, distributed systems and Apache Flink. 
She also loves Rust, making ceramics and baking bread.", "public_name": "Mika Naylor", "guid": "fcbee88e-5eb2-5208-b047-1c498b36c9c8", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/JJ8BUD/"}, {"code": "7QVZDD", "name": "Emily Woods", "avatar": "https://pretalx.com/media/avatars/7QVZDD_zJqJiNo.jpg", "biography": "Emily is a software engineer with an interest in developer tooling and platform engineering. When she's not working with computers, she can usually be found making misshapen pottery or exploring Berlin's parks with her dog.", "public_name": "Emily Woods", "guid": "90f0cd7e-c6cd-5479-ba2b-9eeb756a3933", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/7QVZDD/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/UH7FXA/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/UH7FXA/", "attachments": []}, {"guid": "895fe9fe-b5ac-5e0c-b42b-580b4b87e701", "code": "LYDDDC", "id": 67553, "logo": null, "date": "2025-04-23T16:10:00+02:00", "start": "16:10", "duration": "00:30", "room": "Ferrum", "slug": "pyconde-pydata-2025-67553-building-serverless-python-ai-skills-as-wasm-components", "url": "https://pretalx.com/pyconde-pydata-2025/talk/LYDDDC/", "title": "Building Serverless Python AI skills as WASM components", "subtitle": "", "track": "PyData: Generative AI", "type": "Sponsored Talk", "language": "en", "abstract": "Frameworks like llama-stack and langchain allow for quick prototyping of generative AI applications. However, companies often struggle to deploy these applications into production quickly. 
This talk explores the design of a Python SDK that enables the development of AI skills in Python and their compilation into WebAssembly (WASM) components, targeting a specific host runtime that offers interfaces for interacting with LLMs and associated tooling.", "description": "Why do companies struggle so hard to get their AI skills into production quickly?\r\n\r\nThis talk is about building an SDK that enables the development of production-ready AI skills in Python that can be run as serverless functions within a WASM runtime and interact with LLMs via a WIT (WASM Interface Type) world. On a less technical note, we will explore the design of an SDK that offers a streamlined development experience for AI skills.\r\n\r\nWe will explore the implications for topics such as testability, traceability, and the evaluation of AI logic. How can software engineering best practices, such as separation of concerns and modularity, be applied to the design of AI applications?\r\n\r\nFred Brooks' excellent essay, No Silver Bullet, distinguishes between accidental complexity and essential complexity. This talk will explore how an SDK for AI skills can reduce accidental complexity during development and deployment, providing developers with a focused environment for innovating prompts and retrieval strategies.\r\n\r\nHow can a WIT that supports running AI applications be designed, and how can bindings to such a WIT world be generated and consumed in a Python module? We will examine abstractions that allow local testing and debugging without the compilation step by encapsulating the WIT host interface behind a Protocol.\r\n\r\nThe talk also covers benefits of running AI skills as WASM components: When compiling a Python module to WebAssembly, the Python interpreter is part of the compiled component. 
Although this results in longer start-up times compared to components written in compiled languages like Rust, it provides a key advantage: The interpreter can securely execute Python code generated by an LLM within a highly restricted environment, ensuring no network or file system access.\r\n\r\nKey takeaways include developing a foundational understanding of WASM and WIT, and how they can interface with Python. You will gain insights into the challenges of deploying AI skills into production and discover how testing, tracing, and evaluation can simplify this process.", "recording_license": "", "do_not_record": false, "persons": [{"code": "LGPEHF", "name": "Moritz Althaus", "avatar": "https://pretalx.com/media/avatars/LGPEHF_zvD1YGO.jpg", "biography": "I currently work as a Software Engineer at Aleph Alpha. Before joining Aleph Alpha, I founded WeGlide and worked on error back-propagation in Spiking Neural Networks. I enjoy spending time outdoors and flying gliders.", "public_name": "Moritz Althaus", "guid": "5c623aa9-fa24-5355-9bdf-9e066507f8b0", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/LGPEHF/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/LYDDDC/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/LYDDDC/", "attachments": []}, {"guid": "976cee78-35d9-567e-8e15-c429d6092c37", "code": "CRNJWQ", "id": 61762, "logo": null, "date": "2025-04-23T17:10:00+02:00", "start": "17:10", "duration": "00:30", "room": "Ferrum", "slug": "pyconde-pydata-2025-61762-supercharge-your-testing-with-inline-snapshot", "url": "https://pretalx.com/pyconde-pydata-2025/talk/CRNJWQ/", "title": "Supercharge Your Testing with inline-snapshot", "subtitle": "", "track": "PyCon: Testing", "type": "Talk", "language": "en", "abstract": "Snapshot tests are invaluable when you are working with large, complex, or frequently changing expected values in your tests.\r\nIntroducing inline-snapshot, a Python library designed for snapshot 
testing that integrates seamlessly with pytest, allowing you to embed snapshot values directly within your source code.\r\nThis approach not only simplifies test management but also boosts productivity by making the tests easier to maintain.\r\nIt is particularly useful for integration testing and can be used to write your own abstractions to test complex APIs.", "description": "This talk gives you an introduction to inline-snapshot and how it can transform your testing strategy:\r\n* Foundations of Snapshot Testing: Start with an introduction to what snapshot testing is and why it's a game-changer for Python developers.\r\n* Basic Usage: Learn the core functionality of the `snapshot()` function. Understand how it captures and manages snapshots inline with your tests.\r\n\r\nAdvanced Techniques:\r\n* Dirty Equals: Explore how you can leverage dirty-equals within your snapshots for more flexible assertions, allowing for partial matching which is particularly useful for complex data structures.\r\n* Parametrized Tests: See how inline-snapshot can be applied to parametrized tests, ensuring each parameter set has its own snapshot.\r\n* Customizable: Learn to create your own test functions to test your specific problems.", "recording_license": "", "do_not_record": false, "persons": [{"code": "B7CMVH", "name": "Frank Hoffmann", "avatar": "https://pretalx.com/media/avatars/B7CMVH_2Mx6k0x.jpg", "biography": "I'm a software developer. 
My goal with my open-source work is to help other people develop better software faster.", "public_name": "Frank Hoffmann", "guid": "2143b09c-8d8b-52ca-9044-505b828f7bf3", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/B7CMVH/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/CRNJWQ/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/CRNJWQ/", "attachments": []}, {"guid": "fdc0c5e0-2314-5625-ac1f-872a6e9ef983", "code": "PNQB7C", "id": 61342, "logo": null, "date": "2025-04-23T17:50:00+02:00", "start": "17:50", "duration": "00:30", "room": "Ferrum", "slug": "pyconde-pydata-2025-61342-zero-code-change-acceleration-familiar-interfaces-and-high-performance", "url": "https://pretalx.com/pyconde-pydata-2025/talk/PNQB7C/", "title": "Zero Code Change Acceleration: familiar interfaces and high performance", "subtitle": "", "track": "PyData: PyData & Scientific Libraries Stack", "type": "Talk", "language": "en", "abstract": "The PyData ecosystem is home to some of the best and most popular tools for doing data science. Every data scientist alive today has used pandas and scikit-learn, and even Large Language Models know how to use them! For many years there have also been alternative implementations with similar interfaces and libraries with completely new approaches that focus on achieving the ultimate in performance and hardware acceleration. This talk will look at the recent efforts to give users the best of both worlds: a familiar and widely used interface as well as high performance.", "description": "The interfaces defined by libraries like NumPy, pandas or scikit-learn are the de facto standard APIs in each library's domain. Data scientists use these libraries directly as well as indirectly through libraries that depend on them.\r\n\r\nThis talk will look at the different approaches that recent efforts have taken to give users both a familiar interface and GPU acceleration. 
This means users do not have to rewrite their code or learn a new library, yet still benefit from acceleration when using existing libraries.\r\n\r\nThe cuml team built a scikit-learn accelerator by diving deep into the import system of Python. By hooking into the import system you can replace the result of `import sklearn` with a library that uses cuml where possible and falls back to scikit-learn where necessary.\r\n\r\nThe scikit-learn team is adding experimental support to handle PyTorch and CuPy inputs by using the array API standard. Instead of using the NumPy API to perform array computations, scikit-learn is switching to using the array API. This is a subset of the NumPy API that is supported by several other array libraries. The APIs of NumPy and PyTorch are similar but not exactly the same, which makes it hard to write code that works with both. The array API addresses this problem by providing a unified API. Users can accelerate their scikit-learn code by passing in a CuPy or PyTorch array instead of a NumPy array.", "recording_license": "", "do_not_record": false, "persons": [{"code": "G9FDBT", "name": "Tim Head", "avatar": "https://pretalx.com/media/avatars/G9FDBT_0krrzKt.jpg", "biography": "I am a scikit-learn core maintainer and work at NVIDIA.\r\n\r\nBefore working on scikit-learn I helped build mybinder.org and worked on JupyterHub.\r\n\r\nMany years ago I was a particle physicist at CERN in Geneva.", "public_name": "Tim Head", "guid": "22b2e092-09d8-5036-b5e6-ea4cda0fff99", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/G9FDBT/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/PNQB7C/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/PNQB7C/", "attachments": []}], "Dynamicum": [{"guid": "02d01c69-09c2-5ec5-b340-c54b5c1a2cbd", "code": "CP3TKB", "id": 60450, "logo": null, "date": "2025-04-23T11:45:00+02:00", "start": "11:45", "duration": "01:30", "room": "Dynamicum", "slug": 
"pyconde-pydata-2025-60450-power-up-your-polars-code-with-polars-extention", "url": "https://pretalx.com/pyconde-pydata-2025/talk/CP3TKB/", "title": "Power up your Polars code with Polars extention", "subtitle": "", "track": "PyData: Data Handling & Engineering", "type": "Tutorial", "language": "en", "abstract": "While Polars is written in Rust and has the advantages of speed and multi-threaded functionalities., everything will slow down if a Python function needs to be applied to the DataFrame. To avoid that, a Polar extension can be used to solve the problem. In this workshop, we will look at how to do it.", "description": "We love Polars because it is written in Rust so we can use Rust's security and speed. However, it is not the most efficient if we still have to call in a Python function to perform specific aggregation. In this workshop, we will use the Polars plugin. You will be writing simple functions in Rust, and then you will use it together with Polars in your Python data pipeline.\r\n\r\n#### Target Audience\r\n\r\nEngineers and data scientists who use Polars and are confident to write a bit of Rust code. We expect you to have knowledge of Python and Polars and have a bit of Rust experience (or be able to pick it up relatively quickly). Not all concepts in Rust will be explained but we will link to material where you can find explanations.\r\n\r\n#### Goal\r\n\r\nTo empower Polars users who want to do more and do better with Polars. For folks who don't mind learning a new programming language, it is also a good opportunity to learn and practice writing in Rust.\r\n\r\n---\r\n\r\n## Preflight check\r\n\r\nIn this workshop, we expect you to have knowledge of Python and Polars and have a bit of Rust experience (or be able to pick it up relatively quickly). 
Not all concepts in Rust will be explained but we will link to material where you can find explanations.\r\n\r\nHere are the things that you should have installed when you start this workshop:\r\n\r\n- [Install/Update Rust](https://www.rust-lang.org/tools/install) (we are using rustc version 1.86.0 here)\r\n- Make sure you have Python 3.9 or above (assuming 3.13 in this workshop)\r\n- Make sure you are using a virtual environment (we recommend uv >= 0.4.25)\r\n\r\n## Windows checklist\r\n\r\nIn this workshop we recommend using a Unix-like OS (macOS or Linux). *If you use Windows, you may encounter problems with Rust and Maturin.* To minimise issues that you may encounter, please go through the extra checklist below:\r\n\r\n- Install the [C++ build tools](https://visualstudio.microsoft.com/downloads/)\r\n- [Check that the `dll` files are linked correctly](https://pyo3.rs/v0.21.2/faq#im-trying-to-call-python-from-rust-but-i-get-status_dll_not_found-or-status_entrypoint_not_found)\r\n\r\n## Learning resources for Rust and PyO3\r\n\r\nTo write a Polars plugin, you will have to develop in Rust. If you are not familiar with Rust, we highly recommend you first check out some of the Rust learning resources so you can be prepared for the workshop. Here are some of our recommendations:\r\n\r\n- [The Rust Book](https://doc.rust-lang.org/book/title-page.html)\r\n- [Rustlings (Exercises in Rust)](https://github.com/rust-lang/rustlings)\r\n- [Rust by Example](https://doc.rust-lang.org/rust-by-example/)\r\n- [Teach-rs (GitHub repo)](https://github.com/tweedegolf/teach-rs)\r\n\r\nOther tools that we will be using are PyO3 and Maturin. To learn more about them, please check out the following:\r\n\r\n- [The PyO3 user guide](https://pyo3.rs/)\r\n- [PyO3 101 - Writing Python modules in Rust](https://github.com/Cheukting/py03_101)\r\n\r\n## Setting up\r\n\r\n1. Create a new working directory\r\n\r\n```\r\nmkdir polars-plugin-101\r\ncd polars-plugin-101\r\n```\r\n\r\n2. 
Set up a virtual environment and activate it\r\n\r\n```\r\nuv venv .venv\r\nsource .venv/bin/activate\r\npython -m ensurepip --default-pip\r\n```\r\n*Note: the last command is needed as `maturin develop` cannot find pip otherwise*\r\n\r\n3. Install **polars** and **maturin**\r\n\r\n```\r\nuv pip install polars maturin\r\n```\r\n\r\nThese are the versions that we are using here:\r\n\r\n+ maturin==1.8.3\r\n+ polars==1.27.1\r\n\r\n---\r\n\r\nWorkshop materials: https://github.com/Cheukting/polars_plugin_101\r\n\r\n---\r\n\r\n#### Outline\r\n\r\n- Introduction (15 mins): \r\n    1. What is a Polars plugin \r\n    2. How does it work (using Maturin to develop packages)\r\n    3. How to use it with Polars (exercises)\r\n- Simple numerical functions (35 mins): \r\n    1. Creating numerical functions with 1 input (exercise)\r\n    2. Creating numerical functions with multiple inputs in the same row (exercise)\r\n    3. Creating numerical functions that support multiple types (exercise)\r\n- Advanced usage of Polars plugins (40 mins):\r\n    1. Creating functions with multiple inputs across different rows (exercise)\r\n    2. Functions with user-set parameters (exercise)\r\n    3. Working with strings and lists (exercise)", "recording_license": "", "do_not_record": false, "persons": [{"code": "8EGVC9", "name": "Cheuk Ting Ho", "avatar": "https://pretalx.com/media/avatars/8EGVC9_vBWTGiF.jpg", "biography": "After having a career as a Data Scientist and Developer Advocate, Cheuk dedicated her work to the open-source community. Currently, she is working as an AI developer advocate for JetBrains. She has co-founded Humble Data, a beginner Python workshop that has been happening around the world. 
She has served the EuroPython Society board for two years and is now a fellow and director of the Python Software Foundation.", "public_name": "Cheuk Ting Ho", "guid": "716d26c2-170b-5a5e-86e5-9d4cecf3bbdd", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/8EGVC9/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/CP3TKB/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/CP3TKB/", "attachments": []}, {"guid": "37df43b6-91b2-597d-a5fa-767a1ac6faf0", "code": "N9CAUM", "id": 60853, "logo": null, "date": "2025-04-23T14:30:00+02:00", "start": "14:30", "duration": "01:30", "room": "Dynamicum", "slug": "pyconde-pydata-2025-60853-supplyseer-computational-supply-chain-with-python", "url": "https://pretalx.com/pyconde-pydata-2025/talk/N9CAUM/", "title": "supplyseer: Computational Supply Chain with Python", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Statistics", "type": "Tutorial", "language": "en", "abstract": "This talk introduces supplyseer, an open-source Python library that brings advanced analytics to Supply Chain and Logistics. By combining time series embedding techniques, stochastic process modeling, and geopolitical risk analysis, supplyseer helps organizations make data-driven decisions in an increasingly complex global supply chain landscape. The library implements novel approaches like Takens embedding for demand forecasting, Hawkes processes for modeling supply chain events, and Bayesian methods for inventory optimization. Through practical examples and real-world use cases, we'll explore how these mathematical concepts translate into actionable insights for supply chain practitioners.", "description": "Supplyseer bridges the gap between theoretical supply chain analytics and practical implementation by providing a pythonic interface to advanced mathematical concepts. 
This talk will walk through the library's core components and demonstrate how they solve real-world supply chain challenges.\r\n\r\nOutline:\r\n\r\n1. Introduction to Modern Supply Chain Analytics\r\n- The need for sophisticated analytics in today's complex supply chains\r\n- Why traditional methods fall short\r\n- The role of probabilistic modeling and topological analysis\r\n\r\n2. Core Mathematical Foundations\r\n- Time series embedding techniques using Takens' theorem\r\n- Stochastic process modeling for demand forecasting\r\n- Bayesian approaches to Economic Order Quantity (EOQ)\r\n- Point process modeling with Hawkes processes\r\n- Network analysis for supply chain risk assessment\r\n\r\n3. Library Architecture and Design Philosophy\r\n- Object-oriented design for supply chain analytics\r\n- Integration of multiple analytical approaches\r\n- Extensible architecture for custom analytics\r\n- Performance considerations and optimizations\r\n\r\n4. Key Features Deep Dive\r\na) Demand Forecasting Module\r\n   - Stochastic demand process simulation\r\n   - Time-delay embedding for pattern recognition\r\n   - Mixture density networks for uncertainty quantification\r\n\r\nb) Risk Analysis Tools\r\n   - Geopolitical risk assessment\r\n   - Supply chain network visualization\r\n   - Real-time monitoring and alerting\r\n   - Trade restriction impact analysis\r\n\r\nc) Inventory Optimization\r\n   - Bayesian EOQ implementation\r\n   - Multi-echelon inventory optimization\r\n   - Stockout probability calculation\r\n   - Vector field analysis for inventory dynamics\r\n\r\n5. Practical Applications\r\n- Route optimization with geopolitical risk consideration\r\n- You and your suppliers play cooperative games: game-theoretic Supply Chain\r\n- Supply Chain Digital Twins\r\n- Real-time risk monitoring and mitigation\r\n\r\n6. 
Integration with Data Science Ecosystem\r\n- Compatibility with pandas and polars\r\n- Integration with scikit-learn pipeline\r\n- Visualization with matplotlib and seaborn\r\n- Performance optimization with numpy\r\n\r\n7. Future Directions\r\n- Planned features and enhancements\r\n- Community contribution opportunities\r\n- Integration with other supply chain tools\r\n- Research directions in supply chain analytics\r\n\r\n8. Interactive Demonstrations\r\n- Live coding examples\r\n- Real-world data analysis\r\n- Visualization of supply chain dynamics\r\n- Risk assessment workflows\r\n\r\nThe talk will include code examples and practical demonstrations, showing how to:\r\n- Implement stochastic demand forecasting\r\n- Analyze supply chain risks using network analysis\r\n- Optimize inventory levels using Bayesian methods\r\n- Visualize supply chain dynamics using vector fields\r\n- Monitor and assess geopolitical risks\r\n\r\nTarget Audience:\r\nThis talk is aimed at data scientists, supply chain analysts, and Python developers interested in applying advanced analytics to supply chain problems. 
Attendees should have intermediate Python knowledge and basic familiarity with data science libraries like pandas and numpy.\r\n\r\nPrerequisites:\r\n- Python programming experience\r\n- Basic understanding of supply chain concepts\r\n- Familiarity with pandas and numpy\r\n- Basic knowledge of probability and statistics\r\n\r\nTakeaways:\r\nAttendees will learn:\r\n- How to implement advanced supply chain analytics in Python\r\n- Practical applications of mathematical concepts in supply chain\r\n- Best practices for supply chain data analysis\r\n- Techniques for visualizing and monitoring supply chain dynamics\r\n- Methods for quantifying and managing supply chain risks\r\n\r\nAll code examples and demonstrations will be available in a GitHub repository, allowing attendees to experiment with the concepts presented and apply them to their own supply chain challenges.", "recording_license": "", "do_not_record": false, "persons": [{"code": "SJPXAQ", "name": "Jako Rostami", "avatar": "https://pretalx.com/media/avatars/SJPXAQ_5P4Zusw.jpeg", "biography": "I am a Machine Learning Engineer at H&M Group and a former Data Scientist at Lidl Sweden. As a professional I design Machine Learning services, extract insights and arrange meaningful stories for my clients by conducting high-quality modeling, engineering, data mining and analytics. \r\n\r\nI have a Bachelor's degree in Statistics and Probability Theory from Uppsala University in Sweden. Because I am a statistician at heart, I have good experience with Data Science, Python, R, time series modeling, simulations, machine learning algorithms, SQL, Excel, Spark and database technologies, as well as good communication skills. \r\n\r\nYou\u2019ll find two comprehensive Python libraries I have open-sourced. 
One is based on an emerging modern statistical hypothesis testing framework using e-values and martingales based on game-theoretic statistics. The other is for computational Supply Chain and Logistics. The first one is called \u2019expectation\u2019 and the second one is called \u2019supplyseer\u2019 and you can find both on my GitHub.", "public_name": "Jako Rostami", "guid": "5c03a26a-3999-531d-bc32-b6e67578bd2f", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/SJPXAQ/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/N9CAUM/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/N9CAUM/", "attachments": []}, {"guid": "8e129eb2-db03-5eaf-aff6-9531d138aa00", "code": "FGEUJJ", "id": 60141, "logo": null, "date": "2025-04-23T16:10:00+02:00", "start": "16:10", "duration": "00:30", "room": "Dynamicum", "slug": "pyconde-pydata-2025-60141-conformal-prediction-uncertainty-quantification-to-humanise-models", "url": "https://pretalx.com/pyconde-pydata-2025/talk/FGEUJJ/", "title": "Conformal Prediction: uncertainty quantification to humanise models", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Statistics", "type": "Talk", "language": "en", "abstract": "Quantifying model uncertainties is critical to improve model reliability and make sound decisions. Conformal Prediction is a framework for uncertainty quantification that provides mathematical guarantees of true outcome coverage, allowing more informed decisions to be made by stakeholders", "description": "Quantifying uncertainties of Machine Learning models is crucial to improve their reliability, accurately assess risks and make more robust decisions. 
By quantifying and understanding uncertainty, we can build more reliable and trustworthy systems.\r\n\r\nImagine we have a model that predicts whether or not a CT scan contains a tumour: traditional approaches tend to provide binary predictions, while not providing information on the model\u2019s confidence in each prediction.\r\n\r\nConformal Prediction (CP) is a framework for uncertainty quantification that offers an estimate of the confidence in the model\u2019s predictions: instead of providing just a point estimate, it provides a set of possible outcomes (prediction set), together with a measure of confidence in each outcome. These prediction sets come with a (mathematical!) guarantee of coverage of the true outcome, ensuring that they will detect at least a pre-fixed percentage of true values. CP is a model-agnostic paradigm, requiring no retraining of the model and making no major assumptions about the distribution of the data.\r\n\r\nWe humans, when faced with uncertainty, tend to express indecision and offer alternatives. We will see that CP can be a key tool to include a human in the decision-making loop, once the \u2018humanised\u2019 machine is able to express its uncertainty.\r\n\r\nCP therefore offers a robust framework that allows stakeholders to make more informed decisions, even more so in high-risk sectors such as healthcare, finance and autonomous systems.", "recording_license": "", "do_not_record": false, "persons": [{"code": "FRRAE7", "name": "Vincenzo Ventriglia", "avatar": "https://pretalx.com/media/avatars/FRRAE7_coP1z5u.jpg", "biography": "A results-driven data professional \u2013 focused on hype-free solutions tailored to business needs.\r\n\r\nI am currently creating value at the National Institute of Geophysics and Volcanology (INGV), where I develop machine learning models in the Space Weather domain. My job is complemented by finding the hidden stories in data and making them accessible to stakeholders. 
I studied Physics in Italy (Napoli) and Germany (Frankfurt am Main), previously worked on Analytics in the strategic division of the world's largest professional services network, and in the Data Science department of the leading Italian publisher.\r\n\r\nWhen not at work, I enjoy theatre, talking about finance or learning a new language.", "public_name": "Vincenzo Ventriglia", "guid": "524ac8af-2da0-540e-946f-1676f3146ee6", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/FRRAE7/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/FGEUJJ/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/FGEUJJ/", "attachments": []}, {"guid": "35e5a287-372d-5742-9b46-682fd32e1381", "code": "VJR39N", "id": 61390, "logo": null, "date": "2025-04-23T17:10:00+02:00", "start": "17:10", "duration": "00:30", "room": "Dynamicum", "slug": "pyconde-pydata-2025-61390-citation-is-collaboration-software-recognition-in-research-and-industry", "url": "https://pretalx.com/pyconde-pydata-2025/talk/VJR39N/", "title": "Citation is Collaboration: Software Recognition in Research and Industry", "subtitle": "", "track": "PyData: Research Software Engineering", "type": "Talk", "language": "en", "abstract": "The development of open source software is increasingly recognized as a critical contribution across many disciplines, yet the mechanisms for credit and citation vary significantly. This talk uses astronomy as a case study to explore shared challenges in attributing software contributions across research and industry. It will review the evolution of journal recommendations and policies over the past decade, alongside emerging publishing practices offering insights into their impact on the recognition of software contributions. An analysis of citation patterns for widely used libraries (numpy, scipy, astropy) highlights trends over time and their dependence on publication venues and policies. 
The talk will conclude with strategies for both developers and users for improving the recognition of software, fostering collaboration and sustainability in software ecosystems. All data and analysis code will be made available in a public repository, supporting transparency and further study.", "description": "In many fields, including research and industry, software is essential for driving innovation and scientific discovery, yet mechanisms for crediting software developers remain inconsistent and underdeveloped. This lack of recognition, particularly for open-source contributions, can discourage participation in software development and limit career opportunities for developers. Astronomy, as a computationally intensive discipline with a rich history of open-source software contributions, offers a valuable case study to examine these challenges. Over the past decade, changes in journal policies and emerging publishing practices have sought to address the issue, but their impact on credit attribution remains unclear.\r\n\r\nThis talk addresses the issue of software credit by analyzing publication and citation practices in astronomy. It evaluates how existing policies acknowledge software contributions and examines variations across journals and over time. Drawing on bibliometric data from the past decade, the analysis focuses on citation patterns for commonly used libraries, trends in citation rates, and the influence of journal policies. The study includes both foundational libraries, such as NumPy, and astronomy-specific libraries, such as Astropy. Based on these findings, the talk will offer recommendations to enhance the attribution of software contributions.\r\n\r\nThe issue of recognizing research software is not unique to astronomy or research. Participants from industry, other computationally driven fields, open-source communities, and publishing will find the insights applicable to their own disciplines. 
Understanding how software is cited and credited is critical for shaping more equitable recognition systems, which in turn support sustainable software development and community growth. The audience will leave with a clear understanding of how astronomy\u2019s experience can inform broader efforts to address similar challenges in their respective fields.\r\n\r\nThe data and code for the analysis will be shared with participants, giving them access to a reproducible framework for analyzing software citation practices in other disciplines or software ecosystems.", "recording_license": "", "do_not_record": false, "persons": [{"code": "LAMEH3", "name": "Ivelina Momcheva", "avatar": "https://pretalx.com/media/avatars/LAMEH3_0bZqlVj.jpg", "biography": "I am the Lead of the Data Science Group at the Max Planck Institute for Astronomy in Heidelberg, Germany and an editor for the Journal of Open Source Software (JOSS). My scientific work focuses on galaxy evolution. I get my thrills from gravitational lenses, spectra, databases and well-documented APIs.", "public_name": "Ivelina Momcheva", "guid": "b361fc34-63d0-55b4-935a-9f2d040d5de2", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/LAMEH3/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/VJR39N/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/VJR39N/", "attachments": []}, {"guid": "3bbdcc48-25fe-5316-9135-da0edd4f0dc2", "code": "8S3RC3", "id": 61908, "logo": null, "date": "2025-04-23T17:50:00+02:00", "start": "17:50", "duration": "00:30", "room": "Dynamicum", "slug": "pyconde-pydata-2025-61908-build-a-personalized-commute-agent-in-python-with-hopsworks-langgraph-and-llm-function-calling", "url": "https://pretalx.com/pyconde-pydata-2025/talk/8S3RC3/", "title": "Build a personalized Commute agent in Python with Hopsworks, LangGraph and LLM Function Calling", "subtitle": "", "track": "PyData: Data Handling & Engineering", "type": "Sponsored Talk", 
"language": "en", "abstract": "The invention of the clock and the organization of time in zones have helped synchronize human activities across the globe. While timekeepers are better at planning and sticking to the plan, time optimists somehow believe that time is malleable and extends the closer the deadline. Nevertheless, whether you are an organized timekeeper or a creative timebender, external factors can affect your commute.\r\n\r\nIn this talk, we will define the different components necessary to build a personalized commute virtual agent in Python. The agent will help you analyze your historical lateness records, estimate future delays, and suggest the best time to leave home based on these predictions. It will be powered by a LLM and will use a technique called Function Calling to recognize the user intent from the conversation history and provide informed answers.", "description": "The invention of the clock and the organization of time in zones have helped synchronize human activities across the globe. While timekeepers are better at planning and sticking to the plan, time optimists somehow believe that time is malleable and extends the closer the deadline. Nevertheless, whether you are an organized timekeeper or a creative timebender, external factors can affect your commute.\r\n\r\nIn this talk, we will define the different components necessary to build a personalized commute virtual agent in Python. The agent will help you analyze your historical lateness records, estimate future delays, and suggest the best time to leave home based on these predictions. 
It will be powered by a LLM and will use a technique called Function Calling to recognize the user intent from the conversation history and provide informed answers.\r\n\r\nThe ML system will be built in Python, following the best practices of the FTI (feature/training/inference) pipeline architecture, on top of the open-source Hopsworks AI lakehouse, which will provide the necessary ML infrastructure, such as the feature store, model serving, and a model registry. The agent will be designed with LangGraph and powered by a LLM running on the vLLM inference engine.", "recording_license": "", "do_not_record": false, "persons": [{"code": "RUSBZM", "name": "Javier de la R\u00faa Mart\u00ednez", "avatar": "https://pretalx.com/media/avatars/RUSBZM_KMNK45p.png", "biography": "Javier is a Research Engineer at Hopsworks, where he actively contributes to advancing the Hopsworks AI Lakehouse. He is currently pursuing his Ph.D. at KTH Royal Institute of Technology in Sweden with a primary focus on large-scale machine learning systems.", "public_name": "Javier de la R\u00faa Mart\u00ednez", "guid": "65c45afd-14d2-509d-b5e2-901653149804", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/RUSBZM/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/8S3RC3/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/8S3RC3/", "attachments": []}]}}, {"index": 2, "date": "2025-04-24", "day_start": "2025-04-24T04:00:00+02:00", "day_end": "2025-04-25T03:59:00+02:00", "rooms": {"Zeiss Plenary (Spectrum)": [{"guid": "80fbf8d2-ae9a-5452-ba4a-dcab51e19916", "code": "EGNBHD", "id": 65262, "logo": null, "date": "2025-04-24T09:05:00+02:00", "start": "09:05", "duration": "00:45", "room": "Zeiss Plenary (Spectrum)", "slug": "pyconde-pydata-2025-65262-chasing-the-dark-universe-with-euclid-and-python-unveiling-the-secrets-of-the-cosmos", "url": "https://pretalx.com/pyconde-pydata-2025/talk/EGNBHD/", "title": "Chasing the Dark Universe with Euclid and 
Python: Unveiling the Secrets of the Cosmos", "subtitle": "", "track": "Keynote", "type": "Keynote", "language": "en", "abstract": "The ESA Euclid mission, launched in July 2023, is on a quest to unravel the mysteries of dark energy and dark matter: the enigmatic components that make up 95% of the Universe. By mapping one-third of the sky with unprecedented precision, Euclid is building the largest 3D map of the cosmos.\r\n\r\nThis talk explores how cosmologists bridge theory and Euclid observations to reveal the hidden nature of dark energy and dark matter. We will delve into the challenges of cosmological inference, where advanced statistical methods and Python-based pipelines compare theoretical models against Euclid's vast datasets, and we will explain how Bayesian inference, machine learning, and state-of-the-art simulations are revolutionizing our understanding of the cosmos.", "description": "The Euclid mission, a European Space Agency-led mission launched in July 2023, is set to transform our understanding of the Universe by exploring its most elusive constituents: dark energy and dark matter. Together, they account for 95% of the cosmos, dictating its structure, evolution, and eventual fate. Euclid is currently surveying one-third of the sky to construct the most extensive 3D map of the Universe ever created. By using deep imaging and spectroscopic data, it traces the distribution of galaxies and the subtle distortions caused by gravitational lensing with unparalleled precision.\r\n\r\nBy connecting theory with observations, Euclid aims to uncover the properties of dark energy driving cosmic acceleration and the distribution of dark matter shaping large-scale cosmic structures. At the heart of this endeavor lies the challenge of cosmological statistical inference: extracting robust conclusions about the nature of dark energy and dark matter from vast, complex datasets. 
This talk will explore how cutting-edge statistical techniques and powerful computational tools, including Python-based analysis pipelines, are being used to compare theoretical models against Euclid's observations. We will discuss the role of Bayesian inference, machine learning, and advanced simulations in constraining cosmological parameters and testing extensions to the standard model of cosmology.", "recording_license": "", "do_not_record": false, "persons": [{"code": "BH89AW", "name": "Guadalupe Canas Herrera", "avatar": "https://pretalx.com/media/avatars/BH89AW_Hwx77Fx.jpeg", "biography": "Guadalupe is a Theoretical Cosmologist working on understanding how the Universe began, how it evolved and what its ultimate fate could be. In particular, she is interested in studying alternative cosmological models with state-of-the-art astrophysical data using advanced statistical techniques and data science algorithms. Furthermore, she is interested in forecasting the performance of new experiments or new observables, for instance, Gravitational Waves.\r\n\r\nShe holds a Bachelor's in Physics from the University of Cantabria, and Master's and PhD degrees in Cosmology from Leiden University. Currently, she is a Research Fellow in Space Science at the European Space Agency. Moreover, she is an active member of the Euclid Consortium: the scientific group behind the data exploitation of the ESA Euclid mission. In particular, she is the maintainer of the code \"Cosmology Likelihood for Observables in Euclid\" or simply, CLOE. This software is part of the official data analysis pipeline that will eventually be used to extract cosmological constraints from the Euclid data. 
Within the consortium, she is also co-leading the group in charge of testing models beyond the Standard Cosmological Model to discern the nature of Dark Matter or Dark Energy, or to test alternative inflationary models.", "public_name": "Guadalupe Canas Herrera", "guid": "2958aa08-3ad8-51ac-9d69-fbdcf91fdd17", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/BH89AW/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/EGNBHD/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/EGNBHD/", "attachments": []}, {"guid": "94d358eb-ea6c-5e25-8bb4-dc8688d468b8", "code": "TQN98D", "id": 61093, "logo": null, "date": "2025-04-24T10:15:00+02:00", "start": "10:15", "duration": "00:30", "room": "Zeiss Plenary (Spectrum)", "slug": "pyconde-pydata-2025-61093-algorithmic-music-composition-with-python", "url": "https://pretalx.com/pyconde-pydata-2025/talk/TQN98D/", "title": "Algorithmic Music Composition With Python", "subtitle": "", "track": "PyCon: Python Language & Ecosystem", "type": "Talk", "language": "en", "abstract": "Computers have long been an integral part of creating music. Virtual instruments and digital audio workstations make creating music easy and accessible. But how do programming languages and especially Python fit into this? Python can serve as a tool for creating musical notation\r\nand MIDI files.   \r\n\r\nThroughout the session, you\u2019ll learn how to:\r\n\r\n- Use Python to create melodies, harmonies, and rhythms.\r\n- Generate music based on rules, randomness, and mathematical principles.\r\n- Visualize and export your compositions as MIDI and sheet music.\r\n\r\nBy the end of the talk, you\u2019ll have a clear understanding of how to turn simple algorithms into expressive musical works.", "description": "This talk provides a general introduction into creating music algorithmically using Python. Little prior knowledge about music is assumed. 
It is helpful to know how sheet music looks and what the MIDI format is beforehand. \r\n\r\nWe will start by looking briefly into the basic building blocks of music (harmony, melody and rhythm) and what our goal is (creating sheet music and a playable MIDI file). \r\n\r\nThen we will discuss the history of algorithmic composition in music and from that we will develop ideas for how we can create music from algorithms and randomness. \r\n\r\nFor creating sheet music we will look into the packages Abjad and music21. \r\n\r\nIn the end we will create a playable MIDI file for our music using MIDIUtil.", "recording_license": "", "do_not_record": false, "persons": [{"code": "J9DJ83", "name": "Hendrik Niemeyer", "avatar": "https://pretalx.com/media/avatars/J9DJ83_I5rJQpf.jpg", "biography": "Hendrik is a C++ developer and works on software for analysis of pipeline inspection data. This includes topics like machine learning, \r\nnumerical mathematics and distributed computing. Before this he completed his PhD in physics at the University of Osnabr\u00fcck with a thesis about quantum mechanics and \r\nnumerical simulations where he got to know and love programming and complex, mathematical tasks. \r\nHis favorite programming languages, in which he also has the most experience, are C++, Python and Rust. He describes himself as a \"learning enthusiast\" \r\nwho always gets absorbed in trying out new things. 
Therefore, he values being up to date with programming languages \r\nand using the latest features of them in a meaningful way.", "public_name": "Hendrik Niemeyer", "guid": "592b640d-9925-5aa6-aaab-2e4189a16a30", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/J9DJ83/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/TQN98D/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/TQN98D/", "attachments": [{"title": "Slides", "url": "/media/pyconde-pydata-2025/submissions/TQN98D/resources/algor_aMTLT5q.pptx", "type": "related"}]}, {"guid": "660cbbd0-e597-5bb8-b9bf-739f140d89e0", "code": "TAXVSC", "id": 68523, "logo": null, "date": "2025-04-24T11:00:00+02:00", "start": "11:00", "duration": "01:00", "room": "Zeiss Plenary (Spectrum)", "slug": "pyconde-pydata-2025-68523-ai-in-reality-fireside-chat-enterprise-ai-open-source-innovation", "url": "https://pretalx.com/pyconde-pydata-2025/talk/TAXVSC/", "title": "AI in Reality Fireside Chat: Enterprise AI & Open\u2011Source Innovation", "subtitle": "", "track": "General: Others", "type": "Panel", "language": "en", "abstract": "This fireside chat brings together leading voices from industry and open-source to explore how artificial intelligence is being meaningfully integrated into enterprise environments\u2014beyond the buzzwords. Moderated by Alexander CS Hendorf, the conversation features Walid Mehanna (Chief Data Officer, Merck), Dr. Alexander Beck (CTO, Quoniam), and Ines Montani (co-founder explosion.ai, spaCy), who share their diverse perspectives from pharmaceuticals, finance, and AI tooling.\r\n\r\nTogether, they\u2019ll explore the cultural, technical, and ethical dimensions of AI adoption in large organizations, the growing influence of open-source ecosystems, and the long-term vision required to build sustainable, human-centered AI systems. 
This session is designed for those who want to move past the hype and better understand what real-world innovation at scale looks like\u2014and what it demands from leadership, infrastructure, and community.", "description": "While headlines are dominated by generative AI breakthroughs and ever-larger models, some of the most meaningful progress is happening quietly\u2014in enterprises that are aligning AI with long-term strategy and in open-source communities driving technical excellence. This session brings together Walid Mehanna (Chief Data Officer, Merck), Dr. Alexander Beck (CTO, Quoniam), and Ines Montani (co-founder of explosion.ai/spaCy) in a live conversation moderated by Alexander CS Hendorf.\r\n\r\nTogether, they\u2019ll explore how open-source tools shape enterprise AI adoption, the cultural and organizational shifts needed to move beyond pilots and prototypes, and the responsibilities that come with deploying AI in production. From internal LLM platforms and research pipelines to industry collaboration and digital ethics, the panel will offer grounded, practical insights from vastly different domains.\r\n\r\nThis isn\u2019t another panel about AI buzzwords. It\u2019s a discussion about building AI systems that matter\u2014tools that integrate with people, processes, and purpose. The audience can expect a thoughtful, forward-looking exchange between builders, strategists, and leaders who are working at the edge of what\u2019s possible, while keeping a strong eye on what\u2019s meaningful.", "recording_license": "", "do_not_record": false, "persons": [{"code": "8F38DV", "name": "Alexander CS Hendorf", "avatar": "https://pretalx.com/media/avatars/8F38DV_QTtqqiS.jpg", "biography": "Alexander C. S. Hendorf has over 20 years of experience in digitalization, data, and artificial intelligence. 
As an independent consultant, he focuses on the practical implementation, adoption, and communication of data- and AI-driven strategies and decision-making processes.\r\n\r\nWhile still in law school, he worked as a DJ\u2014before dropping out to join a transatlantic music start-up. The venture evolved into a decent independent label group and, eventually, a small stock corporation, where Alexander became a partner and, at 28, took over as COO. He led the company\u2019s digital transformation and designed systems that could scale with growth. This entrepreneurial journey laid the foundation for his deep understanding of business strategy, technology, and innovation.\r\n\r\nAfter closing the chapter on digital music, Alexander turned his focus to data science and AI\u2014initially driven by curiosity, with weekends on Coursera and evenings on GPUs. That passion evolved into a career advising organizations on AI integration, data strategy, and building impact-driven teams.\r\n\r\nSome say he just picks the flashiest jobs\u2014record label owner, data scientist\u2014but really, he follows his passion: for what\u2019s new, what matters, and what connects people and technology.\r\n\r\nToday, he supports clients\u2014especially in regulated or legacy-heavy industries\u2014in aligning emerging technologies with real-world business goals. His work emphasizes cultural impact, sustainable change, and interdisciplinary thinking.\r\n\r\nAlexander is a recognized expert in data intelligence and a frequent speaker and chair at international conferences, including PyCon DE & PyData, Data2Day, and EuroPython. 
He\u2019s a Python Software Foundation Fellow, EuroPython Fellow, and board member of the Python Software Verband (Germany).\r\n\r\nSince 2024, he has been driving [Pioneers Hub](https://pioneershub.org), a non-profit supporting vibrant, inclusive tech communities\u2014and helping innovators keep pace in a rapidly changing world.", "public_name": "Alexander CS Hendorf", "guid": "e61ae96e-6f0d-5312-867d-6bf04eefb64f", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/8F38DV/"}, {"code": "PYUNBG", "name": "Dr. Alexander Beck", "avatar": null, "biography": null, "public_name": "Dr. Alexander Beck", "guid": "d5140713-b17b-5e48-b149-d2c29bbe13f4", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/PYUNBG/"}, {"code": "SMXFQK", "name": "Walid Mehanna", "avatar": null, "biography": null, "public_name": "Walid Mehanna", "guid": "843d44d5-9c85-5a06-a5a4-60676d1d1d68", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/SMXFQK/"}, {"code": "FZKG9N", "name": "Ines Montani", "avatar": "https://pretalx.com/media/avatars/FZKG9N_5iBQp5R.jpg", "biography": "Ines Montani is a developer specializing in tools for AI and NLP technology. 
She\u2019s the co-founder and CEO of Explosion and a core developer of spaCy, a popular open-source library for Natural Language Processing in Python, and Prodigy, a modern annotation tool for creating training data for machine learning models.", "public_name": "Ines Montani", "guid": "b60e58b3-bd41-534c-a286-22ae8481a00a", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/FZKG9N/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/TAXVSC/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/TAXVSC/", "attachments": []}, {"guid": "bcd47be5-330b-553c-88ac-820ab1217faf", "code": "3FUYVH", "id": 64179, "logo": null, "date": "2025-04-24T13:25:00+02:00", "start": "13:25", "duration": "00:45", "room": "Zeiss Plenary (Spectrum)", "slug": "pyconde-pydata-2025-64179-machine-learning-models-in-a-dynamic-environment", "url": "https://pretalx.com/pyconde-pydata-2025/talk/3FUYVH/", "title": "Machine Learning Models in a Dynamic Environment", "subtitle": "", "track": "Keynote", "type": "Keynote", "language": "en", "abstract": "\"We've only tested the happy path - now users are finding all sorts of creative ways to break the app.\"\r\n\r\nWhat is already a cause for headaches in traditional software engineering turns into a large challenge when the application is based on machine learning models: Data distribution may change from training phase to deployment. Even worse, humans interacting with the model may adjust their behaviour to the model making the gap between original training environment and deployment even larger. When deployed in a public environment the model may be exposed to users trying to game the system. When re-trained it may be exposed to users trying to poison the pool of training data.\r\n\r\nWe will take a tour of historic cases of models being gamed: What are the lessons we learnt a long time ago building e-mail spam filters? 
What happened when high search engine rankings started to be linked to monetary income? How can personalization and targeted advertising be exploited to influence public discourse? \r\n\r\n\u201c\u2026 it should be clear that improvements in communication tend to divide mankind \u2026\u201d by Harold Innis in Changing Concepts of Time\r\n\r\nThis keynote will turn interactive, engaging the audience in sharing their stories on users playing interesting games with deployed models - including counter moves rolled out. \r\n\r\nIf we are to learn from IT security experience, one important ingredient to address these issues is a combination of collaboration and transparency - across organisations.", "description": "\"Collect data, choose an algorithm, train a model to match your target metric and deploy to production.\" ... sounds easy enough.\r\n\r\nBut what if user behaviour changes after the model was deployed? What if the deployment of the model itself causes a change in user behaviour? \r\n\r\nThis talk will look at examples of models changing user behaviour. In the interactive part the talk will collect stories from the audience.", "recording_license": "", "do_not_record": false, "persons": [{"code": "8SZGZW", "name": "Isabel Drost-Fromm", "avatar": "https://pretalx.com/media/avatars/8SZGZW_Byqma2O.png", "biography": "Isabel Drost-Fromm was until recently the Chair of the board of directors of the InnerSource Commons Foundation, as well as a (former board) member of the Apache Software Foundation. Interested in all things search and text mining with a thorough background in open source collaboration, she is working at Europace AG as Open Source Strategist. 
True to the nature of people living in Berlin she loves giving friends a reason for a brief visit - as a result she co-founded and is still one of the creative heads behind Berlin Buzzwords, a tech conference on all things search, scale and storage and FOSS Backstage.", "public_name": "Isabel Drost-Fromm", "guid": "547bb40d-96cf-5f3e-b168-cd4d8effcddc", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/8SZGZW/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/3FUYVH/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/3FUYVH/", "attachments": []}, {"guid": "4a4ebfe4-de4a-5d58-a2ac-9c7eda651b94", "code": "K9ACTV", "id": 61261, "logo": null, "date": "2025-04-24T14:20:00+02:00", "start": "14:20", "duration": "00:30", "room": "Zeiss Plenary (Spectrum)", "slug": "pyconde-pydata-2025-61261-safeguard-your-precious-api-endpoints-built-on-fastapi-using-oauth-2-0", "url": "https://pretalx.com/pyconde-pydata-2025/talk/K9ACTV/", "title": "Safeguard your precious API endpoints built on FastAPI using OAuth 2.0", "subtitle": "", "track": "PyCon: Security", "type": "Talk", "language": "en", "abstract": "Is implementing authorization on your API endpoints an afterthought? Who should have access to your API endpoints? Is it secure? This talk covers using OAuth 2.0 to secure API endpoints built on FastAPI following industry-recognized best practices. Come on a journey with me from taking your API endpoints to being functional AND secure. When you follow secure identity standards, you\u2019ll be equipped with a deeper understanding of the critical need for authorization.", "description": "Audience Level: Beginners, Pythonistas who build on FastAPI who are not necessarily security experts but still need to deploy secure APIs.\r\n\r\nHistory of OAuth 2.0? 
(3 mins)\r\n- Background/history on OAuth \r\n- Why do we need OAuth 2.0?\r\n\r\nAuthorization Challenge (2 mins)\r\n- Why implement secure authorization now rather than later?\r\n- Data sensitivity\r\n\r\nOAuth 2.0 Overview (3 mins)\r\n- Core concepts\r\n- Key features: What are JWTs?\r\n- Benefits of using OAuth 2.0\r\n\r\nTechnical Implementation (4 mins)\r\n- Components of OAuth 2.0 \r\n- Different types of authorization flows and use cases\r\n- API setup on FastAPI\r\n\r\nDemo with FastAPI (12 mins) \r\n- Create an endpoint in FastAPI framework and secure it with OAuth 2.0\r\n- What are the different identity providers that can provide authorization?\r\n- Troubleshooting common issues \r\n\r\nBest Practices (4 mins)\r\n- Industry-standard protocol\r\n- Token-based security \r\n- Should you build your authorization server?\r\n\r\nNext Steps (2 mins)\r\n- Ability to integrate/provide SSO with various IdPs\r\n- Share resources to learn more including blogs, GitHub repo, etc.\r\n- Got questions? Connect with me!", "recording_license": "", "do_not_record": false, "persons": [{"code": "KZSJAZ", "name": "Semona Igama", "avatar": "https://pretalx.com/media/avatars/KZSJAZ_ZhEbzCH.jpg", "biography": "Semona is a Developer Advocate at Okta. She enjoys chatting about OpenID Connect, OAuth 2.0, and web security, but most of all, learning how developers learn best. 
Outside work, Semona is a Pythonista, loves kombucha, and plays board/role-playing games and Ultimate!", "public_name": "Semona Igama", "guid": "a4184001-c41e-5854-9813-18782466c1e0", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/KZSJAZ/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/K9ACTV/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/K9ACTV/", "attachments": []}, {"guid": "26a465e0-7a94-5372-a8d2-59c0da795ddf", "code": "VFE78U", "id": 61234, "logo": null, "date": "2025-04-24T15:00:00+02:00", "start": "15:00", "duration": "00:45", "room": "Zeiss Plenary (Spectrum)", "slug": "pyconde-pydata-2025-61234-they-are-not-unit-tests-a-survey-of-unit-testing-anti-patterns", "url": "https://pretalx.com/pyconde-pydata-2025/talk/VFE78U/", "title": "They are not unit tests: a survey of unit-testing anti-patterns", "subtitle": "", "track": "PyCon: Testing", "type": "Talk (long)", "language": "en", "abstract": "The entire industry approves of unit testing but almost no one can fully agree on how to do it correctly, or even on what unit tests are. This results in unit tests often being associated with slower development cycle and an overall less enjoyable workflow. I'll show you how testing turns into hell in real enterprises with the most common anti-patterns and then I'll show you that most of them are avoidable with modern tooling like mutation testing, snapshot testing, dirty-equals, and many more. We'll discuss how to make tests speed up your development and make refactoring easy.", "description": "Similar to TDD, unit tests are one of the most misunderstood concepts in software engineering. In this session, I will cover the most important fallacies about unit testing and the most common anti-patterns. I will also show you how modern infrastructure (pytest-fixture-classes, inline-snapshot, dirty-equals, import-linter, mutmut, and pytest-xdist) makes it possible to avoid most of them. 
\r\n\r\nWe will discuss that the real goal of tests is not always stability and how tests often make refactoring and restructuring your project easy, not hard. I will define my criteria for good tests and then for the rest of the session, we will be using it to analyze anti-patterns and explore modern solutions to them. You will see:\r\n\r\n1. How people make their \"units\" too small and how you can prevent it using import-linter\r\n2. How people make their \"units\" too big and what architectural patterns can you use to make them smaller\r\n3. How the real value of tests is in the quality of their assertions and how mutation testing can measure it for you\r\n4. How people end up with asserting too much, and how inline-snapshot and dirty-equals make this problem obsolete\r\n5. How people try to cover the volatile parts of their software, and how coveragepy already has tooling to prevent it\r\n6. How slow tests hurt you, and how to make your tests fast even if you tried it many times and failed\r\n7. How to build an architecture that makes writing tests hard, and how to make it easy using inline-snapshot, pytest-fixture-classes, and a few clever tricks\r\n8. How you can mock your way into making your tests useless, what you should actually mock and how testcontainers can help you with that\r\n\r\nAfter this session, your tests will become your friend instead of slowing you down.", "recording_license": "", "do_not_record": false, "persons": [{"code": "UDHJSP", "name": "Stanislav Zmiev", "avatar": "https://pretalx.com/media/avatars/UDHJSP_JIPG5fp.JPG", "biography": "Experienced platform engineer and architect with a passion for open source and developer tools. The author of Cadwyn -- a sophisticated API Versioning framework based on FastAPI. A contributor to numerous projects such as CPython and tortoise-orm. 
Currently building the future of finance at Monite.", "public_name": "Stanislav Zmiev", "guid": "d6f56f46-5226-5e46-b9be-1e542c1df1b3", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/UDHJSP/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/VFE78U/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/VFE78U/", "attachments": []}, {"guid": "7ab03db9-781a-5db2-a764-756a0cbcb2d5", "code": "9TRFCK", "id": 68399, "logo": null, "date": "2025-04-24T16:15:00+02:00", "start": "16:15", "duration": "01:00", "room": "Zeiss Plenary (Spectrum)", "slug": "pyconde-pydata-2025-68399-pyladies-panel-ai-skills-careers", "url": "https://pretalx.com/pyconde-pydata-2025/talk/9TRFCK/", "title": "PyLadies Panel: AI Skills & Careers", "subtitle": "", "track": "General: Education, Career & Life", "type": "Panel", "language": "en", "abstract": "As generative AI and autonomous agents rapidly transform the workplace, the skills required to thrive are evolving just as quickly. This panel will explore the essential AI skills that are driving career growth.", "description": "In this panel, we will have some of our PyLadies & Friends discuss career challenges in the age of \"everything AI\", and how to overcome them. \r\n\r\nAs generative AI and autonomous agents rapidly transform the workplace, the skills required to thrive are evolving just as quickly. This panel will explore the needed AI skills that are driving career growth.\r\n\r\nWhether you are at the beginning of your career or a very experienced Pythonista, this panel is for you!", "recording_license": "", "do_not_record": false, "persons": [{"code": "NMACLQ", "name": "Tereza Iofciu", "avatar": "https://pretalx.com/media/avatars/NMACLQ_M0SmHO9.jpeg", "biography": "Tereza Iofciu is a data leadership coach and a data practitioner. She has more than 15 years of experience in Data Science, Data Engineering, Product Management and Team Management. 
Alongside that, she has spent most of those years volunteering in the Python Community and wears many hats: PyLadies Hamburg organizer, Python Software Verband board member, Python Software Foundation Code of Conduct team member, Diversity & Inclusion working group member, PyConDE & PyData Berlin organizer, Python Pizza Hamburg organizer, and PyPodcats co-leader. In 2021 Tereza was awarded the Python Software Foundation community service award.", "public_name": "Tereza Iofciu", "guid": "9f1c4db3-3e40-5e40-a06d-ad540d3a75fc", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/NMACLQ/"}, {"code": "BJ3JTQ", "name": "Anastasia Karavdina", "avatar": null, "biography": "My background is particle physics, where I was completely spoiled by access to large amounts of data and the freedom to try out every hot ML algorithm on it. The experiments I participated in were so-called large scale experiments (e.g. the Large Hadron Collider) and had from 500 up to 2.5k other people working on them. So in addition to physics, I was exposed to the best software development practices that helped us avoid a complete mess and not destroy the Universe. \r\n\r\nAfterwards I worked as a Data Scientist in various fields and recently became \"Solution Architect ML/AI and BI\" at a big enterprise company. \r\n\r\nDuring my free time, I like learning new tools and techniques and implementing them in end-to-end AI/ML and IoT projects. 
My experience has also been very helpful in guiding data analysts, data scientists, and machine learning engineers as a mentor and contributing to the growth of the next generation of data scientist elite.", "public_name": "Anastasia Karavdina", "guid": "486fc76c-9e80-5b7b-9522-aaa03e19ee45", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/BJ3JTQ/"}, {"code": "HCWQZW", "name": "Jesper Dramsch", "avatar": "https://pretalx.com/media/avatars/HCWQZW_H2mkmrg.jpg", "biography": "Jesper Dramsch works at the intersection of machine learning and physical, real-world data. Currently, they're working as a scientist for machine learning in numerical weather prediction at the coordinated organisation ECMWF.\r\n\r\nJesper is a fellow of the Software Sustainability Institute, creating awareness and educational resources around the reproducibility of machine learning results in applied science. Before, they have worked on applied exploratory machine learning problems, e.g. satellites and Lidar imaging on trains, and defended a PhD in machine learning for geoscience. During the PhD, Jesper wrote multiple publications and often presented at workshops and conferences, eventually holding keynote presentations on the future of machine learning in geoscience.\r\n\r\nMoreover, they worked as consultant machine learning and Python educator in international companies and the UK government. They create educational notebooks on Kaggle applying ML to different domains, reaching rank 81 worldwide out of over 100,000 participants and their video courses on Skillshare have been watched over 128 days by over 4500 students.  
Recently, Jesper was invited into the YouTube Partner programme creating videos around programming, machine learning, and tech.", "public_name": "Jesper Dramsch", "guid": "7992ef9b-b67c-5de3-9370-79479801a771", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/HCWQZW/"}, {"code": "BH89AW", "name": "Guadalupe Canas Herrera", "avatar": "https://pretalx.com/media/avatars/BH89AW_Hwx77Fx.jpeg", "biography": "Guadalupe is a Theoretical Cosmologist working in understanding how the Universe began, how it evolved and what its ultimate fate could be. In particular, she is interested in studying alternative cosmological models with state-of-the-art astrophysical data using advanced statistical techniques and data science algorithms. Furthermore, she is interested in forecasting the performance of new experiments or new observables, for instance, Gravitational Waves.\r\n\r\nShe holds a Bachelor's in Physics from the University of Cantabria, and Master's and PhD degrees in Cosmology from Leiden University. Currently, she is a Research Fellow in Space Science at the European Space Agency. Moreover, she is an active member of the Euclid Consortium: the scientific group behind the data exploitation of the ESA Euclid mission. In particular, she is the maintainer of the code \"Cosmology Likelihood for Observables in Euclid\" or simply, CLOE. This software is part of the official data analysis pipeline that will eventually be used to extract cosmological constraints from the Euclid data. 
Within the consortium, she is also co-leading the group in charge of testing models beyond the Standard Cosmological Model to discern the nature of Dark Matter or Dark Energy, or to test alternative inflationary models.", "public_name": "Guadalupe Canas Herrera", "guid": "2958aa08-3ad8-51ac-9d69-fbdcf91fdd17", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/BH89AW/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/9TRFCK/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/9TRFCK/", "attachments": []}, {"guid": "63bb1365-e628-5087-a13c-aa3892437986", "code": "SUDMDV", "id": 68195, "logo": null, "date": "2025-04-24T17:45:00+02:00", "start": "17:45", "duration": "01:30", "room": "Zeiss Plenary (Spectrum)", "slug": "pyconde-pydata-2025-68195-lightning-talks-2-2", "url": "https://pretalx.com/pyconde-pydata-2025/talk/SUDMDV/", "title": "Lightning Talks (2/2)", "subtitle": "", "track": "General: Others", "type": "Lightning Talks", "language": "en", "abstract": "Lightning Talks at PyCon DE & PyData are short, 5-minute presentations open to all attendees. They\u2019re a fun and fast-paced way to share ideas, showcase projects, spark discussions, or raise awareness about topics you care about \u2014 whether technical, community-related, or just inspiring. No slides are required, and talks can be spontaneous or prepared. It\u2019s a great chance to speak up and connect with the community!\r\n\r\nPlease note: community conference and event announcements are limited to 1 minute only. All event announcements will be collected in a single slide deck.", "description": "### \u26a1 Lightning Talk Rules\r\n\r\n* No promotion for products or companies.\r\n* No call for 'we are hiring' (but you may name your employer).\r\n* One LT per person per conference policy.\r\n\r\n#### Community Event Announcements\r\n\r\n* \u23f1 You want to announce a community event? 
You have ONE minute.\r\n* All event announcements will be collected in a single slide deck; see instructions at the Lightning Talk desk in the Community Space\r\n  in\r\n  the Lounge on Level 1.\r\n\r\n#### All other LTs:\r\n\r\n* \u23f1 You have exactly 5 minutes. The clock starts when you start \u2014 and ends when time\u2019s up. That\u2019s the thrill of Lightning Talks \u26a1\r\n* \ud83c\udfaf Be sharp, clear, and fun. Introduce your idea, make your point, give the audience something to remember. No pressure. (Okay, maybe a\r\n  little.)\r\n* \ud83c\udfb2 You must include at least **one entry from the [official Bingo Card list](/bingocard/)**. Every audience member will receive a Bingo\r\n  card \u2014 and they\u2019ll be\r\n  watching \ud83d\udc40 Your job? Choose at least one Bingo item from the [official Bingo Card list](/bingocard/) and drop it into your talk. Subtly or\r\n  dramatically \u2014 your style.\r\n* \ud83d\udc0d Keep it relevant to Python, PyData and the community. You can go broad \u2014 tools, workflows, stories, experiments \u2014 as long as there\u2019s\r\n  some connection to Python, PyData or the community.\r\n* \ud83d\udc4f Keep it respectful. Keep it awesome. Humor is welcome, but please be kind, inclusive, and professional.\r\n* \ud83c\udfa4 Be ready when your name is called. We\u2019re running a tight session \u2014 speakers go on stage rapid-fire. Stay close and stay hyped.\r\n* \ud83c\udfc6 Bonus prizes may be awarded. Best talk, best Bingo moment, most unexpected Hogwarts reference... 
who knows what could happen?\r\n\r\n#### How to Submit\r\n\r\nThe Lightning Talk desk is located in the Community Space in the Lounge on Level 1.", "recording_license": "", "do_not_record": false, "persons": [{"code": "S3GNBU", "name": "Valerio Maggio", "avatar": "https://pretalx.com/media/avatars/S3GNBU_KZhV6e4.jpg", "biography": null, "public_name": "Valerio Maggio", "guid": "78939915-227f-5f14-99fd-52e1eac75300", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/S3GNBU/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/SUDMDV/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/SUDMDV/", "attachments": []}], "Titanium3": [{"guid": "0f65723f-5a38-5830-953f-ec8cc0587f41", "code": "ZACM3E", "id": 60732, "logo": null, "date": "2025-04-24T10:15:00+02:00", "start": "10:15", "duration": "00:30", "room": "Titanium3", "slug": "pyconde-pydata-2025-60732-design-generate-deploy-contract-first-with-fastapi", "url": "https://pretalx.com/pyconde-pydata-2025/talk/ZACM3E/", "title": "Design, Generate, Deploy:  Contract-First with FastAPI", "subtitle": "", "track": "PyCon: MLOps & DevOps", "type": "Talk", "language": "en", "abstract": "This talk explores a contract-first approach to API development using the OpenAPI generator, a powerful tool for automating API generation from a standardized specification. We will cover (1) what you would need to run to have a standard implementation of the FastAPI endpoints and data models; (2) how to customize the mustache templates that are used to generate the API stubs; (3) some ideas for customizing the CLI; and (4) how to maintain the contract and how to handle breaking changes to it. We will close the session with a discussion of the challenges of implementing the OpenAPI generator.", "description": "Let me share a story with you about two developers working at Malt, Europe's leading freelance management system & marketplace. \r\n\r\nDev-1: Hi there! 
We have an issue on production. It seems that a request was sent where \u201ccompany id\u201d is not given. \r\nDev-2: Oops! But I thought we agreed on an anonymous mode?\r\nDev-1: That\u2019s actually a great idea. You mean that company id is not required? \r\nDev-2: Exactly!\r\nDev-1: Thanks! I will update the data model and push the changes! \r\n\r\nAs the conversation above suggests, sending data between two applications can easily fail if the requirements are not defined up front. Even for simple requests a lot of decisions have to be made: are the fields optional or mandatory? What about the returned payloads and their data types? Do we need default values? If we are not clear what we will expect (from the request) and what we will return (in the response), in the worst case, the request will fail and we spend time debugging, like above.  \r\n\r\nTo overcome this issue, we decided to move to a contract-first approach, where we define the exact request and response and generate the endpoints and data models from there using the OpenAPI generator. The OpenAPI generator is a powerful tool that allows you to automatically generate API client libraries, server stubs, documentation, and configuration from an OpenAPI specification, or a \u201ccontract\u201d between two applications. This contract forms the basis for generating the endpoint stubs for our python applications but also for the client models and code. Starting with the contract can significantly speed up the development process and improve the consistency of your API implementations.\r\n\r\nDuring this talk we will address the following topics: \r\n- The vanilla implementation that generates endpoints and data models: what would you need to run to have a first version of the FastAPI endpoints. If the setting allows for it, we would show a short demonstration. 
\r\n- How to use customisable templates: we customised the mustache templates that generated the endpoints and data models so we could generate our custom FastAPI app. Also we added examples to the generated data models as these were not available in the default implementation. \r\n- How to customise the CLI tool and ideas for setting up your CI pipeline: we will share some ideas how to customise the CLI and how we used it in our CI pipeline to prevent discrepancies between the contract and the generated stubs.\r\n- how to maintain the contract and how to handle breaking changes to the contract\r\nWe will close the session with a discussion of the challenges and benefits of implementing the OpenAPI Generator. While it offers standardisation and best practices, it can introduce additional complexity, especially with the tool still in beta. We'll share our experiences navigating this trade-off.", "recording_license": "", "do_not_record": false, "persons": [{"code": "NUFKS8", "name": "Dr. Evelyne Groen", "avatar": "https://pretalx.com/media/avatars/NUFKS8_y9ltPUp.jpg", "biography": "Hello! My name is Evelyne Groen and I am a Senior Machine Learning engineer at Malt. A long time ago I studied physics in Amsterdam, after which I moved to Berlin to discover the world of data science. Currently I'm working at Malt as a machine learning engineer exploring the boundaries between devops and data.", "public_name": "Dr. Evelyne Groen", "guid": "690caf28-069b-5a2a-88bc-63c37026da03", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/NUFKS8/"}, {"code": "93VSJ9", "name": "Kateryna Budzyak", "avatar": "https://pretalx.com/media/avatars/93VSJ9_AMl8Nas.jpeg", "biography": "Kat is a Senior Machine Learning Engineer at Malt, the freelancer marketplace, where she works in the relevancy and matching team. 
She has a background in bioinformatics and is passionate about beautiful code.", "public_name": "Kateryna Budzyak", "guid": "eab4f7ea-31f2-5129-bfa2-5bbbea551790", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/93VSJ9/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/ZACM3E/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/ZACM3E/", "attachments": []}, {"guid": "5591814f-d8c6-57da-8ccd-48e5c2ae3835", "code": "AGY8CT", "id": 59751, "logo": null, "date": "2025-04-24T10:55:00+02:00", "start": "10:55", "duration": "00:30", "room": "Titanium3", "slug": "pyconde-pydata-2025-59751-serverless-orchestration-exploring-the-future-of-workflow-automation", "url": "https://pretalx.com/pyconde-pydata-2025/talk/AGY8CT/", "title": "Serverless Orchestration: Exploring the Future of Workflow Automation", "subtitle": "", "track": "PyCon: Programming & Software Engineering", "type": "Talk", "language": "en", "abstract": "Orchestration is a typical challenge in the data engineering world. Scheduling your data transformation jobs via CRON-jobs is cumbersome and error-prone. Furthermore, with an increasing number of jobs to manage, it becomes unmanageable. Tools like Apache Airflow, Dagster, Luigi, and Prefect are known for addressing these challenges but often require additional resources or investment. With the advent of serverless orchestration tools, many of these disadvantages are mitigated, offering a more streamlined and cost-effective solution.\r\n\r\nThis session provides a comprehensive overview of combining serverless architecture with orchestration. We will start by defining the core concepts of orchestration and serverless technologies and discuss the benefits of integrating them. The talk will then analyze solutions available in the cloud vendor space. 
Attendees will leave with a well-rounded understanding of the tools and strategies available in serverless orchestration.", "description": "Orchestration is a typical challenge in the data engineering world. Scheduling your data transformation jobs via CRON-jobs is cumbersome and error-prone. Furthermore, with an increasing number of jobs to manage, it becomes unmanageable. Tools like Apache Airflow, Dagster, Luigi, and Prefect are known for addressing these challenges but often require additional resources or investment. With the advent of serverless orchestration tools, many of these disadvantages are mitigated, offering a more streamlined and cost-effective solution.\r\nBeyond data engineering, serverless orchestration holds substantial potential for classical software engineering, especially as organizations explore serverless approaches for optimizing efficiency and reducing overhead.\r\n\r\nIn this talk you will explore:\r\n* Basic Introduction to Serverless Orchestration: \r\n      - What is orchestration about?\r\n      - What is serverless about?\r\n      - Why combine the two?\r\n* Offerings from Major Cloud Vendors:\r\n      - Analyzing solutions from leading cloud providers in the realm of serverless orchestration\r\n* Patterns and Solutions for Serverless Orchestration in Software Engineering:\r\n      - Exploring how serverless orchestration can be applied within classical software engineering contexts\r\n\r\nParticipants will leave this session equipped with a comprehensive understanding of the serverless orchestration landscape and its applications across different engineering disciplines.", "recording_license": "", "do_not_record": false, "persons": [{"code": "M3HBER", "name": "Tim Bossenmaier", "avatar": "https://pretalx.com/media/avatars/M3HBER_wZ0zzq9.jpg", "biography": "Tim is a Data Engineer at Cloudflight, based in Innsbruck. 
There he architects and builds modern data infrastructures in customer projects, ranging from streaming ETL pipelines to data catalogs and entire data platforms. His focus areas include software engineering, cloud technologies, data platform engineering, and DataOps. Tim is also a passionate Open Source contributor, actively working on projects like Apache StreamPipes and DataHub, among others.", "public_name": "Tim Bossenmaier", "guid": "9326d3e2-e8d5-57ff-8f8d-954dbcc5c87c", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/M3HBER/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/AGY8CT/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/AGY8CT/", "attachments": [{"title": "Slides", "url": "/media/pyconde-pydata-2025/submissions/AGY8CT/resources/2025_P_tXX3nWj.pdf", "type": "related"}]}, {"guid": "d1f58a21-5c1a-5be3-940a-b1e7b3b379cd", "code": "7CXSPN", "id": 60317, "logo": null, "date": "2025-04-24T11:35:00+02:00", "start": "11:35", "duration": "00:45", "room": "Titanium3", "slug": "pyconde-pydata-2025-60317-reinventing-streamlit", "url": "https://pretalx.com/pyconde-pydata-2025/talk/7CXSPN/", "title": "Reinventing Streamlit", "subtitle": "", "track": "PyCon: Django & Web", "type": "Talk (long)", "language": "en", "abstract": "Dreaming of creating sleek, interactive web apps with just Python? Streamlit is great for dashboards, but what if your needs go beyond that? Discover how Reflex.dev, a cutting-edge full-stack Python framework, lets you level up from dashboards to full-fledged web apps!", "description": "Have you ever wished you could build sleek, interactive web apps using just Python? Maybe you\u2019ve tried Streamlit and loved its simplicity. But maybe you also had the feeling that your dashboard is no longer a dashboard and your needs have outgrown Streamlit's data model.\r\n\r\nIn this talk, I\u2019ll introduce Reflex.dev, a powerful Python framework that makes web development effortless. 
Reflex combines the ease of Python with the flexibility of React, enabling you to create full-stack, interactive apps quickly.\r\n\r\nWe\u2019ll cover the basics: what Reflex.dev is and how it stacks up against familiar frameworks. Then, we\u2019ll dive into building a Streamlit-inspired app from scratch in Reflex.dev by creating an API compatibility wrapper. Along the way, I\u2019ll show you how Reflex can:\r\n\r\n- Help you build dynamic, shareable web apps with only Python.\r\n- Smoothly transition your Streamlit app into a stateful Reflex app.\r\n- Make the whole React ecosystem accessible.\r\n- Help you test your application.\r\n\r\nNo web development experience? No problem. This talk is for anyone who wants to create web apps without diving into JavaScript. We\u2019ll stick to Python and start from the ground up.\r\n\r\nBy the end, you\u2019ll leave with a working Streamlit clone and a powerful new tool in your Python arsenal. Let\u2019s make web development fun again!", "recording_license": "", "do_not_record": false, "persons": [{"code": "F7DPMR", "name": "Malte Klemm", "avatar": "https://pretalx.com/media/avatars/F7DPMR_5s216Wt.jpeg", "biography": "Malte is a seasoned Data Engineer with over 10 years at Blue Yonder, where he recently worked on the company's forecasting service. Holding a PhD from the University of Hertfordshire's Adaptive Systems Research Group, he explored Information Theory in the context of Multi-Agent Systems. 
Beyond tech, Malte is passionate about great UX, typography, and baking the perfect loaf of bread.", "public_name": "Malte Klemm", "guid": "b7fc7fbb-83ba-55e1-86a4-893c3f00beb4", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/F7DPMR/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/7CXSPN/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/7CXSPN/", "attachments": [{"title": "Slides", "url": "/media/pyconde-pydata-2025/submissions/7CXSPN/resources/Reinve_jT4Zrqa.pdf", "type": "related"}]}, {"guid": "595d4dae-5795-59d5-9aa4-c5158cc6f651", "code": "KZKT9W", "id": 60316, "logo": null, "date": "2025-04-24T14:20:00+02:00", "start": "14:20", "duration": "00:30", "room": "Titanium3", "slug": "pyconde-pydata-2025-60316-duplicate-code-dilemma-unlocking-automation-with-open-source", "url": "https://pretalx.com/pyconde-pydata-2025/talk/KZKT9W/", "title": "Duplicate Code Dilemma: Unlocking Automation with Open Source!", "subtitle": "", "track": "PyCon: Programming & Software Engineering", "type": "Talk", "language": "en", "abstract": "\"Don't Repeat Yourself\" \u2013 a phrase that we have all heard many times. In this talk, we will have an overview how to deal with code duplication and how open-source template libraries such as Copier can assist us in managing similarly structured repositories. Furthermore, we will explore how code updates can be automated with the help of open-source libraries like Renovate Bot. By the end of this session, you will gain insights into these solutions while also questioning whether they truly eliminate repetition or merely contribute to another cycle of automation.", "description": "\u201cDon\u2019t Repeat Yourself\u201d (DRY) is one of the first principles that every programmer encounters in the early stages of their coding journey. Some of us even had to learn it the hard way. 
We promised ourselves to avoid repetitive code to never again deal with the extensive refactoring required for every small change.\r\n\r\nThis simple principle has found a fundamental place in every programmer's heart. It may also be the reason why, from time to time, every programmer doubts their code and begins to refactor it in the early stages of coding.\r\n\r\nThis talk provides an overview of different solutions for preventing code repetition. We will start with the most common solutions, such as using git commands, and then explore more intermediate approaches for managing similarly structured repositories with the help of open-source template libraries such as Copier and Cookiecutter. Finally, we will address a more complex problem and examine how to automate updates using open-source tools like Renovate Bot.\r\n\r\nAs a takeaway, participants will gain insights into various solutions and a glimpse into the usability of each open-source library. Participants are also encouraged to reconsider the entire process: Are these solutions truly preventing repetitive code, or are we merely caught in an endless cycle of automation?", "recording_license": "", "do_not_record": false, "persons": [{"code": "PSWRAP", "name": "Raana Saheb-Nassagh", "avatar": "https://pretalx.com/media/avatars/PSWRAP_qn1d2rf.jpeg", "biography": "Hey, I am Raana!\r\n\r\nSome days, I am a Data Scientist with a passion for patterns, and other days, I am a Data Engineer with a love for code refactoring. 
I am also interested in methods for team building.\r\n\r\nBesides the nerdy stuff, I enjoy board games, bouldering, and singing in a choir!", "public_name": "Raana Saheb-Nassagh", "guid": "2feb472a-2f0e-5ee9-ae0a-6cb8152b69b8", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/PSWRAP/"}], "links": [{"title": "Slides&Sources", "url": "https://speakerdeck.com/raanasn", "type": "related"}], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/KZKT9W/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/KZKT9W/", "attachments": []}, {"guid": "8ce96ed1-1216-5a6a-a3f6-e508364a59b0", "code": "DEHZHK", "id": 60444, "logo": null, "date": "2025-04-24T15:00:00+02:00", "start": "15:00", "duration": "00:45", "room": "Titanium3", "slug": "pyconde-pydata-2025-60444-distributed-file-systems-made-easy-with-python-s-fsspec", "url": "https://pretalx.com/pyconde-pydata-2025/talk/DEHZHK/", "title": "Distributed file-systems made easy with Python's fsspec", "subtitle": "", "track": "PyData: Data Handling & Engineering", "type": "Talk (long)", "language": "en", "abstract": "The cloud native revolution has impacted all aspects of engineering, and data engineering is not exempt. One of the ongoing challenges in the data engineering world remains the local and distributed cloud native storage. In this talk we\u2019ll explore working with distributed file systems in Python, through an intro to fsspec: a popular python library that is well-positioned to address the growing challenge of interacting with storage systems of different kinds in a consistent way.\r\n\r\nIn this talk we\u2019ll show hands-on examples of working with fsspec with some of the most popular data tools in the Python community: Pandas, Tensorflow and PyArrow. 
We\u2019ll demonstrate a real world implementation of fsspec and how it provides easy extensibility through open source tooling.\r\n\r\nYou\u2019ll come away from this session with a better understanding of how to implement and extend fsspec to work with different cloud native storage systems.", "description": "### **1. Setting the Stage: Local vs. Distributed Storage (5 minutes)**  \r\n- **What\u2019s the Big Deal with Storage?**  \r\n  - First, let\u2019s talk about the shift from local storage (where we keep files on our own machines) to cloud-native storage (where data is spread across servers in the cloud).  \r\n  - This shift is awesome but comes with new challenges: distributed systems can be tricky to work with, especially when you need to access them in a consistent way.  \r\n\r\n### **2. Enter fsspec: A Game Changer for File Systems (10 minutes)**  \r\n- **What is fsspec?**  \r\n  - fsspec is a Python library that makes working with any kind of file system\u2014whether it's local, in the cloud, or on a distributed system\u2014much easier.  \r\n  - It does this by giving us a unified way to interact with storage, no matter where the files actually live.  \r\n\r\n- **Why is fsspec Awesome?**  \r\n  - It simplifies file operations (like opening and reading files) across different storage systems, saving us time and mental energy.  \r\n  - Plus, it\u2019s open-source, which means you can extend it and make it work for your own unique storage setup.  \r\n\r\n### **3. fsspec in Action: How It Works with Popular Python Tools (15 minutes)**  \r\n\r\n#### **A. Using fsspec with Pandas**  \r\n- **Pandas & fsspec:**  \r\n  - If you work with Pandas, you\u2019re probably familiar with loading and saving data. fsspec helps make this process smoother by letting you pull data from cloud storage (like AWS S3) with no fuss.  \r\n  - We\u2019ll see how this works in practice, making it easy to work with large datasets in the cloud. \r\n\r\n#### **B. 
Using fsspec with TensorFlow**  \r\n- **TensorFlow & fsspec:**  \r\n  - If you\u2019re building machine learning models, TensorFlow needs to access training data and models, sometimes stored in the cloud.  \r\n  - With fsspec, TensorFlow can seamlessly interact with cloud storage, making your ML pipelines more streamlined and less frustrating.  \r\n\r\n#### **C. Using fsspec with PyArrow**  \r\n- **PyArrow & fsspec:**  \r\n  - PyArrow is great for high-performance data processing. When working with big data files like Parquet, fsspec makes it easy to load and save them from cloud storage without missing a beat.  \r\n\r\n### **4. Extending fsspec: Building Your Own Solutions (5 minutes)**  \r\n- **What if I Need Something Custom?**  \r\n  - Sometimes, you need to work with storage systems that aren\u2019t \u201cout of the box.\u201d The cool part about fsspec is that it\u2019s highly extensible.  \r\n  - I\u2019ll walk through how you can easily extend fsspec to work with your own custom storage systems, using a real-world example of how we did this.  \r\n\r\n### **5. Wrap-Up & Key Takeaways (5 minutes)**  \r\n- **The Big Picture:**  \r\n  - fsspec is a simple yet powerful tool for making cloud-native storage work seamlessly with Python data tools like Pandas, TensorFlow, and PyArrow.  \r\n  - It\u2019s the tool you didn\u2019t know you needed to simplify your cloud storage tasks.  \r\n\r\n- **Final Thought:**  \r\n  - With fsspec, working with distributed storage doesn\u2019t have to be hard. It makes everything feel like you\u2019re working with local files, even when they\u2019re scattered across the cloud.  \r\n\r\n### **6. Q&A Session (5 minutes)**", "recording_license": "", "do_not_record": false, "persons": [{"code": "SYPCGN", "name": "Einat Orr", "avatar": "https://pretalx.com/media/avatars/SYPCGN_wS6Z8hb.jpeg", "biography": "Dr. 
Einat Orr has 20+ years of experience building R&D organizations and leading the technology vision at multiple companies, the latest being Similarweb, which went public on the NYSE last May. Currently she serves as Co-founder and CEO of Treeverse, the company behind lakeFS, an open source platform that delivers a git-like experience to object-storage based data lakes. She received her PhD in Mathematics from Tel Aviv University, in the field of optimization in graph theory.", "public_name": "Einat Orr", "guid": "7de660e5-65b3-5bc5-a271-33983e0599bb", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/SYPCGN/"}, {"code": "ZG33XS", "name": "Barak Amar", "avatar": "https://pretalx.com/media/avatars/ZG33XS_aH8SBfu.jpg", "biography": "Barak Amar is a principal engineer at lakeFS with over 25 years of experience spanning startups and enterprise environments. He specializes in distributed systems and backend architecture, designing scalable solutions. He is passionate about programming languages and contributes to open-source projects.\r\n\r\nAs part of the founding team at lakeFS, he has helped build the product while recently also gaining experience in product management.\r\n\r\nWhen not on the keyboard, he is an avid runner who maintains a regular training schedule.", "public_name": "Barak Amar", "guid": "4e3ae651-5559-565a-9ea1-bd615a89f0c4", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/ZG33XS/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/DEHZHK/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/DEHZHK/", "attachments": []}, {"guid": "4cbaf50e-af4d-5894-91d1-6730ba37576f", "code": "EDJ8N7", "id": 60738, "logo": null, "date": "2025-04-24T16:15:00+02:00", "start": "16:15", "duration": "00:30", "room": "Titanium3", "slug": "pyconde-pydata-2025-60738-learnings-from-migrating-a-flask-app-to-fastapi", "url": "https://pretalx.com/pyconde-pydata-2025/talk/EDJ8N7/", "title": "Learnings from migrating a Flask app to 
FastAPI", "subtitle": "", "track": "PyCon: Django & Web", "type": "Talk", "language": "en", "abstract": "FastAPI has grown steadily in popularity over the last few years. A lot of this growth is driven by its relative simplicity and ease of use. In this talk, we'll discuss some practical insights into building a FastAPI application, based on my experience of migrating an existing Flask prototype to FastAPI. \r\n\r\nWe'll explore how FastAPI's core features like Pydantic integration and dependency injection can improve API development, while also talking about the drawbacks of FastAPI.", "description": "Building HTTP APIs has become a normal part of the work as a software or data engineer within the last 10 to 15 years. In the Python ecosystem, Flask was the only option to build an HTTP API for many years. After its initial release in 2018, FastAPI quickly became a serious alternative for building such APIs with Python.\r\n\r\nIn this talk I will share my experiences from migrating an existing HTTP API built with Flask to a FastAPI-based API. \r\n\r\nWe will discuss the following topics: \r\n\r\n- Why did we migrate at all?\r\n- Data modeling\r\n- Async is overrated\r\n- Problems you **will** encounter \r\n- Migration strategy\r\n\r\nThe talk will show you the practical differences between developing APIs with FastAPI or Flask.\r\n\r\nMaterial: https://github.com/orgarten/pycon-de-2025/blob/main/2025-pycon-learnings-from-migrating-flask-app-to-fastapi.pdf", "recording_license": "", "do_not_record": false, "persons": [{"code": "7H3JY8", "name": "Orell Garten", "avatar": "https://pretalx.com/media/avatars/7H3JY8_5lKuNhM.jpg", "biography": "Freelance backend software and data engineer. 
I try to make data behave the way it needs to by finding data quality problems and building custom software solutions to automate data processing.", "public_name": "Orell Garten", "guid": "c8748514-701f-5f1b-ab33-15b37bed1df8", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/7H3JY8/"}], "links": [{"title": "Presentation", "url": "https://github.com/orgarten/pycon-de-2025/blob/main/2025-pycon-learnings-from-migrating-flask-app-to-fastapi.pdf", "type": "related"}], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/EDJ8N7/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/EDJ8N7/", "attachments": []}, {"guid": "aecb218a-c075-53c8-a697-4e2e08471ddc", "code": "XLZQFA", "id": 61144, "logo": null, "date": "2025-04-24T16:55:00+02:00", "start": "16:55", "duration": "00:45", "room": "Titanium3", "slug": "pyconde-pydata-2025-61144-lessons-learned-in-bringing-a-rag-chatbot-with-access-to-50k-diverse-documents-to-production", "url": "https://pretalx.com/pyconde-pydata-2025/talk/XLZQFA/", "title": "Lessons learned in bringing a RAG chatbot with access to 50k+ diverse documents to production", "subtitle": "", "track": "PyData: Generative AI", "type": "Talk (long)", "language": "en", "abstract": "Retrieval-Augmented Generation (RAG) chatbots are a key use case of GenAI in organizations, allowing users to conveniently access and query internal company data. A first RAG prototype can often be created in a matter of days. But why are the majority of prototypes still in the pilot stage? [\\[1\\]](https://www2.deloitte.com/content/dam/Deloitte/us/Documents/consulting/us-state-of-gen-ai-q3.pdf)\r\n\r\nIn this talk we share our insights from developing a production-grade chatbot at Merck. Our RAG chatbot for R&D experts accesses over 50,000 documents across numerous SharePoint sites and other sources. We identified three technical key success factors:\r\n1. 
Building a robust data pipeline that syncs documents from source systems and that handles enterprise features such as replicating user permissions. \r\n2. Developing a chatbot workflow from user question to answer with retrieval components such as hybrid search and reranking\r\n3. Establishing a comprehensive evaluation framework with a clear optimization metric.\r\n\r\nWe think that many of these lessons are broadly applicable to RAG chatbots, making this talk valuable for practitioners aiming to implement GenAI solutions in business contexts.", "description": "Building a prototype RAG chatbot with frameworks like LangChain can be straightforward. However, scaling it into a production-grade application introduces complex challenges. In this talk, we share our lessons learned from developing a RAG chatbot designed to assist research and development (R&D) experts.\r\n\r\nOur chatbot was developed to effectively handle and provide access to a large collection of unstructured knowledge, consisting of over 50,000 documents stored across more than 20 SharePoint sites and other sources. We faced significant hurdles in:\r\n- **Data Pipeline Engineering**: Crafting a modular and scalable pipeline capable of periodically syncing documents, handling dynamic user permissions, and efficiently processing large volumes of unstructured data.\r\n- **RAG Design and Prompting Strategies**: Addressing challenges in document chunking, citation integration, reranking retrieved results, and applying permission and PII filters to ensure compliance and accuracy in responses.\r\n- **Evaluation Framework Development**: Implementing an effective testing strategy without the availability of static ground truth data. 
We employed automated testing with frameworks like pytest, utilized LLM-as-a-judge, and integrated tracing to iteratively refine our dataset and maintain high answer quality.\r\n- **User Adoption**: Driving user adoption through onboarding training and ongoing engagement, such as regular office hours and feedback mechanisms.\r\n\r\nWe emphasize the importance of applying data science principles to GenAI projects:\r\n- **Start Simple and Iterate**: Begin with a basic implementation as a baseline and iteratively enhance functionality based on testing and user feedback.\r\n- **Test-Driven Development**: Identify key test scenarios early and use them to drive development, ensuring that improvements are measurable and aligned with growing user needs.\r\n- **Focus on Key Metrics**: Establish clear metrics to optimize against, aiding in making informed decisions throughout the development process.\r\n  \r\n**Main Takeaways for the Audience:**\r\n- Understand the critical role of robust, modular data pipelines in handling dynamic and unstructured data sources for LLM applications.\r\n- Learn strategies for developing effective evaluation frameworks in complex domains where traditional ground truth data may be lacking.\r\n- Gain insights into advanced RAG design techniques that enhance chatbot performance and reliability.\r\n- Recognize the substantial data engineering and software development efforts required to transition a prototype to a production-grade LLM solution.\r\n\r\nBy sharing our experiences, attendees will gain practical insights into deploying robust RAG chatbots, transforming a functional prototype into a reliable, scalable application that fulfills enterprise requirements.", "recording_license": "", "do_not_record": false, "persons": [{"code": "GGSPQV", "name": "Bernhard Sch\u00e4fer", "avatar": "https://pretalx.com/media/avatars/GGSPQV_sqrfCNB.jpg", "biography": "Bernhard is a Senior Data Scientist at Merck with a PhD in deep learning and over 5 years of 
experience in applying data science and data engineering within different industries. For more information you can connect with him on LinkedIn. \ud83d\ude42", "public_name": "Bernhard Sch\u00e4fer", "guid": "9d1c14ab-4f68-59d3-9956-689c2c264de4", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/GGSPQV/"}, {"code": "8MXU3M", "name": "Nico Mohr", "avatar": "https://pretalx.com/media/avatars/8MXU3M_uUhX05O.jpg", "biography": "Nico works as a Senior Machine Learning Engineer at Merck, focusing on developing applications powered by LLMs. His background bridges software engineering and data science, with experience spanning classical data science, computer vision, and discrete optimization, where he has deployed several machine learning solutions in production environments.", "public_name": "Nico Mohr", "guid": "7f26f896-c4f1-5b86-a077-3af8d278e4e6", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/8MXU3M/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/XLZQFA/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/XLZQFA/", "attachments": [{"title": "Slides Lessons learned Productive RAG chatbot", "url": "/media/pyconde-pydata-2025/submissions/XLZQFA/resources/PyData_g1W0Jw1.pdf", "type": "related"}]}], "Helium3": [{"guid": "4ce04ffe-9055-545b-b512-155ae9173321", "code": "SZFRRA", "id": 61837, "logo": null, "date": "2025-04-24T10:15:00+02:00", "start": "10:15", "duration": "00:30", "room": "Helium3", "slug": "pyconde-pydata-2025-61837-unforgettable-that-s-what-you-are-evaluating-machine-unlearning-and-forgetting", "url": "https://pretalx.com/pyconde-pydata-2025/talk/SZFRRA/", "title": "Unforgettable, that's what you are: Evaluating Machine Unlearning and Forgetting", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Statistics", "type": "Talk", "language": "en", "abstract": "Can deep learning/AI models forget? 
In this talk, you'll explore the realm of machine unlearning, where researchers and practitioners aim to remove memorized examples from machine learning models. This is relevant for training increasingly overparameterized models and growing GDPR/Privacy concerns with large scale model development and use.", "description": "Deep learning memorization is a known phenomenon, where deep learning / AI models memorize parts of their training dataset. This often happens for repeated or novel examples, and occurs more often in overparameterized models.\r\n\r\nThis presents problems for guiding machine learning behavior, requiring much effort in guardrails and output monitoring, as well as questioning whether the models can be GDPR-compliant (i.e. the right to be forgotten).\r\n\r\nA growing area of research on machine unlearning or machine forgetting has emerged to investigate ways a model might unlearn or forget particular memorized examples. In this talk, you'll learn about the field of machine unlearning and related topics like data anonymization to evaluate exactly what's truly unforgettable. Jokes aside: you'll have some practical take-aways to apply to your work in data and machine learning development.", "recording_license": "", "do_not_record": false, "persons": [{"code": "K9B9W9", "name": "Katharine Jarmul", "avatar": "https://pretalx.com/media/avatars/K9B9W9_gUIiN9l.jpg", "biography": "Katharine Jarmul is a privacy activist and an internationally recognized data scientist and lecturer who focuses her work and research on privacy and security in data science and machine learning. 
You can follow her work via her newsletter, Probably Private (https://probablyprivate.com) or in her recently published book, Practical Data Privacy (O'Reilly 2023) now also available in German as Data Privacy in der Praxis.", "public_name": "Katharine Jarmul", "guid": "ffd0574e-11b0-52d1-a847-6a92c5e1ec5e", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/K9B9W9/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/SZFRRA/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/SZFRRA/", "attachments": []}, {"guid": "83285569-3cd1-5b9b-b1f1-c6935e969267", "code": "UXTCZC", "id": 61752, "logo": null, "date": "2025-04-24T10:55:00+02:00", "start": "10:55", "duration": "00:30", "room": "Helium3", "slug": "pyconde-pydata-2025-61752-oh-no-users-love-my-genai-prototype-and-want-to-use-it-more", "url": "https://pretalx.com/pyconde-pydata-2025/talk/UXTCZC/", "title": "Oh, no! Users love my GenAI-Prototype and want to use it more.", "subtitle": "", "track": "PyCon: MLOps & DevOps", "type": "Talk", "language": "en", "abstract": "Demos and prototypes for generative AI (GenAI) projects can be quickly created with tools like Streamlit, offering impressive results for users within hours. However, scaling these solutions from prototypes to robust systems introduces significant challenges. As user demand grows, hacks and workarounds in tools like Streamlit lead to unreliability and debugging frustrations. This talk explores the journey of overcoming these obstacles, evolving to a stable tech stack with Qdrant, Postgres, Litellm, FastAPI, and Streamlit. Aimed at beginners in GenAI, it highlights key lessons.", "description": "Demos and prototypes for projects with generative AI can be quickly put together: an API key from the preferred model provider, some source code from the online tutorial and a few small adjustments suffice. 
Thanks to Streamlit and the like, even beginners can achieve impressive results that can be used by users within a few hours.\r\n\r\nBut what happens when users actually like the solution? When demos and prototypes need to be expanded and connected to other systems? What if the number of users continues to rise?\r\n\r\nIt is quite impressive how far you can bend Streamlit to achieve things it was probably never meant for. But at a certain point, you pay for the hacks and workarounds with unreliability and frustrating debugging.\r\n\r\nThe speakers repeatedly reached this point in various projects and delayed the necessary architecture discussion for too long. So the path was longer and more painful than it should have been \u2013 but in the end, thanks to the wide range of open-source (Python) projects, a flexible and stable system was created. Our current tech stack includes Qdrant, Postgres, Litellm and FastAPI \u2013 as well as OpenWebUI, and of course Streamlit. \r\n\r\nThanks to modularization, we now have a stable system that we can easily run locally but also deploy in an enterprise environment. Nevertheless, we have retained a great deal of flexibility.\r\n\r\nIn our talk, we report on the trials and tribulations along the way. We report on the challenges that led to decisions for various components. We disclose which problems we were able to solve and which new problems arose.\r\n\r\nThe talk is aimed primarily at those who are taking their first steps with generative AI or have already developed their first demonstrators or prototypes. 
\r\n\r\nStructure:\r\n\r\n(1) GenAI applications in Streamlit are cool\r\n(2) The challenges on the way from prototype to productive deployment\r\n(3) Ramming heads through walls\r\n(4) The path to a flexible but stable stack\r\n(5) What still plagues us", "recording_license": "", "do_not_record": false, "persons": [{"code": "UZ9VMC", "name": "Thomas Prexl", "avatar": "https://pretalx.com/media/avatars/UZ9VMC_PQp5R4u.jpg", "biography": "Thomas is an expert in tech transfer and startup development, with a career focused on fostering innovation and bridging the gap between research and industry. He has led initiatives like accelerator programs, innovation networks, and hackathons. A Generative AI enthusiast and co-founder of neunzehn innovations, Thomas helps companies leverage AI technologies. He holds a doctorate from the University of Basel and is a dedicated advisor, educator, and speaker in the startup ecosystem.", "public_name": "Thomas Prexl", "guid": "81bc4e70-31b4-534d-a4fa-a18fae5359fb", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/UZ9VMC/"}, {"code": "SGPHNQ", "name": "Frank Rust", "avatar": "https://pretalx.com/media/avatars/SGPHNQ_lYMnbDJ.jpg", "biography": "Frank is deeply passionate about technological advancements and a co-founder of neunzehn innovations, a company specializing in AI solutions. His professional background combines entrepreneurial experience\u2014having established an innovation and strategy consultancy focused on strategy and deep tech\u2014with several years at a major software corporation. 
Throughout his tenure in the software industry, he contributed to multiple product and service launches, working across various teams to bring new offerings to market.\r\nOutside the office, he enjoys discovering new horizons in the camper van.", "public_name": "Frank Rust", "guid": "6b355da9-4d17-53d7-af9c-3cad78a46200", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/SGPHNQ/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/UXTCZC/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/UXTCZC/", "attachments": []}, {"guid": "8fcc45b3-3121-5c60-96a9-90906cffd183", "code": "7CL3KS", "id": 61276, "logo": null, "date": "2025-04-24T11:35:00+02:00", "start": "11:35", "duration": "00:45", "room": "Helium3", "slug": "pyconde-pydata-2025-61276-bridging-the-gap-unlocking-sap-data-for-data-lakes-with-python-and-pyspark-via-sap-datasphere", "url": "https://pretalx.com/pyconde-pydata-2025/talk/7CL3KS/", "title": "Bridging the gap: unlocking SAP data for data lakes with Python and PySpark via SAP Datasphere", "subtitle": "", "track": "PyData: Data Handling & Engineering", "type": "Talk (long)", "language": "en", "abstract": "SAP's data often remains locked away, hindering the creation of a complete data picture. This talk presents a hands-on proof of concept leveraging SAP Datasphere, Python and PySpark to bridge an Azure-based, data mesh-inspired open data lake with a centralized SAP BI environment. \r\n\r\nThis presentation will delve into the architecture of SAP Datasphere and its integration interfaces with Python. It will explore network integration, authentication, authorization and resource management options, as well as data integration patterns. The presentation will summarize the evaluated features and limitations discovered during the PoC.", "description": "In many enterprises relying on SAP ERP systems, a wealth of valuable master data remains trapped within a closed ecosystem. 
This creates significant obstacles when striving for a comprehensive, 360\u00b0 view, especially when integrating with modern, open data lakes built on platforms like Azure and designed around data mesh principles. This talk presents a practical PoC that tackles this challenge head-on, utilizing SAP Datasphere as the key integration point.\r\n\r\nOutline:\r\n\r\n1. The challenge: navigating SAP's data silos and the pursuit of a unified view\r\n* The section outlines the enterprise data landscape of RATIONAL where valuable master data resides within SAP\u2019s traditionally closed ecosystem, hindering data democratization and the creation of a comprehensive, 360\u00b0 operational view. This situation is frequently encountered, particularly among German manufacturers.\r\n* The inherent conflict between the open, distributed nature of data lakes (especially those built on data mesh principles) and the centralized, closed nature of traditional SAP BI environments is discussed.\r\n\r\n2. Solution overview: leveraging SAP Datasphere as the integration layer\r\n* An introduction to SAP Datasphere and its capabilities is provided, with a focus on its ability to connect with non-SAP systems.\r\n* This part explains how Datasphere was chosen as the central integration layer for the proof of concept and its role in enabling bi-directional data flow between SAP and the open data lake.\r\n\r\n3. Architecture of SAP Datasphere\r\n* Introduction to the architecture of SAP Datasphere and the role of the underlying SAP HANA database\r\n* Explanation of openSQL schema as key integration option\r\n\r\n4. 
Security first: exploring network integration, authentication and authorization options\r\n* This section details the evaluation of network connectivity options between Azure services such as Azure Databricks, PostgreSQL and ADLS, and SAP Datasphere\r\n* The methods used to authenticate Python and PySpark to SAP Datasphere are explained\r\n* The implementation and evaluation of data authorization mechanisms within SAP Datasphere are described\r\n\r\n5. Python and PySpark integration\r\n* Available interfaces for Python integration (ODBC/JDBC, OData), their features and limitations\r\n* Explanation of practical data integration patterns implemented within the PoC for extracting data from SAP and loading it into the data lake for full and delta load scenarios\r\n\r\n6. Reflecting on the PoC: summary and key learnings\r\n* This section summarizes the core findings and lessons learned from the PoC, particularly regarding security and software quality best practices\r\n* A pointer to the SAP open data alliance launched in 2023\r\n\r\nMain takeaways:\r\n* An understanding of SAP Datasphere's architecture and its potential for integrating non-SAP, open-source technologies like Python and PySpark\r\n* Knowledge of current features and limitations of SAP Datasphere in the area of data integration with the open source world", "recording_license": "", "do_not_record": true, "persons": [{"code": "7LKG3C", "name": "Rostislaw Krassow", "avatar": "https://pretalx.com/media/avatars/7LKG3C_FNhEV6V.JPG", "biography": "Rostislaw, a data architect at RATIONAL AG, specializes in distributed databases, the Apache Hadoop ecosystem and Azure cloud. 
He leverages his expertise to oversee the company's Data & Analytics platform, where his daily work involves reconciling diverse stakeholder perspectives to deliver optimal solutions.", "public_name": "Rostislaw Krassow", "guid": "71ebf006-4a4d-5dee-9f40-ea3970e419c9", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/7LKG3C/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/7CL3KS/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/7CL3KS/", "attachments": []}, {"guid": "c5e6e5a2-ff56-5c3d-99f0-3282fda87e03", "code": "TXKLWR", "id": 61134, "logo": null, "date": "2025-04-24T14:20:00+02:00", "start": "14:20", "duration": "00:30", "room": "Helium3", "slug": "pyconde-pydata-2025-61134-analyze-data-easily-with-duckdb-and-the-implications-on-data-architectures", "url": "https://pretalx.com/pyconde-pydata-2025/talk/TXKLWR/", "title": "Analyze data easily with duckdb - and the implications on data architectures", "subtitle": "", "track": "PyData: Data Handling & Engineering", "type": "Talk", "language": "en", "abstract": "duckdb is increasingly becoming a universal tool for accessing and analyzing data. In this talk I will show with slides and a live demo what duckdb is capable of and will dive deeper into how it will influence modern data architectures.", "description": "duckdb - a lightweight database with a focus on data analysis and a fast query engine that can be used in a variety of ways:   \r\n- Analyze data, stored on your own hard drive or somewhere on the Internet, in the browser with SQL? No problem  \r\n- Quickly check all the JSON files in S3 using SQL? Nothing could be easier  \r\n- A huge Parquet file, bigger than my working memory. And now I have to analyze it locally. Easy!  \r\n- Read CSV from blob storage, process and save in a Postgres database. 
Just one command\r\n \r\nduckdb is developing more and more into a universal tool for accessing and analyzing data.\r\n\r\nIn this talk I will show with slides and a live demo why it is so popular and why it belongs in the toolbox of every data scientist, ML engineer or data engineer. \r\n\r\nBut I will not stop at the useful tooling. I will dive deeper into the implications for data and software architectures that arise from the rise of the embedded OLAP systems like duckdb. I will especially focus on both moving the data closer to the user for faster analytics but also on accessing data without the explicit need to move it. \r\n\r\nWhat you learn and see can be used immediately in your day-to-day work.", "recording_license": "", "do_not_record": false, "persons": [{"code": "8QDF3Q", "name": "Matthias Niehoff", "avatar": "https://pretalx.com/media/avatars/8QDF3Q_69zfLRB.jpg", "biography": "Matthias Niehoff works as Head of Data and Data Architect for codecentric AG and supports customers in the design and implementation of data architectures. 
His focus is on the necessary infrastructure and organization to help data and ML projects succeed.", "public_name": "Matthias Niehoff", "guid": "6aa639a8-3661-5755-9bfb-f0c4cb3576d5", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/8QDF3Q/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/TXKLWR/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/TXKLWR/", "attachments": [{"title": "Slides", "url": "/media/pyconde-pydata-2025/submissions/TXKLWR/resources/duckdb_kGLo0as.pdf", "type": "related"}]}, {"guid": "c88a95d2-6bb7-5b9f-9e93-9ca1cfc21baf", "code": "HSFR7A", "id": 61429, "logo": null, "date": "2025-04-24T15:00:00+02:00", "start": "15:00", "duration": "00:45", "room": "Helium3", "slug": "pyconde-pydata-2025-61429-scraping-lego-for-fun-a-hacky-dive-into-dynamic-data-extraction", "url": "https://pretalx.com/pyconde-pydata-2025/talk/HSFR7A/", "title": "Scraping LEGO for Fun: A Hacky Dive into Dynamic Data Extraction", "subtitle": "", "track": "PyData: Data Handling & Engineering", "type": "Talk (long)", "language": "en", "abstract": "Unlock the full potential of modern web scraping by combining Python, Scrapy, and Playwright to extract data from dynamic, JavaScript-heavy sites\u2014exemplified by LEGO product pages. This talk introduces Model Context Protocol (MCP) servers for orchestrating advanced data fetching, refining CSS selectors, and integrating Large Language Models for automated code suggestions. Learn how to scale ethically, handle concurrency, and respect site policies, while maintaining flexible, maintainable pipelines for diverse use cases from research to robotics.", "description": "# Advanced Web Scraping: From LEGO to Production\r\n\r\nToday's web landscape is teeming with JavaScript-heavy content, complex layouts, and sometimes opaque data structures. 
But what if you could reliably scrape rich product information\u2014images, specs, descriptions\u2014from modern e-commerce sites without hitting constant roadblocks? This session tackles advanced scraping with Python, Scrapy, and Playwright, exemplified by data extraction from LEGO product pages. We'll explore a \"grey hat\" perspective\u2014applying a slightly \"hacky\" mindset\u2014while stressing practical ethics, performance considerations, and compliance with site policies.\r\n\r\n## Outline\r\n\r\n### 1. Introduction: The Hacky Spirit vs. Ethical Constraints\r\n- Why scrape LEGO?\r\n- Setting boundaries: terms of service, rate limiting, and disclaimers\r\n- When \"scraping for fun\" crosses into potential legal pitfalls\r\n\r\n### 2. Scraping Tech Stack Overview\r\n- Scrapy for structured crawling and item pipelines\r\n- Playwright for rendering JavaScript and handling dynamic elements\r\n- Comparison to traditional HTML-only approaches\r\n- Project structure, environment setup, and practical tips\r\n\r\n### 3. Spiders in Action\r\n- Product Spider: Extracting core product data (ID, name, specifications, multiple images)\r\n- Gallery Spider: Navigating hidden galleries, handling tricky JS-based carousels, and filtering unwanted images\r\n- Ensuring consistent output (JSON or database ingestion)\r\n\r\n### 4. Model Context Protocol (MCP) Integration\r\n- Definition: Leveraging specialized helper servers for orchestrating data fetching, refining selectors, and automating debugging\r\n- Chaining Large Language Models: Code suggestions, auto-generation of selectors, and reactive error handling\r\n- Example workflow: \"Broken selector? Ask the MCP server for an LLM-aided fix\"\r\n\r\n### 5. Performance & Scale\r\n- Polite but robust concurrency: balancing speed and TOS compliance\r\n- Handling large link lists, incremental updates, and site changes\r\n- Monitoring and logging for reliability, debugging, and optimization\r\n\r\n### 6. 
Ethics & Privacy\r\n- Respecting site ownership, disclaimers, and usage limits\r\n- Storing scraped data securely and avoiding personal information\r\n- A discussion of \"grey hat\" territory: testing site vulnerabilities without exploiting them\r\n\r\n### 7. Use Cases & Extensions\r\n- Research software engineering: building reproducible data sets\r\n- Robotics and embedded: offline or partial data ingestion for classification or motion planning\r\n- Future directions: advanced concurrency, containerization, and HPC\r\n\r\n### 8. Demo & Q&A\r\n- Live snippet showing an MCP-powered spider reacting to a changed DOM structure\r\n- Q&A session on bridging the gap between hackery and best practices\r\n\r\n## Key Takeaways\r\n- Techniques for scraping dynamic, JS-heavy sites using Python, Scrapy, and Playwright\r\n- Practical \"hacky\" methods balanced by responsible, ethical approaches\r\n- Introduction to Model Context Protocol servers for automated code refinement\r\n- Scalable patterns for data handling, from small tests to large-scale deployments\r\n\r\nWhether you're a data engineer, hobbyist, or researcher, this talk provides a robust (and slightly subversive) recipe for capturing essential data from the wild world of modern websites\u2014without crossing into unethical or unlawful territory.", "recording_license": "", "do_not_record": false, "persons": [{"code": "YUHH8E", "name": "Peter Lodri", "avatar": "https://pretalx.com/media/avatars/YUHH8E_zozAQni.JPG", "biography": "Hacker-maker, specialising in system infiltration and enhancement. Expert in reverse engineering, distributed systems architecture, and AI integration. 
Proven track record in high-stakes technical operations and system security.", "public_name": "Peter Lodri", "guid": "e48db2d9-2345-5684-9c58-911205671033", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/YUHH8E/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/HSFR7A/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/HSFR7A/", "attachments": []}, {"guid": "12f12828-63fc-51b4-80d4-7bf8e37338c2", "code": "BAASYV", "id": 66498, "logo": null, "date": "2025-04-24T16:15:00+02:00", "start": "16:15", "duration": "00:30", "room": "Helium3", "slug": "pyconde-pydata-2025-66498-optimizing-in-the-python-ecosystem-powered-by-gurobi", "url": "https://pretalx.com/pyconde-pydata-2025/talk/BAASYV/", "title": "Optimizing in the Python Ecosystem \u2013 Powered by Gurobi", "subtitle": "", "track": "PyCon: Python Language & Ecosystem", "type": "Sponsored Talk", "language": "en", "abstract": "Join us as we explore integrating Gurobi and prescriptive analytics into your Python ecosystem. In this session, you\u2019ll discover model-building techniques that leverage NumPy and SciPy.sparse as well as the data structures of pandas. We\u2019ll also show you how to seamlessly integrate trained regressors from scikit-learn as constraints in your optimization models. Elevate your workflows and unlock new decision-making capabilities with Gurobi in Python.", "description": "Gurobi is a prescriptive analytics technology that enables you to make optimal decisions from data. You can use prescriptive analytics to generate optimized decision recommendations, based on real-world variables and constraints. Powered by mathematical models solved by mixed-integer optimization, it enables embedded decision intelligence in all kinds of applications in an industry-agnostic fashion and in any deployment scenario.\r\n\r\nJoin us as we explore integrating Gurobi and prescriptive analytics into your Python ecosystem. 
In this session, you\u2019ll discover model-building techniques that leverage NumPy and SciPy.sparse as well as the data structures of pandas. We\u2019ll also show you how to seamlessly integrate trained regressors from scikit-learn as constraints in your optimization models. Elevate your workflows and unlock new decision-making capabilities with Gurobi in Python.", "recording_license": "", "do_not_record": false, "persons": [{"code": "QJDXKB", "name": "Silke Horn", "avatar": null, "biography": "Dr. Silke Horn is a Mathematical Optimization QA Engineer with the Gurobi Optimizer team. She began her journey at Gurobi in 2018 in the technical support team and transitioned to R&D in 2024. She holds a Ph.D. in Mathematics from TU Darmstadt (Germany) and has many years of experience in academic teaching and software development.", "public_name": "Silke Horn", "guid": "ffe42e3e-13d3-566e-bef2-8b40c472d1ab", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/QJDXKB/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/BAASYV/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/BAASYV/", "attachments": []}, {"guid": "e8e53a53-ffc7-508f-892a-ee233f7d2bf2", "code": "JUAF3S", "id": 60723, "logo": null, "date": "2025-04-24T16:55:00+02:00", "start": "16:55", "duration": "00:45", "room": "Helium3", "slug": "pyconde-pydata-2025-60723-challenges-and-lessons-learned-while-building-a-real-time-lakehouse-using-apache-iceberg-and-kafka", "url": "https://pretalx.com/pyconde-pydata-2025/talk/JUAF3S/", "title": "Challenges and Lessons Learned While Building a Real-Time Lakehouse using Apache Iceberg and Kafka", "subtitle": "", "track": "PyData: Data Handling & Engineering", "type": "Talk (long)", "language": "en", "abstract": "How do you build a large-scale data lakehouse architecture that makes data available for business analytics in real time, while being more cost-effective, more flexible and faster than the previous proprietary 
solution? With Python, Kafka and Iceberg, of course!\r\n\r\nWe built a large-scale data lakehouse based on Apache Iceberg for the Schwarz Group, Europe's largest retailer. The system collects business data from thousands of stores, warehouses and offices across Europe.\r\n\r\nIn this talk, we will present our architecture, the challenges we faced, and how Apache Iceberg is shaping up to be the data lakehouse format of the future.", "description": "The Schwarz Group is present in thirty-two countries around the world with over ten thousand stores, hundreds of warehouses, an assortment of over two thousand different products and a single ERP system to manage them all.\r\n\r\nEvery country maintains its own databases for operational purposes but all the data is also gathered in one central analytics platform for all countries. Not only does this platform need to be stable and reliable, the data also needs to be made available for the consumers in near real-time \u2013 within mere minutes. The existing analytics platform was based on proprietary solutions which were expensive and required niche knowledge, severely limiting the number of available developers.\r\n\r\nTherefore, we set out on a journey to completely redesign the analytics platform: a new solution, based as much as possible on Open Source technologies like Python, Kafka and Iceberg. We chose Python for its great ecosystem and ease of use; Kafka for fast and reliable message processing of over one thousand tables per country into one central hub; and Iceberg at the core, as our data lakehouse format, for its fully transparent schema evolution and high performance through its rich metadata layer.\r\nThrough our presentation we will showcase the different challenges we faced during the design of our new architecture, how our selected tech stack allowed us to tackle each of them and the lessons we learnt.
We will focus on our challenges in four areas: scalability, performance, continuity of service and data quality.\r\n\r\nScalability: Ingesting changes on over one thousand tables coming from servers across thirty-two countries supporting operations of Europe\u2019s largest retailer is no easy task. Our architecture needs to support receiving tens of thousands of events per second. We will present how we set up Kafka to support our current load and potential for future growth, how we use Tabular\u2019s Iceberg sink connector to ingest all our tables, and how we leverage Avro serialization and Snappy compression of messages to reduce network traffic.\r\n\r\nPerformance: The large amounts of data we handle, paired with the influx of small files that can result from real-time data ingestion, made ensuring performance an extremely challenging aspect of our application. We will show how we designed our data lakehouse, using Iceberg's hidden partitioning to ensure performance while remaining flexible enough to evolve the partitioning over time, and how we designed and implemented an effective maintenance job to reduce small files in our Iceberg tables.\r\n\r\nContinuity of Service: The existing analytics platform contains the core of the business data which is used for many analytics and forecasting use cases across the organization. One of the main requirements was to ensure a smooth transition to the new architecture with as little downtime as possible. This meant facilitating access for existing users by allowing them to retrieve the data in the same way they were doing it in the past, with minimal changes. We will show how our architecture ensures flexibility by allowing access to the data from diverse query engines and show an example of how we integrated our architecture with Snowflake. \r\n\r\nData quality: We faced some challenges when it came to the consolidation of all the data we receive from all 32 countries.
For some tables, the schemas diverged across countries, with different sets of columns, different data types and even different primary keys. We will talk about how we handled data quality issues coming from the operational databases by using Iceberg\u2019s schema evolution capabilities, a schema registry and Kafka Connect single message transforms (SMTs).", "recording_license": "", "do_not_record": true, "persons": [{"code": "TP9MH3", "name": "Jonas B\u00f6er", "avatar": "https://pretalx.com/media/avatars/TP9MH3_4aZrQVF.jpg", "biography": "Software Engineer since 2018, Data Engineer at inovex since 2022 \u2013\u00a0happy to get the job done, but prefers beautiful solutions.", "public_name": "Jonas B\u00f6er", "guid": "8f849840-6fe1-5e72-a5e1-81076fdee3b6", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/TP9MH3/"}, {"code": "RFERPK", "name": "Elena Ouro Paz", "avatar": "https://pretalx.com/media/avatars/RFERPK_v4pp09r.jpg", "biography": "Data Engineer at Schwarz IT in Berlin, Germany, where she helps power AI use cases across Europe's largest retailer: the Schwarz Group.
Loves showcasing the importance of good data engineering practices, building reliable systems and bringing order to the chaos.", "public_name": "Elena Ouro Paz", "guid": "7565710c-a8ae-5ca2-941e-b4a82edb74c7", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/RFERPK/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/JUAF3S/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/JUAF3S/", "attachments": []}], "Platinum3": [{"guid": "59a7772f-e1f0-5c41-83ac-2faec35f2ffe", "code": "PW3VKG", "id": 66640, "logo": null, "date": "2025-04-24T10:15:00+02:00", "start": "10:15", "duration": "00:30", "room": "Platinum3", "slug": "pyconde-pydata-2025-66640-building-versatile-operating-setups-for-real-world-use-and-testing-with-python-and-the-raspberry-pi", "url": "https://pretalx.com/pyconde-pydata-2025/talk/PW3VKG/", "title": "Building versatile operating setups for real world use and testing with Python and the Raspberry Pi", "subtitle": "", "track": "PyData: Embedded Systems & Robotics", "type": "Sponsored Talk", "language": "en", "abstract": "**Rosenxt** is the host of a number of ventures aiming to provide next level solutions for demanding problems in a variety of industries based on decades of engineering excellence.\r\n\r\nSome of them address challenges in water environments ranging from water pipelines to offshore applications.\r\nAs differing as these areas may seem, regarding the solutions we build for them, they have a lot in common.\r\n\r\nWhether it's the necessary power supply, movement and steering concepts or sensing approaches,\r\nall of them benefit from generalized, smart solutions that we design as components that can later be orchestrated and configured in various setups to fulfill quite different purposes.\r\n\r\nThis presentation explores the versatility of leveraging a Raspberry Pi based hardware platform combined with a Python based application stack to bridge development and deployment of various basic
components, such as motors and motor controllers, lift foils, steering units and controls.\r\nBy utilizing a unified platform, we demonstrate how the same system can seamlessly transition from test bench measurements during hardware component development to real-world applications for various industries.\r\n\r\nThe talk highlights how this approach can create a robust framework to help streamline workflows, enhance scalability and reduce costs.", "description": "We will specifically showcase a setup where a custom-made Raspberry Pi based hardware platform and a Python application stack are used for operating a so-called functional model, where a set of components is orchestrated to showcase a final usage scenario, and where the same setup is used in a test rig environment to specifically benchmark a single component of the functional model. Both use cases work pretty much the same way and generate the same sort of data in the same formats and structure, which eases evaluation and handling significantly.\r\n\r\nThe solution presented showcases both 'standard' Python applications interacting with each other on a Raspberry Pi as well as **pyscript** based scripts running in a web browser to visualise test data in real time.", "recording_license": "", "do_not_record": false, "persons": [{"code": "DUSVE9", "name": "Jens Nie", "avatar": "https://pretalx.com/media/avatars/DUSVE9_x6pgG1p.jpg", "biography": "A physicist currently tackling the development of embedded devices at Rosenxt for various use cases.
My journey with Python began a long, long time ago, when the interpreter's version string said 1.4.\r\n\r\nBesides my current efforts I can rely on great experience from various other roles in my prior career as a scientist, technology manager and department head.", "public_name": "Jens Nie", "guid": "df23dc13-20d6-55c5-a620-5d4717e311aa", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/DUSVE9/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/PW3VKG/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/PW3VKG/", "attachments": []}, {"guid": "337dc95c-55c2-515f-b037-99c25d5a7592", "code": "7FLW7F", "id": 67105, "logo": null, "date": "2025-04-24T10:55:00+02:00", "start": "10:55", "duration": "00:30", "room": "Platinum3", "slug": "pyconde-pydata-2025-67105-composable-ai-building-next-gen-ai-agents-with-mcp", "url": "https://pretalx.com/pyconde-pydata-2025/talk/7FLW7F/", "title": "Composable AI: Building Next-Gen AI Agents with MCP", "subtitle": "", "track": "PyData: Generative AI", "type": "Sponsored Talk", "language": "en", "abstract": "At Blue Yonder, we're embarking on a journey toward building composable AI agents using Model Context Protocol (MCP). We're discovering firsthand the challenges of integrating diverse products and APIs into useful, context-aware agents. In this talk, I'll discuss our early experiences, the challenges we've faced, and why MCP is emerging as a potential game changer for developing scalable, flexible AI solutions.", "description": "In this talk, I'll share our journey with MCP at Blue Yonder, explaining why this protocol is becoming crucial for anyone involved in building AI agents.
We'll start by understanding what an agent really is - essentially a clever brain leveraging powerful tools - and why composability is the key to efficient development.\r\n\r\nYou'll discover what MCP is, how it's already shaping popular tools like Cursor and Claude Desktop, and why developers everywhere are excited about it. I'll dive into practical insights, showing how agents like Manus, a highly regarded agent hailed as the next \"DeepSeek\" moment, achieved success simply by combining 29 MCP-compliant tools effectively. This demonstrates the power of composing existing capabilities rather than reinventing the wheel.\r\n\r\nWe'll also explore how MCP empowers organizations. Using MCP SDKs and OpenAPI wrappers, even teams without extensive AI expertise can rapidly transform existing APIs into sophisticated, usable AI agents. But there's no silver bullet. I'll frankly discuss some organizational challenges, including the tendency to chase flashy \"new\" agents over contributing collaboratively to existing solutions.\r\n\r\nFinally, we'll look ahead to an exciting future, envisioning a world where entire product ecosystems are MCP-enabled. Imagine agents seamlessly orchestrating tasks across multiple products, unlocking entirely new possibilities in user interaction.\r\n\r\nJoin me for an engaging session, learn from our experiences, and see how MCP can reshape your approach to building the next generation of composable AI agents.", "recording_license": "", "do_not_record": false, "persons": [{"code": "9FLCR9", "name": "Martin Seeler", "avatar": "https://pretalx.com/media/avatars/9FLCR9_8EWueN1.png", "biography": "Martin Seeler is the approachable tech enthusiast from next door who effortlessly bridges the gap between cutting-edge AI and practical customer solutions. As a Senior Staff Engineer at Blue Yonder, he spearheads Generative AI initiatives, crafting solutions that genuinely benefit customers rather than just riding the hype train. 
With a rich background in software development across various industries, Martin thrives as a tinkerer who delves into research papers and develops proof-of-concepts. His passion lies in harmonizing customer needs with state-of-the-art Generative AI solutions, ensuring technology serves as a tool for meaningful progress.", "public_name": "Martin Seeler", "guid": "379a916e-b3b2-5066-a04c-63729babd9a2", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/9FLCR9/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/7FLW7F/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/7FLW7F/", "attachments": []}, {"guid": "554e6834-22db-51ba-944a-b61fddc241e2", "code": "WMBDJ8", "id": 61421, "logo": null, "date": "2025-04-24T11:35:00+02:00", "start": "11:35", "duration": "00:45", "room": "Platinum3", "slug": "pyconde-pydata-2025-61421-going-global-taking-code-from-research-to-operational-open-ecosystem-for-ai-weather-forecasting", "url": "https://pretalx.com/pyconde-pydata-2025/talk/WMBDJ8/", "title": "Going Global: Taking code from research to operational open ecosystem for AI weather forecasting", "subtitle": "", "track": "PyCon: MLOps & DevOps", "type": "Talk (long)", "language": "en", "abstract": "When I was hired as a Scientist for Machine Learning, experts said ML would never work in weather forecasting. Nowadays, I get to contribute to Anemoi, a full-featured ML weather forecasting framework used by international weather agencies to research, build, and scale AI weather forecasting models. \r\n\r\nThe project started out as a curiosity by my colleagues and soon scaled as a result of its initial success. As machine learning stories go, this is a story of change, adaptation and making things work. \r\n\r\nIn this talk, I'll share some practical lessons: how we evolved from a mono-package with four people working on it to multiple open-source packages with 40+ internal and external collaborators. 
Specifically, how we managed the explosion of over 300 config options without losing all of our sanity, building a separation of packages that works for both researchers and operations teams, as well as CI/CD and testing that constrains how many bugs we can introduce in a given day. You'll learn concrete patterns for growing Python packaging for ML systems, and balancing research flexibility with production stability. As a bonus, I'll sprinkle in anecdotes where LLMs like chatGPT and Copilot massively failed at facilitating this evolution.\r\n\r\nJoin me for a deep dive into the real challenges of scaling ML systems - where the weather may be hard to predict, but our code doesn't have to be.", "description": "What does it take to go from \"ML will never work in weather forecasting\" to running AI models in production at weather agencies? This talk chronicles the journey of Anemoi, a framework that evolved from research code to an operational ML weather forecasting system - and the technical challenges we faced along the way.\r\n\r\nStarting as experimental code and notebooks by a small team of four, Anemoi grew into a robust ecosystem supporting 40+ developers across multiple international weather agencies. I'll share our experience of scaling both the team and codebase, including the interesting challenge of conducting weekly code tours for new team members while maintaining development velocity.\r\n\r\nThe technical evolution of Anemoi mirrors many challenges in scaling ML systems. We'll explore how the codebase transformed from research artifacts and notebooks into a structured mono-package with proper separation of concerns. Then, how we split this into an ecosystem of specialized packages - only to later realize that some components were too tightly coupled and needed reunification. This journey offers valuable lessons about when to split packages and when to maintain unified codebases. \r\n\r\nConfiguration management evolved alongside our architecture. 
I'll demonstrate how we leveraged Hydra to tame over 300 configuration options into a hierarchical system that enables component composition without sacrificing usability. This system now powers everything from dataset creation to model inference, with full traceability of configurations and artifacts throughout the ML lifecycle.\r\n\r\nA unique aspect of developing ML systems at ECMWF is integrating with decades of expertise in weather forecast validation. We'll look at how we connected modern ML tooling like MLFlow with traditional meteorological evaluation systems, creating a bridge between ML innovation and established meteorological practices.\r\n\r\nThe talk will cover practical challenges that every growing ML system faces:\r\n\r\n- Making model components truly configurable and replaceable\r\n- Implementing model sharding for global weather predictions\r\n- Supporting flexible grids for regional weather services\r\n- Managing CI/CD across multiple packages\r\n- Streamlining release processes with modern tools\r\n- The eternal struggle with changelog management\r\n\r\nThroughout the presentation, I'll share real examples of what worked, what didn't, and why - including our experiments with AI coding assistants and where they fell short. You'll walk away with concrete patterns for scaling Python ML systems, strategies for managing growing complexity, and insights into balancing research flexibility with production requirements.\r\n\r\nWhether you're scaling an ML system, managing a growing Python codebase, or interested in how weather forecasting is being transformed by AI, this talk offers practical lessons from the frontier of operational ML systems.", "recording_license": "", "do_not_record": false, "persons": [{"code": "HCWQZW", "name": "Jesper Dramsch", "avatar": "https://pretalx.com/media/avatars/HCWQZW_H2mkmrg.jpg", "biography": "Jesper Dramsch works at the intersection of machine learning and physical, real-world data. 
Currently, they're working as a scientist for machine learning in numerical weather prediction at the coordinated organisation ECMWF.\r\n\r\nJesper is a fellow of the Software Sustainability Institute, creating awareness and educational resources around the reproducibility of machine learning results in applied science. Before, they have worked on applied exploratory machine learning problems, e.g. satellites and Lidar imaging on trains, and defended a PhD in machine learning for geoscience. During the PhD, Jesper wrote multiple publications and often presented at workshops and conferences, eventually holding keynote presentations on the future of machine learning in geoscience.\r\n\r\nMoreover, they worked as consultant machine learning and Python educator in international companies and the UK government. They create educational notebooks on Kaggle applying ML to different domains, reaching rank 81 worldwide out of over 100,000 participants and their video courses on Skillshare have been watched over 128 days by over 4500 students.  
Recently, Jesper was invited into the Youtube Partner programme creating videos around programming, machine learning, and tech.", "public_name": "Jesper Dramsch", "guid": "7992ef9b-b67c-5de3-9370-79479801a771", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/HCWQZW/"}], "links": [{"title": "Talk Resources and Slides", "url": "https://dramsch.net/pycon-germany-2025", "type": "related"}], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/WMBDJ8/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/WMBDJ8/", "attachments": []}, {"guid": "adecc8f0-d6de-5627-90a5-9af0290dd691", "code": "3DCS8K", "id": 66511, "logo": null, "date": "2025-04-24T14:20:00+02:00", "start": "14:20", "duration": "00:30", "room": "Platinum3", "slug": "pyconde-pydata-2025-66511-dataframely-a-declarative-native-data-frame-validation-library", "url": "https://pretalx.com/pyconde-pydata-2025/talk/3DCS8K/", "title": "Dataframely \u2014 A declarative, \ud83d\udc3b\u200d\u2744\ufe0f-native data frame validation library", "subtitle": "", "track": "PyData: Data Handling & Engineering", "type": "Sponsored Talk", "language": "en", "abstract": "Understanding the structure and content of data frames is crucial when working with tabular data \u2014 a core requirement for the robust pipelines we build at QuantCo.\r\n\r\nLibraries such as `pandera` or `patito` already exist to ease the process of defining data frame schemas and validating that data frames comply with these schemas. However, when building production-ready data pipelines, we encountered limitations of these libraries. Specifically, we were missing support for strict static type checking, validation of interdependent data frames, and graceful validation including introspection of failures.\r\n\r\nTo remedy the shortcomings of these libraries, we started building `dataframely` at the beginning of last year. 
Dataframely is a declarative data frame validation library with first-class support for polars data frames.\r\n\r\nOver the last year, we have gained experience in using `dataframely` both for analytical and production code across several projects. The result was a drastic improvement of the legibility of our pipeline code and our confidence in its correctness. To enable the wider data engineering community to benefit from similar effects, we have recently open-sourced `dataframely` and are keen on introducing it in this talk.", "description": "In this talk, we will talk about the motivation behind building `dataframely` in more detail and lead the audience through its key features. We will also touch upon our learnings in developing robust data pipelines that establish clear contracts for the design of data transformations. In our experience, this significantly improves communication among developers and comprehensibility of the entire pipeline.", "recording_license": "", "do_not_record": false, "persons": [{"code": "Y3BGJB", "name": "Oliver Borchert", "avatar": "https://pretalx.com/media/avatars/Y3BGJB_44RR8hN.jpg", "biography": "For the past 3 years, I have been working on machine learning and data engineering at QuantCo. Previously, I studied computer science at the Technical University of Munich, focusing on machine and deep learning.", "public_name": "Oliver Borchert", "guid": "39865145-c435-5506-8b4b-1e234a152a1d", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/Y3BGJB/"}, {"code": "BVTTB3", "name": "Daniel Elsner", "avatar": "https://pretalx.com/media/avatars/BVTTB3_dzXXJpr.jpg", "biography": "I am currently a software engineer at QuantCo.
Previously, I worked as a researcher in program analysis and software testing at the Technical University of Munich.", "public_name": "Daniel Elsner", "guid": "e8294db2-2867-5f7a-868b-ba8279e9266d", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/BVTTB3/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/3DCS8K/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/3DCS8K/", "attachments": []}, {"guid": "40466c08-7ee1-5815-a345-6bace12fd91b", "code": "UDDTBS", "id": 61818, "logo": null, "date": "2025-04-24T15:00:00+02:00", "start": "15:00", "duration": "00:45", "room": "Platinum3", "slug": "pyconde-pydata-2025-61818-accuracy-is-not-enough-building-trustworthy-ai-with-conformal-prediction", "url": "https://pretalx.com/pyconde-pydata-2025/talk/UDDTBS/", "title": "Accuracy Is Not Enough: Building Trustworthy AI with Conformal Prediction", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Statistics", "type": "Talk (long)", "language": "en", "abstract": "Building a good scoring model is just the beginning. In the age of critical AI applications, understanding and quantifying uncertainty is as crucial as achieving high accuracy. This talk highlights conformal prediction as the definitive approach to both uncertainty quantification and probability calibration, two extremely important topics in Deep Learning and Machine Learning. We\u2019ll explore its theoretical underpinnings, practical implementations using TorchCP, and transformative impact on safety-critical fields like healthcare, robotics, and NLP. Whether you're building predictive systems or deploying AI in high-stakes environments, this session will provide actionable insights to level up your modelling skills for robust decision-making.", "description": "When deploying machine learning models in the real world, especially in domains like healthcare, robotics, or natural language processing, the stakes are high. 
It\u2019s not enough to train a model, evaluate its accuracy, and call it a day. Questions of how confident the model is, how reliable its predictions are, and how to act on these predictions are critical yet often overlooked. This talk takes you beyond conventional metrics and into the world of uncertainty quantification and probability calibration, with conformal prediction as the definitive tool for both.\r\n\r\n\r\n\r\nWe\u2019ll start off the presentation by exploring the fundamental need for uncertainty in AI systems\u2014why it matters, how it\u2019s quantified, and how it can be used to make informed decisions. From there, we\u2019ll introduce conformal prediction, a mathematically rigorous yet practical framework that provides guarantees on prediction reliability while remaining model-agnostic. Core concepts such as probability calibration and uncertainty quantification will be highlighted as key parts in the modelling process, establishing their importance in the domain.\r\n\r\n\r\n\r\nThe session will also feature real-world examples and use cases such as:\r\n\r\n- Healthcare: Predict irAE likelihood with quantifiable confidence, to inform life-and-death decisions.\r\n\r\n- Robotics: Navigate dynamic environments safely using calibrated vision-language models.\r\n\r\n- Natural Language Processing: Improve outputs of large language models with uncertainty-aware predictions.\r\n\r\n\r\n\r\nFinally, we\u2019ll showcase the TorchCP toolbox, a GPU-accelerated library for integrating conformal prediction into deep learning pipelines, an area of Data Science that has a lot of hype but often overlooks the importance of such tools.
Through a live demonstration, you\u2019ll see how to implement these methods step-by-step, empowering you to build trustworthy AI systems that go beyond accuracy.\r\n\r\n\r\n\r\nAttendees will leave with:\r\n\r\n- A solid understanding of uncertainty quantification, probability calibration and their importance.\r\n\r\n- Practical knowledge of conformal prediction and how to implement it.\r\n\r\n- A new perspective on AI reliability and decision-making in critical domains.\r\n\r\n\r\n\r\nWhether you're an ML researcher, data scientist, or practitioner deploying AI models in critical environments, this session will equip you with the right tools and philosophy to create AI systems that are not only accurate but also reliable and robust.", "recording_license": "", "do_not_record": false, "persons": [{"code": "GCAKSZ", "name": "Chris Aivazidis", "avatar": "https://pretalx.com/media/avatars/GCAKSZ_Tempisc.jpeg", "biography": "A data scientist that goes beyond conventional methods to build robust and trustworthy AI models and solutions. \r\n- Experience in industry leading companies and a fairly short research background in Explainable AI in NLP. 
\r\n- Background in mathematics.\r\n- Always keeping up to date with the latest AI research and findings.\r\n- Competing in Machine Learning competitions.", "public_name": "Chris Aivazidis", "guid": "71d01785-daf9-5b21-bcfc-f0c8fd75d36f", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/GCAKSZ/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/UDDTBS/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/UDDTBS/", "attachments": []}, {"guid": "ae1413c0-02ee-576f-bedc-af6cadbc3499", "code": "3FSWJU", "id": 67452, "logo": null, "date": "2025-04-24T16:15:00+02:00", "start": "16:15", "duration": "00:30", "room": "Platinum3", "slug": "pyconde-pydata-2025-67452-cache-me-if-you-can-boosted-application-performance-with-redis-and-client-side-caching", "url": "https://pretalx.com/pyconde-pydata-2025/talk/3FSWJU/", "title": "Cache me if you can: Boosted application performance with Redis and client-side caching", "subtitle": "", "track": "PyData: Data Handling & Engineering", "type": "Sponsored Talk", "language": "en", "abstract": "Did you know Redis can notify your app about server-side data changes? This feature enables client-side tracking and caching in redis-py, helping to reduce network round-trips and optimize performance. In this talk, we explore how client-side caching works in redis-py and how you can use it to make your applications even faster.", "description": "Did you know Redis can notify your app about server-side data changes? This feature enables client-side tracking and caching in redis-py, helping to reduce network round-trips and optimize performance. In this talk, we explore how client-side caching works in redis-py and how you can use it to make your applications even faster. 
The following topics are covered:\r\n\r\n- Quick introduction to Redis\r\n- Redis as a cache\r\n- What is client-side caching?\r\n- What's new in redis-py", "recording_license": "", "do_not_record": false, "persons": [{"code": "D7HNKK", "name": "David Maier", "avatar": "https://pretalx.com/media/avatars/D7HNKK_dLkAsSd.png", "biography": "I am a creative Software Engineer and a skilled Consultant with experiences in Software Project Management, for both Product Development and Customer Projects. Furthermore I have a strong Database background by being specialized on NoSQL Database Systems. My experience with Redis spans performance engineering, post-sales consultancy, technical education, and client library and ecosystem integration engineering.", "public_name": "David Maier", "guid": "74d18cb4-ad52-5ace-8a61-9b226d6546b9", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/D7HNKK/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/3FSWJU/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/3FSWJU/", "attachments": []}, {"guid": "36602324-91dd-51c3-b0b1-c9cdb13eeadd", "code": "WLZSEZ", "id": 60499, "logo": null, "date": "2025-04-24T16:55:00+02:00", "start": "16:55", "duration": "00:45", "room": "Platinum3", "slug": "pyconde-pydata-2025-60499-a11y-need-is-love-but-accessible-docs-help-too", "url": "https://pretalx.com/pyconde-pydata-2025/talk/WLZSEZ/", "title": "A11y Need Is Love (But Accessible Docs Help Too)", "subtitle": "", "track": "PyData: PyData & Scientific Libraries Stack", "type": "Talk (long)", "language": "en", "abstract": "Accessible documentation benefits everyone, from developers to end users. Using the [PyData Sphinx Theme](https://pydata-sphinx-theme.readthedocs.io/en/stable/) as a case study, this talk dives into common accessibility barriers in documentation websites like low contrast colors, missing focus states, etc. and practical ways to address them. 
Learn about accessibility improvements and take part in a live accessibility audit to see how small changes can make a big difference.", "description": "The Beatles told us that \u2018all you need is love\u2019 and while that is a lovely sentiment, love alone won\u2019t fix low contrast colours, missing focus states or inaccessible navigation. These barriers impact countless users with disabilities, reducing the usefulness and reach of valuable documentation. So, while love is great, accessible docs are *essential*.\r\n\r\nIn this talk, we will use the [PyData Sphinx Theme](https://pydata-sphinx-theme.readthedocs.io/en/stable/) as a case study to explore common accessibility problems in documentation websites and how to tackle them. We will discuss the accessibility changes we made to the theme, how those changes affected users, and what we learnt along the way. Additionally, we will also conduct a short accessibility audit on a website suggested by the audience. This demo will provide a practical understanding of how to improve accessibility.\r\n\r\nWhether you\u2019re a documentation maintainer, a curious developer or simply someone who cares about accessibility, this beginner-friendly talk will help you learn more about accessibility in documentation and how to get started. 
Love might be a universal language, but your code appreciates accessible documentation.", "recording_license": "", "do_not_record": false, "persons": [{"code": "9HDESG", "name": "Smera Goel", "avatar": "https://pretalx.com/media/avatars/9HDESG_gCyt2FY.jpg", "biography": null, "public_name": "Smera Goel", "guid": "a8cce646-0269-5947-97e9-6e2aeba1a515", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/9HDESG/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/WLZSEZ/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/WLZSEZ/", "attachments": []}], "Europium2": [{"guid": "37d6886a-3f90-523e-95d9-578c5eeca0b7", "code": "NQ3RHQ", "id": 67913, "logo": null, "date": "2025-04-24T10:15:00+02:00", "start": "10:15", "duration": "00:30", "room": "Europium2", "slug": "pyconde-pydata-2025-67913-blazing-fast-python-in-your-database-unlocking-data-science-at-scale-with-exasol", "url": "https://pretalx.com/pyconde-pydata-2025/talk/NQ3RHQ/", "title": "Blazing-Fast Python in Your Database: Unlocking Data Science at Scale with Exasol", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Statistics", "type": "Sponsored Talk", "language": "en", "abstract": "What if your Python models could run inside your database\u2014at scale, with parallel execution, and zero data movement? Meet Exasol: a high-performance Analytics Engine with native Python support and a massively parallel processing (MPP) engine. In this session, you\u2019ll learn how to run Python directly where your data lives using user-defined functions (UDFs) and customizable script language containers. Whether you're doing forecasting, categorization, or calling APIs in real time, Exasol enables fast, scalable Python execution\u2014perfect for demanding data science workflows. We\u2019ll share real-world use cases, including large-scale model inference across thousands of sensors. 
If you're tired of bottlenecks and batch jobs, this is your shortcut to blazing-fast, in-database Python.", "description": "What if your Python models could run inside your database\u2014at scale, with parallel execution, and no data movement? Meet Exasol: the high-performance analytics database that speaks native Python, supercharged by a massively parallel processing (MPP) engine.\r\nIn this talk, we\u2019ll dive into how Exasol empowers Python developers and data scientists to run custom Python code\u2014directly where the data lives\u2014using user-defined functions (UDFs) and fully customizable script language containers.\r\n\r\nWhether you\u2019re doing model training, forecasting, categorization, or even tapping into the power of large language models, Exasol brings Python to the party with native support and serious horsepower.\r\n\r\nYou\u2019ll learn how to:\r\n\r\n- Execute high-performance Python code inside your database using UDFs.\r\n- Bring any Python library into Exasol with containerized script languages.\r\n- Scale inference and forecasting across thousands of sensors or data points using Exasol\u2019s MPP engine\u2014no batch jobs, no bottlenecks.\r\n- Call APIs or run models in-database to enable real-time, insight-driven applications.\r\n\r\nWe\u2019ll showcase real-world examples, like how one company forecasts sensor traffic volume across entire regions to optimize planning\u2014running thousands of model inferences simultaneously at high speed.\r\n\r\nIf you\u2019re tired of waiting for your models to run\u2014or moving massive datasets just to do a quick prediction\u2014this talk is for you. 
Python meets MPP, and the result is next-level analytics.", "recording_license": "", "do_not_record": false, "persons": [{"code": "XQGCRJ", "name": "Alexander Stigsen", "avatar": "https://pretalx.com/media/avatars/XQGCRJ_fUSdiC5.jpg", "biography": "Alexander Stigsen is the Chief Product Officer at Exasol, where he leads product strategy and innovation for one of the world\u2019s fastest analytics databases. With a deep-rooted background in engineering and a career spanning more than two decades, Alexander has been at the forefront of database technology and product development.\r\nHe is best known as the founder and former CEO of Realm, a groundbreaking mobile database platform that quickly became one of the most widely adopted solutions for mobile app developers worldwide. Under his leadership, Realm was used in applications on over a billion devices and was ultimately acquired by MongoDB, further cementing his influence in the data infrastructure space.\r\nAlexander brings a unique perspective that bridges the worlds of engineering, product leadership, and entrepreneurship. 
At conferences, he shares insights on building scalable data systems, innovating in developer tools, and navigating the startup-to-acquisition journey\u2014all with a focus on delivering products that developers love.", "public_name": "Alexander Stigsen", "guid": "bf6baf83-5516-5de8-b146-9ad839837fca", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/XQGCRJ/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/NQ3RHQ/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/NQ3RHQ/", "attachments": []}, {"guid": "4f3319d0-bed2-52b1-9485-70c03d9d22b7", "code": "GVUPQN", "id": 66940, "logo": null, "date": "2025-04-24T10:55:00+02:00", "start": "10:55", "duration": "00:30", "room": "Europium2", "slug": "pyconde-pydata-2025-66940-scalable-python-and-sql-data-engineering-without-migraines", "url": "https://pretalx.com/pyconde-pydata-2025/talk/GVUPQN/", "title": "Scalable Python and SQL Data Engineering without Migraines", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Statistics", "type": "Sponsored Talk", "language": "en", "abstract": "This session is for data and ML engineers with a basic understanding of data engineering and Python. It shows how to easily use Python code in Snowflake Notebooks to create data pipelines. By the end, you\u2019ll know how to build and process data pipelines with Python.", "description": "Data loading processes are complex and require effort to organize, often different tools are used and seamless processing is not ensured. Learn how to create pipelines efficiently and easily with Python in Snowflake Notebooks. Create and monitor tasks to continuously load data. Use third-party data directly to extend the data model without copying it. 
Harness the power of Python to quickly calculate values and write efficient stored procedures.\r\n\r\nIn this session you will see how to:\r\n - Load Parquet data into Snowflake using schema inference\r\n - Set up access to Snowflake Marketplace data\r\n - Create a Python UDF to convert temperature\r\n - Create a data engineering pipeline with Python stored procedures to incrementally process data\r\n - Orchestrate the pipelines with tasks\r\n - Monitor the pipelines with Snowsight", "recording_license": "", "do_not_record": false, "persons": [{"code": "HTZCTW", "name": "Dirk Jung", "avatar": "https://pretalx.com/media/avatars/HTZCTW_ViEOgPY.jpg", "biography": "Dirk Jung has more than 20 years of experience in the IT industry. In his position as Senior Solution Engineer at Snowflake Computing, he supports companies in building modern data and analysis platforms in the cloud. In his professional career, he has held various positions at SAS Institute, Blue Yonder and Datameer, among others. He specializes in business intelligence, predictive analytics and data warehousing.", "public_name": "Dirk Jung", "guid": "5fb63867-7151-5a92-be8a-7457a347b9f5", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/HTZCTW/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/GVUPQN/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/GVUPQN/", "attachments": []}, {"guid": "96d97a60-5fb2-5a5f-ba73-5e4f94ba433e", "code": "ER3V7W", "id": 61120, "logo": null, "date": "2025-04-24T11:35:00+02:00", "start": "11:35", "duration": "00:30", "room": "Europium2", "slug": "pyconde-pydata-2025-61120-bias-meets-bayes-a-bayesian-perspective-on-improving-model-fairness", "url": "https://pretalx.com/pyconde-pydata-2025/talk/ER3V7W/", "title": "Bias Meets Bayes: A Bayesian Perspective on Improving Model Fairness", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Statistics", "type": "Talk", "language": "en", "abstract": "Bias in machine 
learning models remains a pressing issue, often disproportionately affecting the most vulnerable groups in society. This talk introduces a Bayesian perspective to effectively tackle these challenges, focusing on improving fairness by modeling and addressing bias directly.\r\nYou will learn about the interplay between uncertainty, equity, and predictive accuracy, while gaining actionable insights to improve fairness in diverse applications. Using a practical example of a risk-scoring model trained on data with underrepresented minority groups, I will showcase how Bayesian methods compare to traditional techniques, demonstrating their unique potential to mitigate bias while maintaining performance.", "description": "Machine learning models often perpetuate biases that exacerbate societal inequities, particularly for vulnerable groups. As machine learning increasingly shapes critical decisions, addressing these biases is more important than ever. In this talk, I will explain how Bayesian methods offer a principled and effective approach to improving fairness by directly addressing bias and incorporating uncertainty into machine learning models.\r\n\r\nThe talk will cover:\r\n\r\n1.\tTheoretical Foundations: I will start by exploring the connection between Bayesian statistics, fairness, and accuracy, with a focus on why uncertainty is a crucial factor in fairness interventions.\r\n2.\tPractical Example: Using a risk-scoring model trained on a dataset with underrepresented minority groups, I will demonstrate how Bayesian methods compare to traditional fairness techniques. 
This example will illustrate their ability to not only mitigate bias but also adapt to complex, real-world data distributions while maintaining predictive accuracy.\r\n3.\tKey Insights and Applications: Finally, I will provide actionable takeaways on incorporating Bayesian thinking into existing workflows, enabling more equitable and robust outcomes across diverse applications.\r\n\r\nThis talk is designed to be accessible to a broad audience. While minimal familiarity with machine learning concepts and fairness principles is recommended, no advanced knowledge of statistics is required. Attendees will leave with practical tools, code examples, and insights to address bias effectively in real-world scenarios, empowering them to promote fairness in their own projects and organizations.", "recording_license": "", "do_not_record": false, "persons": [{"code": "LZQCXY", "name": "Vince Nelidov", "avatar": "https://pretalx.com/media/avatars/LZQCXY_N07lcjz.png", "biography": "Vince Nelidov is a Staff Data Scientist at Blue Yonder with diverse consulting experience in the data domain across a variety of industries, from the energy sector and banking to skincare and agriculture. Throughout his years in the data world, Vince has been combining advanced data science with business insights to make data work with an impact. He aspires to see far beyond what is on the surface and get to the essence of the problems, discovering robust and scalable long-term solutions rather than temporary fixes.\r\n\r\nVince is passionate about sharing his knowledge and insights, believing that data literacy should not be the privilege of a few, and his goal is to help make this a reality. Making the intricacies of data science intelligible and uncovering the regularities hiding in the data is a major source of inspiration for Vince. 
With this goal in mind, he combines his years of experience in consulting with his background in statistics, research and teaching to make this knowledge accessible to businesses and individuals in need.", "public_name": "Vince Nelidov", "guid": "6a6862c2-a3f7-556e-ba8e-64dfd8f8d975", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/LZQCXY/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/ER3V7W/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/ER3V7W/", "attachments": []}, {"guid": "14643bf6-a39d-5c47-bf71-be3c482de02b", "code": "ME7XPJ", "id": 61230, "logo": null, "date": "2025-04-24T14:20:00+02:00", "start": "14:20", "duration": "00:30", "room": "Europium2", "slug": "pyconde-pydata-2025-61230-oh-my-license-achieving-order-by-automation-in-the-license-chaos-of-your-dependencies", "url": "https://pretalx.com/pyconde-pydata-2025/talk/ME7XPJ/", "title": "Oh my license! \u2013 Achieving order by automation in the license chaos of your dependencies", "subtitle": "", "track": "PyCon: Programming & Software Engineering", "type": "Talk", "language": "en", "abstract": "License issues can haunt you at night.\r\nYou spend days, weeks, and months developing beautiful software.\r\nBut then it happens.\r\nYou realize that an essential dependency is GPL-3.0 licensed.\r\n\r\nAll your code is now infected with this license.\r\nNow you are forced to either:\r\n1. Rewrite all parts relying on the other library\r\n2. 
Open-source your codebase under the GPL-3.0 license\r\n\r\nHow could this have been avoided?\r\n\r\nJoin the talk and find out!\r\nFirst, we\u2019ll give you a brief introduction to different software licenses and their implications.\r\nSecond, we\u2019ll show you how to automate your license checking using open-source software.", "description": "Software licensing can feel like a daunting maze, but it doesn\u2019t have to be.\r\nThis talk will demystify the world of software licenses and equip you with the critical knowledge to navigate it with confidence.\r\n\r\nWe\u2019ll start by exploring key categories of licenses\u2014like Strong Copyleft, Weak Copyleft, and Permissive\u2014and break down the most common ones you\u2019ll encounter (e.g., GPL, AGPL, BSD, and MIT). Through concrete examples, you\u2019ll learn how these licenses affect your projects and how to handle them effectively.\r\n\r\nNext, we\u2019ll dive into practical solutions for automating license compliance. You\u2019ll be introduced to conda-deny (an open-source tool) and see how it can help ensure your projects remain compliant without adding manual overhead.\r\n\r\nWhether you\u2019re building open-source software or proprietary tools, this talk will leave you with actionable strategies to future-proof your projects and avoid licensing pitfalls.", "recording_license": "", "do_not_record": false, "persons": [{"code": "KMY3JY", "name": "Paul M\u00fcller", "avatar": "https://pretalx.com/media/avatars/KMY3JY_McOIsXU.jpg", "biography": "Paul studies Computer Science at the KIT in Karlsruhe.\r\nAlongside his studies, he works part-time at QuantCo.", "public_name": "Paul M\u00fcller", "guid": "ef148cc5-1913-50df-b040-4d6fa1db7f8b", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/KMY3JY/"}], "links": [{"title": "Slides for this presentation", "url": "https://docs.google.com/presentation/d/1ygOyKQTIB1RdazQp1ldEV4bNuHzi1I8x/edit?usp=sharing&ouid=108371010412324519105&rtpof=true&sd=true", "type": 
"related"}], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/ME7XPJ/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/ME7XPJ/", "attachments": []}, {"guid": "4ac75e96-30f9-5a7b-a220-a17c3770202a", "code": "P9GRZU", "id": 59893, "logo": null, "date": "2025-04-24T15:00:00+02:00", "start": "15:00", "duration": "00:45", "room": "Europium2", "slug": "pyconde-pydata-2025-59893-quiet-on-set-building-an-on-air-sign-with-open-source-technologies", "url": "https://pretalx.com/pyconde-pydata-2025/talk/P9GRZU/", "title": "Quiet on Set: Building an On-Air Sign with Open Source Technologies", "subtitle": "", "track": "General: Infrastructure - Hardware & Cloud", "type": "Talk (long)", "language": "en", "abstract": "Learn how to build a custom On-Air sign using Apache Kafka\u00ae, Apache Flink\u00ae, and Apache Iceberg\u2122! See how to capture events like Zoom meetings and camera usage with Python, process data with FlinkSQL, analyze trends using Iceberg, and bring it all together with a practical IoT project that easily scales out.", "description": "While many of us have adapted to work from home life, one major problem remains: finding an easy way to keep folks in your home away from your workspace when you\u2019re on an important call. Dust off your Raspberry Pi\u2013\u2013let\u2019s build a custom on-air sign with Apache Kafka\u00ae, Apache Flink\u00ae, and Apache Iceberg\u2122!\r\n\r\nWe\u2019ll begin by writing Python scripts to capture key events\u2013\u2013such as when a Zoom meeting is running and when a camera is being used\u2013\u2013and produce it into Kafka. The live data are then consumed by a Raspberry Pi script to drive the operation of a custom designed on-air sign. From there, you\u2019ll be introduced to the ins and outs of FlinkSQL for stream processing as we wrangle the data into a better format for downstream use. 
And, finally, we\u2019ll see Iceberg in action and learn how to use query engines to analyze meeting and recording trends.\r\n\r\nBy the end of the session, you\u2019ll be well-acquainted with this powerful trio of open source technologies and know how you could use the same scaffolding and scale out a simple, at-home project to millions of users and simultaneous events.", "recording_license": "", "do_not_record": false, "persons": [{"code": "BQKRWQ", "name": "Danica Fine", "avatar": "https://pretalx.com/media/avatars/BQKRWQ_MHUF7uf.jpg", "biography": "Danica began her career as a software engineer in data visualization and warehousing with a business intelligence team where she served as a point-person for standards and best practices in data visualization across her company. In 2018, Danica moved to San Francisco and pivoted to backend engineering with a derivatives data team which was responsible for building and maintaining the infrastructure that processes millions of financial market data per second in near real-time. Her first project on this team involved Kafka Streams and Kafka Connect. From there, she immersed herself in the world of data streaming and found herself quite at home in the Apache Kafka and Apache Flink communities. She now leads the open source advocacy efforts at Snowflake, supporting Apache Iceberg and Apache Polaris (incubating). Outside of work, Danica is passionate about sustainability, increasing diversity in the technical community, and keeping her many houseplants alive. 
She can be found on X (Bluesky and Mastodon), talking about tech, plants, and baking @TheDanicaFine.", "public_name": "Danica Fine", "guid": "6d1075a1-ba4b-5270-abc2-38f3f2b1b25d", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/BQKRWQ/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/P9GRZU/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/P9GRZU/", "attachments": []}, {"guid": "ff52ad11-f4a0-53b1-8b17-5696bfbc1b8a", "code": "3CYZUH", "id": 61325, "logo": null, "date": "2025-04-24T16:15:00+02:00", "start": "16:15", "duration": "00:30", "room": "Europium2", "slug": "pyconde-pydata-2025-61325-building-a-self-hosted-mlops-platform-with-kubernetes", "url": "https://pretalx.com/pyconde-pydata-2025/talk/3CYZUH/", "title": "Building a Self-Hosted MLOps Platform with Kubernetes", "subtitle": "", "track": "PyCon: MLOps & DevOps", "type": "Talk", "language": "en", "abstract": "Many managed MLOps platforms, while convenient, often fall short in providing flexibility, requiring complex integrations, and causing vendor lock-in. In this talk, we\u2019ll share our experience transitioning from managed MLOps tools to a self-hosted solution built on Kubernetes. We\u2019ll focus on how we leveraged open-source tools like Feast, MLflow, and Ray to build a more flexible, scalable, and customizable platform that is now in use at Rewe Digital. By migrating to this self-hosted architecture, we gained greater control over our ML pipelines, reduced our dependency on third-party services, and created a more adaptable infrastructure for our ML workloads.", "description": "Many managed MLOps platforms, while convenient, often fall short in providing flexibility, requiring complex integrations, and causing vendor lock-in. In this talk, we\u2019ll share our experience transitioning from managed MLOps tools to a self-hosted solution built on Kubernetes. 
We\u2019ll focus on how we leveraged open-source tools like Feast, MLflow, and Ray to build a more flexible, scalable, and customizable platform that is now in use at Rewe Digital. By migrating to this self-hosted architecture, we gained greater control over our ML pipelines, reduced our dependency on third-party services, and created a more adaptable infrastructure for our ML workloads.\r\n\r\nTalk Outline: \r\n\r\n1. Introduction (5 minutes):\r\n- The challenges of using managed MLOps platforms: vendor lock-in, integration complexity, and lack of flexibility.\r\n- Why transitioning to a self-hosted solution on Kubernetes can be beneficial.\r\n\r\n2. Proposed Solution (10 minutes):\r\n- Why Kubernetes for MLOps?\r\n- How open-source tools like Feast, MLflow, and Ray come together to form the core of a robust self-hosted MLOps stack.\r\n- Benefits of building a flexible, scalable platform that fits your needs.\r\n\r\n3. Building the Platform (10 minutes):\r\n- Practical steps for setting up and configuring Feast, MLflow, and Ray on Kubernetes.\r\n- Integration strategies and how to manage pipelines, model tracking, and feature storage.\r\n\r\n4. Lessons Learned and Q&A (5 minutes):\r\n- Challenges and takeaways during the migration process\r\n- Q&A", "recording_license": "", "do_not_record": false, "persons": [{"code": "NQU3YJ", "name": "Josef Nagelschmidt", "avatar": "https://pretalx.com/media/avatars/NQU3YJ_jpctoru.jpg", "biography": "I'm Josef, an econometrician turned ML engineer. With a strong background in statistics and causal inference, I have developed my skills through rigorous work at institutions such as the University of Bonn and UC Berkeley, but also through the design and implementation of ML solutions at the Rewe Group. My passion lies in reducing model and ecosystem complexity, enhancing interpretability, and bridging the gap between academia and production settings in the context of machine learning. 
I believe that if we do not establish reliable machine learning systems, we risk failing to harness the immense potential they offer for humanity.", "public_name": "Josef Nagelschmidt", "guid": "9f40c92e-fcdb-51d3-971b-417cfea9b708", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/NQU3YJ/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/3CYZUH/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/3CYZUH/", "attachments": [{"title": "Slides", "url": "/media/pyconde-pydata-2025/submissions/3CYZUH/resources/buildi_m7YrX3J.pdf", "type": "related"}]}, {"guid": "166b8d30-ded8-5e4b-bd56-f4a57671f411", "code": "BR3D83", "id": 60204, "logo": null, "date": "2025-04-24T16:55:00+02:00", "start": "16:55", "duration": "00:45", "room": "Europium2", "slug": "pyconde-pydata-2025-60204-from-algorithm-to-action-building-a-diy-distributed-trading-platform-with-open-source", "url": "https://pretalx.com/pyconde-pydata-2025/talk/BR3D83/", "title": "From Algorithm to Action: Building a DIY Distributed Trading Platform with Open Source", "subtitle": "", "track": "PyCon: Programming & Software Engineering", "type": "Talk (long)", "language": "en", "abstract": "In this talk, we'll explore how you can implement your own distributed system for algorithmic trading leveraging the power of open source without being dependent on trading bot providers.\r\n\r\nWe will discuss different challenges occurring in HFT inter alia processing massive amounts of data with low latency and reliable risk control and how to solve them. Furthermore we will touch on the topic of regulatory requirements in trading.\r\n\r\nThese challenges will be addressed through a distributed system implemented in Python, utilizing Kafka for real-time data streaming and PostgreSQL for persistent storage. 
We will examine approaches to decouple the components to re-use and scale them across different markets.\r\n\r\nCryptocurrency markets are used as a proving ground for the PoC due to easy availability for everyone.", "description": "## Who is this talk for\r\n\r\nThis talk is ideal for all software engineers interested in financial technology, quantitative developers looking to understand modern trading infrastructure, and technical architects exploring distributed systems in high-stakes environments.  \r\nThis talk will NOT discuss specific trading strategies or give any financial advice.\r\n\r\n## Outline\r\n\r\n* Motivation  \r\n* Fundamental trading concepts and market mechanics  \r\n* Market data ingestion and processing  \r\n* Order management and execution  \r\n* Implementation of trading strategies  \r\n* Data storage  \r\n* Outlook\r\n\r\n## Motivation\r\n\r\nThe landscape of financial trading has undergone a dramatic transformation over the past decades. What was once the exclusive domain of institutional players on physical trading floors has evolved into a digitized, accessible marketplace where individual traders can participate from anywhere in the world. The emergence of commission-free trading apps and cryptocurrency exchanges has brought market participation to millions of new retail traders.  \r\nThis enables everyone to participate with their own trading system in global markets.\r\n\r\nIn this talk, we'll explore how you can implement your own distributed system for exchange trading leveraging the power of open source without being dependent on trading bot providers. 
While we won't be able to cover every aspect in depth, we'll address the most essential elements.\r\n\r\nCryptocurrency markets are used as a proving ground for the PoC due to easy availability for everyone.\r\n\r\n## Fundamental Trading Concepts and Market Mechanics\r\n\r\nWe'll begin by exploring essential trading concepts:\r\n\r\n* Order book dynamics  \r\n* Orders, Trades and Positions  \r\n* Different types of orders and their implications for system implementation  \r\n* Regulatory requirements  \r\n* Performance of strategies\r\n\r\nThese lead to different considerations in system design and architecture:\r\n\r\n* De-coupling of exchange interfaces and trading strategies to use same strategy for different markets by using adapter pattern  \r\n* Horizontal scaling to handle data load  \r\n* Need of low latency components and their communication to properly react to market  \r\n* Need of streaming data for real-time risk management  \r\n* Need of persistent storage for regulatory data and post-trading-analysis  \r\n* Need of order action recording and post-trading analysis for performance evaluation\r\n\r\n## Market Data Ingestion and Processing\r\n\r\nThe foundation of any trading system is its ability to efficiently process market data. This includes a Python component responsible for real-time normalization and standardization of multi-venue data:\r\n\r\n* Efficient market data representation and storage structures  \r\n* Techniques for handling high-throughput data without compromising latency  \r\n* Market data recording for post-trading analysis using Kafka\r\n\r\n## Order Management and Execution\r\n\r\nCritical components for managing the trading lifecycle. 
This includes a Python component responsible for normalization and standardization of multi-venue order interfaces:\r\n\r\n* Order action handling (placing orders, modifying orders) and keeping track of orders  \r\n* Global real-time position tracking and risk calculation using Kafka  \r\n* State recovery and system restart procedures  \r\n* Audit trail implementation and transaction logging using Postgres\r\n\r\n## Implementation of Trading Strategies\r\n\r\nWe'll explore the practical aspects of implementing trading strategies in Python using the previously discussed system components:\r\n\r\n* Usage of provided market data  \r\n* Placing orders and keeping track of positions  \r\n* Fast communication with market data and order components using Kafka with msgpack\r\n* Recording of strategy internals for post-trade analysis\r\n\r\n## Data Storage\r\n\r\nWe will take a closer look at:\r\n\r\n* What kinds of data exist in a trading system (live vs. post-trade)\r\n* Approaches to storing the different kinds of data\r\n\r\n## Outlook\r\n\r\nAt the end, we will briefly look at other challenges that might occur, e.g.:\r\n\r\n* Other market types (Finance/Equity/ETF and Energy)  \r\n* Latency considerations  \r\n* Taxes", "recording_license": "", "do_not_record": false, "persons": [{"code": "8NMQMV", "name": "Eugen Geist", "avatar": "https://pretalx.com/media/avatars/8NMQMV_POIcGrV.jpg", "biography": "Seasoned Software & Data Engineering Professional with extensive experience in high-frequency trading systems, data warehousing, and cloud solutions. Expert in optimizing mission-critical systems and implementing engineering best practices. 
Specialized in Python, SQL and cloud technologies.\r\n\r\nCurrently working as a Freelance Developer focusing on software and data engineering.\r\n\r\nSkilled in developing distributed systems, data pipelines, and performance optimization, consistently delivering solutions that maximize business value.", "public_name": "Eugen Geist", "guid": "25cff7d0-c990-5d16-8f9b-b1624332672c", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/8NMQMV/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/BR3D83/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/BR3D83/", "attachments": []}], "Hassium": [{"guid": "c016136e-a9e4-5fd9-8771-2c587e3c3c58", "code": "TRUUVL", "id": 61163, "logo": null, "date": "2025-04-24T10:15:00+02:00", "start": "10:15", "duration": "00:30", "room": "Hassium", "slug": "pyconde-pydata-2025-61163-scaling-python-an-end-to-end-ml-pipeline-for-iss-anomaly-detection-with-kubeflow", "url": "https://pretalx.com/pyconde-pydata-2025/talk/TRUUVL/", "title": "Scaling Python: An End-to-End ML Pipeline for ISS Anomaly Detection with Kubeflow", "subtitle": "", "track": "PyCon: MLOps & DevOps", "type": "Talk", "language": "en", "abstract": "Building and deploying scalable, reproducible machine learning pipelines can be challenging, especially when working with orchestration tools like Slurm or Kubernetes. In this talk, we demonstrate how to create an end-to-end ML pipeline for anomaly detection in International Space Station (ISS) telemetry data using only Python code.\r\n\r\nWe show how Kubeflow Pipelines, MLFlow, and other open-source tools enable the seamless orchestration of critical steps: distributed preprocessing with Dask, hyperparameter optimization with Katib, distributed training with PyTorch Operator, experiment tracking and monitoring with MLFlow, and scalable model serving with KServe. 
All these steps are integrated into a holistic Kubeflow pipeline.\r\n\r\nBy leveraging Kubeflow's Python SDK, we simplify the complexities of Kubernetes configurations while achieving scalable, maintainable, and reproducible pipelines. This session provides practical insights, real-world challenges, and best practices, demonstrating how Python-first workflows empower data scientists to focus on machine learning development rather than infrastructure.", "description": "Among popular open-source MLOps tools, **Kubeflow** stands out as a Kubernetes-native platform designed to support the entire ML lifecycle, from data preprocessing to model training, deployment, and retraining. Its modular structure enables the integration of a wide range of tools, making it a highly versatile framework for building scalable and reproducible ML workflows. Despite this, most existing resources focus on individual components rather than demonstrating how these can be orchestrated into a seamless, end-to-end pipeline.\r\n\r\nIn this talk, we present a practical case study that highlights the potential of Kubeflow in a real-world application. Specifically, we showcase how an automated ML pipeline for anomaly detection in International Space Station (ISS) telemetry data can be built and deployed using Kubeflow and other open-source MLOps tools. The dataset, originating from the Columbus module of the ISS, introduces unique challenges due to its complexity and high-dimensional nature, providing an excellent testbed for MLOps workflows.\r\n\r\n### **What makes this approach unique?**\r\n\r\nOur workflow is built entirely in Python, leveraging Kubeflow\u2019s Python SDK to orchestrate every stage of the pipeline. 
This eliminates the need for manual interaction with Kubernetes or container configurations, making the process accessible to ML engineers and data scientists without extensive DevOps expertise.\r\n\r\n### **Key takeaways for attendees:**\r\n\r\n*   **Tool integration:** Learn how to combine Dask for distributed preprocessing, Katib for hyperparameter optimization, PyTorch Operator for distributed training, MLFlow for experiment tracking and monitoring, and KServe for scalable model serving. These tools are orchestrated into a unified pipeline using Kubeflow Pipelines.\r\n*   **Overcoming challenges:** Gain insights into the technical hurdles faced during the implementation of this pipeline and discover the strategies and best practices that made it possible.\r\n*   **Real-world impact:** Understand how to apply MLOps principles to complex, real-world datasets and how these principles translate into scalable, maintainable, and reproducible workflows.\r\n\r\nTo ensure reproducibility and accessibility, the entire pipeline, including configurations and code, is publicly available in our GitHub repository [here](https://github.com/hsteude/code-ml4cps-paper). Attendees will be able to replicate the workflow, adapt it to their own use cases, or extend it with additional features.\r\n\r\n### **Who should attend?**\r\n\r\nThis session is designed for data scientists, ML engineers, and Python enthusiasts who want to simplify the development of scalable ML pipelines. Whether you're new to Kubernetes or looking to streamline your MLOps workflows, this talk will provide actionable insights and tools to help you succeed.", "recording_license": "", "do_not_record": false, "persons": [{"code": "RL9F37", "name": "Christian Geier", "avatar": "https://pretalx.com/media/avatars/RL9F37_5LpN450.jpg", "biography": "Christian has 12+ years of experience in the scientific application of python in academic and industry settings. 
He is one of the founders of prokube.ai, where he builds an MLOps platform built around Kubeflow, MLFlow, Kubernetes, and a host of other open source tools. He also holds a PhD in physics, where he gained experience in maintaining distributed compute clusters. Christian is a maintainer of several OSS projects.", "public_name": "Christian Geier", "guid": "406c71d6-c246-5e68-893b-df4bb32e608e", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/RL9F37/"}, {"code": "VNRJDV", "name": "Henrik Sebastian Steude", "avatar": "https://pretalx.com/media/avatars/VNRJDV_KOoZnuV.jpg", "biography": "Henrik is an ML researcher at Helmut Schmidt University, specializing in the application of ML in cyber-physical systems. In his current project, he is developing an anomaly detection and diagnostic AI system for use with data from the International Space Station. Before returning to academia, Henrik spent five years as a data scientist in various consulting roles, where he had the opportunity to delve into a range of exciting datasets. During this time, Henrik became a Python and Kubeflow enthusiast.", "public_name": "Henrik Sebastian Steude", "guid": "b39ab80a-9f35-54b3-a565-ff3931fde8ac", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/VNRJDV/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/TRUUVL/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/TRUUVL/", "attachments": [{"title": "Slides of the talk", "url": "/media/pyconde-pydata-2025/submissions/TRUUVL/resources/Scali_tbaB0Pv.pdf", "type": "related"}]}, {"guid": "1f156aa9-0408-5c6b-8276-deb1f2d0bd92", "code": "HPGEKH", "id": 61182, "logo": null, "date": "2025-04-24T10:55:00+02:00", "start": "10:55", "duration": "00:30", "room": "Hassium", "slug": "pyconde-pydata-2025-61182-outgrowing-your-node-zero-stress-scaling-with-cupynumeric", "url": "https://pretalx.com/pyconde-pydata-2025/talk/HPGEKH/", "title": "Outgrowing your node? 
Zero stress scaling with cuPyNumeric.", "subtitle": "", "track": "PyCon: Programming & Software Engineering", "type": "Talk", "language": "en", "abstract": "Many data and simulation scientists use NumPy for its ease of use and good performance on CPU. This approach works well for single-node tasks, but scaling to handle larger datasets or more resource-intensive computations introduces significant challenges. On top of that, using GPUs adds another level of complexity. We present the cuPyNumeric library, which gives developers the same familiar NumPy interface, but seamlessly distributes work across CPUs and GPUs.\r\nIn this talk we showcase the productivity and performance of the cuPyNumeric library on a user's example and cover some details of its implementation.", "description": "Many data and simulation scientists use NumPy for its ease of use and good performance on CPU. This approach works well for single-node tasks, but scaling to handle larger datasets or more resource-intensive computations introduces significant challenges. On top of that, using GPUs adds another level of complexity. We present the cuPyNumeric library. cuPyNumeric gives developers the same familiar NumPy interface, but seamlessly distributes work across CPUs and GPUs.\r\n\r\nA compelling example of when scaling becomes necessary: scientists at the Stanford Linear Accelerator Center (SLAC) need to process a large amount of data within a fixed time window, called beam time. The full dataset generated during experiments is too large to be processed on a single CPU. Additionally, the code often must be modified during the beam time to adapt to changing experimental needs. Being able to use NumPy syntax rather than lower-level distributed computing libraries makes these changes quick and easy, allowing researchers to focus on conducting more experiments rather than debugging or optimizing code.\r\n\r\ncuPyNumeric is designed to be a drop-in replacement for NumPy. 
Built on top of task-based distributed runtime from Stanford University, it automatically parallelizes NumPy APIs across all available resources, taking care of data distribution, communication, asynchronous and accelerated execution of compute kernels on both GPUs or multi-core CPUs.  In addition, cuPyNumeric can be integrated with other popular Python libraries like SciPy, matplotlib, Jax.  With cuPyNumeric, SLAC scientists successfully ran their data processing code distributed across multiple nodes and GPUs, processing the full dataset with a 6x speed-up compared to the original single-node implementation.\r\n\r\nIn this talk we showcase the productivity and performance of cuPyNumeric library covering some detail on its implementation.", "recording_license": "", "do_not_record": false, "persons": [{"code": "NYEURC", "name": "Bo Dong", "avatar": "https://pretalx.com/media/avatars/NYEURC_enDk60O.jpg", "biography": "Bo Dong is a Principal Technical Product Manager on the CUDA team. He is responsible for distributed computing including Legate and other technologies/products at NVIDIA.", "public_name": "Bo Dong", "guid": "bbf34818-d0cd-5543-8842-4a61fc5dc428", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/NYEURC/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/HPGEKH/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/HPGEKH/", "attachments": []}, {"guid": "74bcad63-bf3a-5a10-843e-76ee9a99ff38", "code": "KCV9RS", "id": 61479, "logo": null, "date": "2025-04-24T11:35:00+02:00", "start": "11:35", "duration": "00:30", "room": "Hassium", "slug": "pyconde-pydata-2025-61479-beyond-alembic-and-django-migrations", "url": "https://pretalx.com/pyconde-pydata-2025/talk/KCV9RS/", "title": "Beyond Alembic and Django Migrations", "subtitle": "", "track": "PyCon: Django & Web", "type": "Talk", "language": "en", "abstract": "ORMs like Django and SQLAlchemy have become indispensable in Python development, simplifying the 
interaction between applications and databases. Yet, their built-in schema migration tools often fall short in projects that require advanced database features or robust CI/CD integration.\r\n\r\nIn this talk, we\u2019ll explore how you can go beyond the limitations of your ORM\u2019s migration tool. Using Atlas\u2014a language-agnostic schema management tool\u2014as a case study, we\u2019ll demonstrate how Python developers can automate migration planning, leverage advanced database features, and seamlessly integrate database changes into modern CI/CD pipelines.", "description": "Talk Structure: \"Beyond Your ORM's Migration Tool\"\r\n\r\n1. Introduction \u2013 Why ORMs Build Migration Tools\r\n   - ORMs like SQLAlchemy and Django ORM simplify database interactions and include migration tools (e.g., Alembic, Django Migrations) for schema changes.\r\n   - These tools are robust for ORM-defined schemas but lack advanced features and native CI/CD integrations.\r\n\r\n2. Where Built-in Tools Fall Short\r\n   - ORM migration tools focus on basic schema changes but don\u2019t support advanced database objects like triggers, materialized views, or stored procedures.\r\n   - Lack native integration with modern CI/CD tools, leaving teams to implement custom, often suboptimal solutions.\r\n\r\n3. Presenting Atlas \u2013 Bridging the Gap\r\n   - Atlas complements ORM tools by reading their schemas (e.g., Django models, SQLAlchemy models) and enabling advanced extensions.\r\n   - Key features:\r\n     - Support for triggers, materialized views, and other advanced objects.\r\n     - Native CI/CD integration for automating and validating schema changes.\r\n\r\n4. How Atlas Integrates with ORMs\r\n   - Atlas reads ORM-defined schemas and enhances them with advanced features.\r\n   - Combines ORM workflows with Atlas\u2019s robust schema management capabilities, enabling automation and database-specific optimizations.\r\n\r\n5. 
Demo \u2013 Atlas in Action\r\n   - Example: A Django project adds a materialized view and a trigger using Atlas.\r\n   - Steps:\r\n     - Use Atlas to read the ORM schema and extend it with advanced features.\r\n     - Automate migration validation and deployment through CI/CD pipelines.\r\n   - Outcome: Simplified and automated schema management with modern tooling.\r\n\r\n6. Conclusion and Q&A\r\n   - Key Takeaways:\r\n     - ORM migration tools like Alembic and Django Migrations are great for standard use cases but fall short for advanced workflows and CI/CD integration.\r\n     - Atlas bridges this gap, enabling automation and advanced database features.\r\n   - Call to Action: Try Atlas to enhance schema workflows.\r\n   - Q&A: Open floor for questions.", "recording_license": "", "do_not_record": false, "persons": [{"code": "AYDHS9", "name": "Rotem Tamir", "avatar": "https://pretalx.com/media/avatars/AYDHS9_H3NxWEc.png", "biography": "Rotem Tamir (39), father of two. Co-founder and CTO of Ariga, creator of Atlas, an open-source database schema as code tool.", "public_name": "Rotem Tamir", "guid": "f1691a53-7905-51ff-9e40-b1b8a0ae1b40", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/AYDHS9/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/KCV9RS/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/KCV9RS/", "attachments": []}, {"guid": "402c1ce8-14c3-5ebb-bdd8-401f56abc9fd", "code": "7PDARV", "id": 61356, "logo": null, "date": "2025-04-24T14:20:00+02:00", "start": "14:20", "duration": "00:30", "room": "Hassium", "slug": "pyconde-pydata-2025-61356-writing-reliable-software-while-depending-on-hazardous-apis", "url": "https://pretalx.com/pyconde-pydata-2025/talk/7PDARV/", "title": "Writing reliable software while depending on hazardous APIs", "subtitle": "", "track": "PyCon: MLOps & DevOps", "type": "Talk", "language": "en", "abstract": "As we develop business critical software, we often need to rely on 
external APIs to get the job done. And not all services are born equal: although the ideal world would provide well-operated APIs with over-met service levels, the real world is usually way worse than that. Timeouts, HTTP errors, cascading failures, unclear or changing contracts, approximate protocol implementations... And even the oh-so-human bad faith while trying to pinpoint the root cause... Most of us have written hacks to handle commonly seen failures, from quick and dirty implementations to well-thought-out implementations of resilience patterns, but this is usually hard to do correctly, and investing the right amount of time and money in the topic is rarely a business priority. We'll present the options, including both direct dependencies (not framework dependent, although some families can emerge (async/sync ...)) and a service/proxy-based approach.", "description": "The two most common causes of software failure are, in order, human errors and external services. Working extensively with external APIs, we often encounter tricky issues in maintaining the responsiveness of our end-user services (both in terms of speed, but also plain availability). Many teams are addressing those issues on a case-by-case basis, most often using a homemade patchwork of external libraries and failing cases, and we used to do the same. 
Over time, we have come to rethink our approach to this problem.\r\n\r\nWe will present the usual suspects (and their consequences) we're usually facing: timeouts, HTTP errors, cascading failures, unclear or changing contracts, and the difficulty of forensic analysis after an incident occurs when the root cause stems from external data or calls.\r\n\r\nThen, we'll show various approaches we use or have seen be used by teams of different sizes.\r\n\r\nWe'll finish by presenting an innovative approach delegating the issues to a forward proxy so that the development team can both avoid having to spend time on reinventing the resilience and reliability patterns, while providing them the tools to act quickly when things go wrong.", "recording_license": "", "do_not_record": false, "persons": [{"code": "E7DVCA", "name": "Romain Dorgueil", "avatar": "https://pretalx.com/media/avatars/E7DVCA_nk9z4sI.jpg", "biography": "My first pieces of code ran on Atari ST machines. I had the chance to see the Internet baby say its first words while I was starting to get interested in building software. I'm a proud Software Craftsman and Open-Source Software advocate. 
I spend a few of my other lifes playing afro-cuban & jazz music, or playing some go games.", "public_name": "Romain Dorgueil", "guid": "4ebccad8-e011-5021-af57-06d9f2cfc554", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/E7DVCA/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/7PDARV/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/7PDARV/", "attachments": []}, {"guid": "8ffefb03-be46-5128-b46a-ac3f48eeca19", "code": "BJKSGK", "id": 59307, "logo": null, "date": "2025-04-24T15:00:00+02:00", "start": "15:00", "duration": "00:45", "room": "Hassium", "slug": "pyconde-pydata-2025-59307-decoding-topics-a-comparative-analysis-of-python-s-leading-topic-modeling-libraries-using-climate-c", "url": "https://pretalx.com/pyconde-pydata-2025/talk/BJKSGK/", "title": "Decoding Topics: A Comparative Analysis of Python\u2019s Leading Topic Modeling Libraries Using Climate C", "subtitle": "", "track": "PyData: Natural Language Processing & Audio (incl. Generative AI NLP)", "type": "Talk (long)", "language": "en", "abstract": "Topic modelling has come a long way, evolving from traditional statistical methods to leveraging advanced embeddings and neural networks. Python\u2019s diverse library ecosystem includes tools like Latent Dirichlet Allocation (LDA) using gensim, Top2Vec, BERTopic, and Contextualized Topic Models (CTM). This talk evaluates these popular approaches using a dataset of UK climate change policies, considering use cases relevant to organisations like DEFRA (Department for Environment, Food & Rural Affairs). The analysis explores real-time integration, dynamic topic modelling over time, adding new documents, and retrieving similar ones. Attendees will learn the strengths, limitations, and practical applications of each library to make informed decisions for their projects.", "description": "Objectives:\r\nThe session aims to:\r\n\r\n1.  
Compare Python-based topic modelling libraries, highlighting their relevance to real-world scenarios like policy analysis.\r\n2.    Explore practical use cases, including real-time document integration, tracking topic evolution, and finding similar documents.\r\n3.    Evaluate the tools based on performance, interpretability, scalability, and flexibility, with a focus on climate change policy data presented by [1] focusing on adaptation and mitigation.\r\n4.    Provide actionable guidance on selecting the right library for different project needs and datasets.\r\n\r\nOutline:\r\n\r\n1. Introduction to Topic Modeling: Overview of traditional and modern approaches, including their practical significance.\r\n\r\n2. Algorithms & Libraries Overview: LDA (gensim) [2], CTM [3], Top2Vec [4], BERTopic [5]\r\n\r\n3. Dataset and Use Cases:\r\n      - Overview of the UK climate change policy dataset.\r\n      - Use cases inspired by DEFRA and similar organisations, such as:\r\n            - Real-time integration for continuously adding new documents.\r\n            - Tracking topic development over time (dynamic topic modeling).\r\n            - Retrieving similar documents for faster insights.\r\n           (- Classification)\r\n\r\n4. Evaluation Criteria: Analysis of libraries based on:\r\n        - Ease of Use: How easy it is for no coding experts\r\n        - Quality: Coherence and diversity of extracted topics.\r\n        - Efficiency: Runtime performance and scalability.\r\n        - Flexibility: Features like contextual embeddings and integration capabilities.\r\n        - Interpretability: Ease of understanding topics and output.\r\n\r\n5. Results: Detailed findings, including specific advantages and limitations of each library in supporting the outlined use cases.\r\n\r\n6. Practical Recommendations: Guidance on choosing a library based on project goals, dataset characteristics, and organisational needs.\r\n\r\n7. 
Conclusion and Future Directions: Summary of key insights and the evolving role of embedding-based methods in topic modelling.\r\n\r\nOutcomes:\r\nBy attending this session, participants will:\r\n\r\n- Gain an in-depth understanding of Python\u2019s top topic modeling libraries.\r\n- Learn how to apply these tools to real-world challenges in policy analysis and other fields.\r\n- Understand how to handle use cases like real-time document integration and topic evolution over time.\r\n- Develop the skills to evaluate and choose the best tool for specific datasets and objectives.\r\n\r\nTarget Audience\r\n\r\nThis talk is for:\r\n- Data scientists and NLP practitioners seeking to apply topic modelling to unstructured text data.\r\n- Policy analysts and researchers working with large textual datasets, such as government or environmental policies.\r\n- Professionals in organisations like DEFRA, where tracking changes, adding new documents, or finding similar records are critical tasks.\r\n- Python enthusiasts interested in cutting-edge NLP techniques for extracting meaningful insights.\r\n\r\n[1] R. Biesbroek, S. Badloe, and I. Athanasiadis. Machine learning for research on climate change adaptation policy integration: an exploratory UK case study. Regional Environmental Change, 20, 07 2020.\r\n\r\n[2] https://pypi.org/project/gensim/\r\n[3] https://github.com/MilaNLProc/contextualized-topic-models\r\n[4] https://github.com/ddangelov/Top2Vec\r\n[5] https://maartengr.github.io/BERTopic/index.html", "recording_license": "", "do_not_record": false, "persons": [{"code": "GPD9BB", "name": "Dr. Lisa Andreevna Chalaguine", "avatar": "https://pretalx.com/media/avatars/GPD9BB_r7ErLTO.jpeg", "biography": "Lisa is an accomplished educator, researcher, and freelancer specializing in data science, natural language processing (NLP), and artificial intelligence. 
With a PhD in Intelligent Systems from UCL and a master's from Imperial College London, Lisa has extensive experience in academia and industry, having taught at UCL, and contributed to impactful projects like those with Cancer Research UK.\r\n\r\nA digital nomad at heart, Lisa teaches corporate clients and supervises university students worldwide, focusing on Python, machine learning, and NLP. Known for their engaging teaching style and passion for problem-solving, they are currently developing innovative courses and creating a YouTube channel featuring masterclasses on data analysis and machine learning.\r\n\r\nDriven by a love for teaching, research, and helping others succeed, Lisa is exploring opportunities to return to academia, with aspirations to lecture in Eastern Europe and Central Asia. Multilingual and versatile, they are shaping the future of data science education while continuing to inspire learners globally.", "public_name": "Dr. Lisa Andreevna Chalaguine", "guid": "37f5a5d6-9e5d-5f89-a105-faac5f2d67dd", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/GPD9BB/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/BJKSGK/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/BJKSGK/", "attachments": []}, {"guid": "320e467d-0aef-58c9-9a08-32cf5c12a14b", "code": "J8FLDN", "id": 61288, "logo": null, "date": "2025-04-24T16:15:00+02:00", "start": "16:15", "duration": "00:30", "room": "Hassium", "slug": "pyconde-pydata-2025-61288-conquering-the-queue-lessons-from-processing-one-billion-celery-tasks", "url": "https://pretalx.com/pyconde-pydata-2025/talk/J8FLDN/", "title": "Conquering the Queue: Lessons from processing one billion Celery tasks", "subtitle": "", "track": "PyCon: Django & Web", "type": "Talk", "language": "en", "abstract": "At Userlike, Celery is the backbone of our application, orchestrating over a 100 million tasks per month. 
In this talk, I\u2019ll share real-world insights into scaling Celery, optimizing performance, avoiding common pitfalls, handling failures, and building a resilient architecture.", "description": "At Userlike, Celery plays a critical role as the backbone of our Django-based SaaS application, orchestrating over 100 million tasks per month with speed, reliability, and precision. In this talk, I\u2019ll share the lessons we\u2019ve learned while scaling Celery to handle massive workloads and support the needs of a growing user base. From optimizing performance and avoiding common pitfalls to handling failures gracefully and ensuring a resilient architecture, this session will provide actionable insights for developers and architects working with distributed task queues.\r\n\r\nWhether you\u2019re just starting with Celery or looking to scale an established system, you\u2019ll walk away with practical tips, battle-tested strategies, and a deeper understanding of how to harness Celery\u2019s full potential in real-world scenarios.\r\n\r\nOutline:\r\n\r\n\u2022\tIntroduction: Why Userlike needs a task queue, and why you need one too\r\n\u2022\tFundamental concepts: latency, throughput, failure modes\r\n\u2022\tOptimizing Performance: Strategies for faster and more efficient task execution\r\n\u2022\tAvoiding Pitfalls: Common mistakes and how to mitigate them\r\n\u2022\tHandling Failures: Building fault-tolerant workflows and monitoring systems\r\n\u2022\tResilient Architecture: Designing for reliability and scalability\r\n\u2022\tKey Takeaways: Practical tips for implementing and scaling Celery in your own projects\r\n\r\nThis talk is designed to be technical, engaging, and packed with real-world experiences to help you conquer the queue in your own applications.", "recording_license": "", "do_not_record": false, "persons": [{"code": "WV3US9", "name": "Daniel Hepper", "avatar": "https://pretalx.com/media/avatars/WV3US9_uWb9vIx.JPG", "biography": "Daniel is the CTO at 
Userlike, a leading SaaS company providing innovative customer communication solutions. He has a degree in Computer Science from the University of Karlsruhe and has been writing software professionally for over 20 years. He enjoys sharing his experiences and helping fellow developers level up their software development skills, and presented before at various Django and Python conferences throughout Europe. When not in front of a keyboard, he can be found training for his next marathon or building intricate Lego contraptions with his son.", "public_name": "Daniel Hepper", "guid": "bf7d856c-fdd2-5da7-9bad-bb605386c68c", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/WV3US9/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/J8FLDN/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/J8FLDN/", "attachments": []}, {"guid": "33fb6a30-23f8-58da-809a-b3936448472b", "code": "UCG9AS", "id": 61776, "logo": null, "date": "2025-04-24T16:55:00+02:00", "start": "16:55", "duration": "00:45", "room": "Hassium", "slug": "pyconde-pydata-2025-61776-from-like-to-love-adding-proper-search-to-your-django-apps", "url": "https://pretalx.com/pyconde-pydata-2025/talk/UCG9AS/", "title": "From LIKE to Love: Adding Proper Search to Your Django Apps", "subtitle": "", "track": "PyCon: Django & Web", "type": "Talk (long)", "language": "en", "abstract": "Is your Django application still relying on SQL LIKE queries for search? In this talk, we'll explore why basic text matching falls short of modern user expectations and how to implement proper search functionality without complexity. We'll introduce django-semantic-search, a practical package that bridges the gap between Django's ORM and powerful semantic search capabilities. Through practical code examples and real-world use cases, you'll learn how to enhance your application's search experience from basic keyword matching to understanding user intent. 
Whether you're building a content platform, e-commerce site, or internal tool, you'll walk away with concrete steps to implement production-ready search that your users will actually enjoy using.", "description": "Introduction (5 minutes)\r\n1. The state of search in Django applications today\r\n2. Common patterns and their limitations\r\n3. Real costs of poor search functionality\r\n4. Why search is often an afterthought in Django apps\r\n\r\nThe Search Landscape (10 minutes)\r\n1. Review of Django's built-in search capabilities\r\n2. Performance implications of basic text matching\r\n3. Field lookups and their limitations\r\n4. PostgreSQL-specific features\r\n5. Popular search solutions in the Django ecosystem\r\n6. Trade-offs between complexity and functionality\r\n\r\nWhy Search Matters (10 minutes)\r\n1. User expectations in 2025\r\n2. Common search patterns and user behaviors\r\n3. Impact on user engagement and business metrics\r\n4. Natural language queries vs keyword matching\r\n5. Handling imperfect input\r\n6. Context and intent understanding\r\n7. Real-world examples of search improvements\r\n\r\nModern Search Approaches (5 minutes)\r\n1. Key concepts of vector search\r\n2. From keywords to meaning\r\n3. Why embeddings work better than keywords\r\n4. Understanding user intent\r\n5. Relevance beyond exact matches\r\n\r\nPractical Implementation & Best Practices (15 minutes)\r\n1. Introducing django-semantic-search\r\n2. Core concepts and architecture\r\n3. Integration with existing Django models\r\n4. Real-world implementation strategies\r\n5. Handling different content types\r\n6. Performance optimization techniques\r\n7. Common pitfalls and solutions\r\n8. Resource management\r\n9. Query optimization\r\n10. 
Monitoring and maintaining search quality", "recording_license": "", "do_not_record": false, "persons": [{"code": "H3RSTE", "name": "Kacper \u0141ukawski", "avatar": "https://pretalx.com/media/avatars/H3RSTE_qsyggPK.jpg", "biography": "Software developer and data scientist at heart, with an inclination to teach others. Public speaker, working in DevRel.", "public_name": "Kacper \u0141ukawski", "guid": "235d4c1b-f02c-53a2-9a37-d126b9976e0e", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/H3RSTE/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/UCG9AS/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/UCG9AS/", "attachments": []}], "Palladium": [{"guid": "5c3203d2-8119-5f37-b97a-49ae4d3f01b0", "code": "KCSSJ7", "id": 61793, "logo": null, "date": "2025-04-24T10:15:00+02:00", "start": "10:15", "duration": "00:30", "room": "Palladium", "slug": "pyconde-pydata-2025-61793-multi-tenant-conversational-analytics", "url": "https://pretalx.com/pyconde-pydata-2025/talk/KCSSJ7/", "title": "Multi-tenant Conversational Analytics", "subtitle": "", "track": "PyData: Natural Language Processing & Audio (incl. Generative AI NLP)", "type": "Talk", "language": "en", "abstract": "Ever wondered how to use GenAI to enable self-service analytics through prompting? In this talk, I will share my experience of building a multi-tenant conversational analytics set-up that is built into a Software-as-a-Service (SaaS) platform. This talk is intended for AI engineers, data scientists, software engineers and anyone interested in using GenAI to power conversational analytics using open-source tools. \r\n\r\nI will discuss the challenges faced in designing and implementing, as well as the lessons learned along the way. We'll answer questions such as, why offer analytics through prompting? Why multi-tenancy and makes it so difficult? How to build it into an existing product? 
What makes open-source the preferred choice over proprietary solutions? What could the implications be for the analytics field?", "description": "This talk will start by answering the question: What is conversational analytics and how does it work? After which we'll dive into why this was built and how the implementation was done. \r\n\r\n* How analytics in SaaS can be fundamentally improved by conversational analytics (5 mins). \r\n* How the Text-to-SQL fundament was shaped using RAG with Embeddings in PGVector (5 mins). \r\n* Dealing with multi-tenancy in PostgreSQL and BigQuery to ensure data segregation & security (5 mins). \r\n* How to handle tenant specific pre-training and training examples (5 mins). \r\n* Building this into an existing application and supporting integrations (5 minutes). \r\n* Conclusion and thoughts on the implications for the field of analytics (5 mins). \r\n\r\nIn the end you should have a good idea on why conversational analytics can be a game changer, what the pitfalls are and how to build it with open source technologies.", "recording_license": "", "do_not_record": false, "persons": [{"code": "97XB7E", "name": "Rodel van Rooijen", "avatar": "https://pretalx.com/media/avatars/97XB7E_4d88FIl.png", "biography": "I speak & write about my experiences in the world of data & AI. 
This comes from the perspective of having worked across data science, data engineering and ML engineering in start-ups, scale-ups and enterprises.", "public_name": "Rodel van Rooijen", "guid": "f33b85b3-5e98-58c9-9c90-54ade9f59478", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/97XB7E/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/KCSSJ7/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/KCSSJ7/", "attachments": []}, {"guid": "10f7c9db-9fb5-5ce6-943f-c415219342bd", "code": "3WLDMQ", "id": 61125, "logo": null, "date": "2025-04-24T10:55:00+02:00", "start": "10:55", "duration": "00:30", "room": "Palladium", "slug": "pyconde-pydata-2025-61125-navigating-the-security-maze-an-interactive-adventure", "url": "https://pretalx.com/pyconde-pydata-2025/talk/3WLDMQ/", "title": "Navigating the Security Maze: An Interactive Adventure", "subtitle": "", "track": "PyCon: Security", "type": "Talk", "language": "en", "abstract": "How do you integrate security into a software development project without jeopardizing timeline or budget? You decide!\r\nThis interactive session covers crucial decisions for software security, and the audience decides how the story ends...", "description": "Although DevSecOps has been a trending topic for years, it is still far from being a solved problem.\r\n\r\nThis interactive session brings the challenges of security in the development process to life: participants are confronted with several scenarios from everyday project work, and their decisions help shape the further course of the presentation. 
They have to reconcile security requirements with budget, development speed and user-friendliness and bring the project safely from the idea to live operation.\r\n\r\nThe session covers the entire development process, but each run is different as the audience decides the course of the story via online-voting: How to proceed with the development project and think about security at the same time?", "recording_license": "", "do_not_record": false, "persons": [{"code": "DS3TQU", "name": "Clemens H\u00fcbner", "avatar": "https://pretalx.com/media/avatars/DS3TQU_q2Okxq8.jpg", "biography": "For more than ten years, Clemens H\u00fcbner has been working at the interface between software and security. After roles as a software developer and in penetration testing, he joined inovex in 2018 as a software security engineer. Today, he supports development projects at the conception and implementation level and is a trainer both in-house and for clients. He advises on secure development processes and DevSecOps. 
As speaker, he is invited to national and international conferences.", "public_name": "Clemens H\u00fcbner", "guid": "2211c8c4-18d2-5bc3-8719-afbfb0213fe8", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/DS3TQU/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/3WLDMQ/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/3WLDMQ/", "attachments": []}, {"guid": "c7a04d2e-21de-54f3-b60c-73ce54bc81c6", "code": "UGTB7A", "id": 61891, "logo": null, "date": "2025-04-24T11:35:00+02:00", "start": "11:35", "duration": "00:30", "room": "Palladium", "slug": "pyconde-pydata-2025-61891-securing-generative-ai-essential-threat-modeling-techniques", "url": "https://pretalx.com/pyconde-pydata-2025/talk/UGTB7A/", "title": "Securing Generative AI: Essential Threat Modeling Techniques", "subtitle": "", "track": "PyData: Generative AI", "type": "Talk", "language": "en", "abstract": "Generative AI development introduces unique security challenges that traditional methods often overlook. This talk explores practical threat modeling techniques tailored for AI practitioners, focusing on real-world scenarios encountered in daily development. Through relatable examples and demonstrations, attendees will learn to identify and mitigate common vulnerabilities in AI systems. The session covers user-friendly security tools and best practices specifically designed for AI development. By the end, participants will have practical strategies to enhance the security of their AI applications, regardless of their prior security expertise.", "description": "1. Introduction\r\n    * Motivation\r\n    * What can go wrong\r\n2. Generative AI vs Traditional Applications\r\n    * Key differences in security considerations\r\n    * Unique challenges posed by generative AI\r\n3. 
Threat Modeling Basics and AI-Specific Threats \r\n    * Threat modeling frameworks\r\n    * Focus on prompt injection and data poisoning\r\n    * Example: Simple prompt injection attempt\r\n4. Practical Threat Modeling Process\r\n    * Simplified system decomposition example\r\n    * Threat identification walkthrough\r\n5.  Example: Input Validation\r\n6. Tools Showcase and Mitigation Strategies\r\n7. Conclusion and Resources\r\n    * Recap key takeaways\r\n    * List of recommended tools and further reading", "recording_license": "", "do_not_record": true, "persons": [{"code": "PDSUUK", "name": "Elizaveta Zinovyeva", "avatar": "https://pretalx.com/media/avatars/PDSUUK_XK8lPmS.jpg", "biography": "I am Liza - Applied Scientist at AWS Generative AI Innovation Center and am based in Berlin. I am passionate about AI/ML, finance and software security topics. In my spare time, I enjoy spending time with my family, sports, learning new technologies, and table quizzes.", "public_name": "Elizaveta Zinovyeva", "guid": "5fd37410-ae53-58c5-a4ee-39ecfe6ed87e", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/PDSUUK/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/UGTB7A/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/UGTB7A/", "attachments": []}, {"guid": "fde53224-eae9-5f17-869a-4f7628d1076e", "code": "AYN837", "id": 61377, "logo": null, "date": "2025-04-24T14:20:00+02:00", "start": "14:20", "duration": "00:30", "room": "Palladium", "slug": "pyconde-pydata-2025-61377-machine-reasoning-and-system-2-thinking", "url": "https://pretalx.com/pyconde-pydata-2025/talk/AYN837/", "title": "Machine Reasoning and System 2 Thinking", "subtitle": "", "track": "PyData: Generative AI", "type": "Talk", "language": "en", "abstract": "Raw large language models struggle with complex reasoning. 
New techniques have\r\nemerged that allow these models to spend more time thinking before giving an answer.\r\nDirect token sampling can be seen as system-1 thinking and explicit step-by-step\r\nreasoning as system-2. How can this reasoning ability be improved and what is the future?", "description": "Basic large language models struggle with complex reasoning. New techniques, broadly referred to as \"test time compute\", have emerged that allow these models to spend more time processing before giving an answer. Direct token sampling can be seen as analogous to system-1 thinking and explicit step-by-step reasoning as system-2. Many top AI researchers and companies are now working on building system-2 into AI systems to improve general reasoning.\r\n\r\nWe will review the newest open research on test time computation including promising techniques that have appeared in top entries for Fran\u00e7ois Chollet's ARC-AGI challenge. While OpenAI has shamefully kept the research behind their o1, o3 and o-N models secret, other researchers have worked in public, demonstrating how to use test time compute to greatly boost model performance with the right fine-tuning and test time procedures.\r\n\r\nThis talk will explore the latest developments in the rapidly developing area of system-2 AI reasoning, the engine behind the only significant gains in LLM performance recently. Giving LLMs system-2 like capabilities improves problem solving and code generation quality and reduces hallucinations. Get up to speed on the research behind these techniques.", "recording_license": "", "do_not_record": false, "persons": [{"code": "Y3GHEB", "name": "Andy Kitchen", "avatar": "https://pretalx.com/media/avatars/Y3GHEB_WTE5TdV.jpg", "biography": "Andy Kitchen is an AI/neuroscience researcher, startup founder, and all-around hacker. He co-founded Cortical Labs where the team taught live brain cells to play pong. 
He's still trying to figure out how to catch the ghost in the machine.", "public_name": "Andy Kitchen", "guid": "3d850e4c-f6e5-588b-8700-0a4a6f2e6242", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/Y3GHEB/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/AYN837/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/AYN837/", "attachments": []}, {"guid": "122598cd-07ad-5512-8450-c132e3aac274", "code": "GRWYQB", "id": 61181, "logo": null, "date": "2025-04-24T15:00:00+02:00", "start": "15:00", "duration": "00:45", "room": "Palladium", "slug": "pyconde-pydata-2025-61181-securing-rag-pipelines-with-fine-grained-authorization", "url": "https://pretalx.com/pyconde-pydata-2025/talk/GRWYQB/", "title": "Securing RAG Pipelines with Fine Grained Authorization", "subtitle": "", "track": "PyData: Generative AI", "type": "Talk (long)", "language": "en", "abstract": "Using LLMs and AI in your Enterprise? Make sure you build Fine Grained Authorization to ensure your LLMs access only the data they are authorized to. \r\n\r\nThis talk will show how you can build Relationship Based Access Control (ReBAC) for fine-grained authorization for your RAG pipelines. The talk also includes a demo using Pinecone, Langchain, OpenAI, and SpiceDB.", "description": "Building enterprise-ready AI requires ensuring users can only augment prompts with data they're authorized to access. Relationship-based access control (ReBAC) is particularly well-suited for fine-grained authorization in Retrieval-Augmented Generation (RAG) because it makes decisions based on relationships between objects, offering more precise control compared to traditional models like RBAC and ABAC.\r\n\r\nThis talk covers how ReBAC systems can safeguard sensitive data in RAG pipelines. We'll start with why Authorization is critical for RAG pipelines, and how Google Zanzibar achieves this with ReBAC. 
We'll then illustrate how pre-filtering vector database queries with a list of authorized object IDs can improve efficiency & security. \r\n\r\nThe talk will also include a demo implementing fine-grained authorization for RAG using Pinecone, Langchain, OpenAI, and SpiceDB.", "recording_license": "", "do_not_record": false, "persons": [{"code": "XH7YNC", "name": "Sohan Maheshwar", "avatar": "https://pretalx.com/media/avatars/XH7YNC_5DPc3lg.jpeg", "biography": "Sohan is a Lead Developer Advocate at AuthZed, based in the Netherlands. He started his career as a developer building mobile apps and has worked in the developer relations space since 2013, in companies such as Amazon, Fermyon and Gupshup. He has always been interested in emerging technologies and how it shapes the world around us.\r\n\r\nHis interests outside work include visual arts, trivia, and playing frisbee.", "public_name": "Sohan Maheshwar", "guid": "0ffa3ab7-dc1d-5977-add5-bec02d32b68a", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/XH7YNC/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/GRWYQB/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/GRWYQB/", "attachments": []}, {"guid": "b6d43d92-6083-5a20-9408-8972c61ee34a", "code": "CTUEJX", "id": 59280, "logo": null, "date": "2025-04-24T16:15:00+02:00", "start": "16:15", "duration": "00:30", "room": "Palladium", "slug": "pyconde-pydata-2025-59280-streaming-at-30-000-feet-a-real-time-journey-from-apis-to-stream-processing", "url": "https://pretalx.com/pyconde-pydata-2025/talk/CTUEJX/", "title": "Streaming at 30,000 Feet: A Real-Time Journey from APIs to Stream Processing", "subtitle": "", "track": "PyCon: Programming & Software Engineering", "type": "Talk", "language": "en", "abstract": "Traditional API architectures face significant challenges in environments where repetitive and frequent requests are required to retrieve data updates. 
These request-response mechanisms introduce latency, as clients must continually query the server to check for changes, often receiving redundant or outdated information. This approach leads to increased network overhead, inefficient use of server resources and diminished scalability as the number of clients or requests grows. Additionally, frequent requests expand the attack surface, requiring security measures such as authorisation checks, rate limiting and query sanitisation. Managing all of these inherent problems results in systems that are increasingly complex to maintain and improve, while putting considerable implementation effort onto the customer.\r\nJoin to find out how transitioning to a streaming architecture can address these issues by providing proactive, event-based data delivery, reducing latency, minimising redundant processing, enhancing scalability and simplifying security management.", "description": "In this talk we will go over the benefits, drawbacks and lessons Airbus encountered in the switch from an API to a Python-based stream architecture for continuous flight traffic prediction. The goal is to highlight stream-based architectures as an architectural alternative and allow attendees to decide whether they are worth looking into as an alternative to their current API-based setup.\r\n\r\nThe talk addresses the inefficiencies and limitations of traditional API-based architectures for real-time data delivery. Specifically, it explores challenges such as high latency, network overhead, customer effort, and scalability issues when APIs rely on polling mechanisms. These issues became apparent at Airbus during a project as customer needs evolved, highlighting the shortcomings of APIs in handling real-time updates effectively. 
The story of this project will serve as an example of how to identify a limiting architectural decision and which pain points can be avoided by taking a new route, and it will guide us through the talk.\r\n\r\nFor developers and architects building modern, data-driven applications, choosing the right architecture is critical. Many face similar challenges when scaling APIs for real-time use cases, such as IoT, financial data, or notifications. This problem is relevant because adopting an unsuitable architecture can lead to poor performance, higher costs, and frustrated users.\r\n\r\nThe proposed solution is to transition from an API-based architecture to a streaming architecture for real-time data delivery. This involves leveraging stream processing systems that push updates proactively, handle high-throughput data efficiently, and offer features like backpressure, partitioning, and stateful processing. Recent developments in Python-based tools such as **Bytewax**, **Faust** and **Quix** are highlighted for their scalability and fault-tolerance capabilities.\r\n\r\n### Key Takeaways\r\n1. **Challenges of APIs**: Polling APIs is inefficient for real-time updates, leading to delays, resource wastage, and customer dissatisfaction.\r\n2. **Advantages of Streaming**: Streaming architectures offer real-time data delivery, lower latency, reduced customer effort, better scalability, and improved fault tolerance.\r\n3. **Key Streaming Concepts**: Understanding backpressure, partitioning, and stateful processing is essential for understanding streaming-specific limitations and solutions.\r\n4. **Architectural Considerations**: Streaming is ideal for use cases where data changes frequently and needs to be delivered in real time, while APIs may still be suitable for low-frequency, static, or manual queries.\r\n5. 
**Strategic Transition**: Adopting a streaming approach requires a paradigm shift in thinking about how data is delivered and processed, with significant changes to the system architecture which needs to be cautiously managed.", "recording_license": "", "do_not_record": true, "persons": [{"code": "TVU8C7", "name": "Felix Leon Buck", "avatar": null, "biography": "Senior Data Scientist and Engineer at Airbus, leading the development of flight plan optimisation and continuous air traffic prediction services based on time series forecasting with ML and physics models.", "public_name": "Felix Leon Buck", "guid": "93d5e4c7-dc32-5bf1-a668-8b8012a1ca9e", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/TVU8C7/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/CTUEJX/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/CTUEJX/", "attachments": []}, {"guid": "5cf856fc-9a22-5945-aec2-a18b865476a0", "code": "9NFHAS", "id": 61739, "logo": null, "date": "2025-04-24T16:55:00+02:00", "start": "16:55", "duration": "00:45", "room": "Palladium", "slug": "pyconde-pydata-2025-61739-transformers-for-game-log-data", "url": "https://pretalx.com/pyconde-pydata-2025/talk/9NFHAS/", "title": "Transformers for Game Log Data", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Statistics", "type": "Talk (long)", "language": "en", "abstract": "The Transformer architecture, originally designed for machine translation, has revolutionized deep learning with applications in natural language processing, computer vision, and time series forecasting. Recently, its capabilities have extended to sequence-to-sequence tasks involving log data, such as telemetric event data from computer games.\r\n\r\nThis talk demonstrates how to apply a Transformer-based model to game log data, showcasing its potential for sequence prediction and representation learning. 
Attendees will gain insights into implementing a simple Transformer in Python, optimizing it through hyperparameter tuning, architectural adjustments, and defining an appropriate vocabulary for game logs.\r\n\r\nReal-world applications, including clustering and user level predictions, will be explored using a dataset of over 175 million events from an MMORPG. The talk will conclude with a discussion of the model's performance, computational requirements, and future opportunities for this approach.", "description": "The paper[1] introducing the Transformer architecture has been cited almost 150k times. By now, this deep learning architecture has been used for a large number of use cases. Obviously, language generation and large language models are among the most prominent use cases. However, the architecture has also been successfully employed to solve problems in computer vision and to forecast time series data to name only a few other examples.\r\n\r\nAt its core, the Transformer architecture is a deep neural network designed for sequence-to-sequence prediction tasks. E.g., mapping a sequence of words in one language to a sequence of words in another language as it is done in machine translation tasks. This architecture has recently gained attention for another application well-suited to sequence-to-sequence mapping: the analysis of telemetric log data from games[2]. While log data from games is one specific area that has been explored lately, this approach generally works for log data in other domains arising from websites or mobile apps.\r\n\r\nIn this talk, I will walk the audience through a simple Transformer architecture in Python that can be used to train a model on game log data. I will discuss the challenges of constructing a vocabulary and tokenizer based on log data. Unlike language data, game logs often contain structured events with properties, making vocabulary design non-trivial. 
I will highlight design choices in the model construction to balance the predictive power of the model and computational efficiency. This includes hyper-parameter selection for the model (e.g., embedding size, number of layers, etc.) and the training procedure (e.g., batch size, learning rate, etc.). I will also explain how to adapt the Transformer architecture to handle long sequences of log data efficiently, including architectural changes to the basic network.\r\n\r\nI will demonstrate how representations derived from the model can be applied to various use cases, such as clustering and prediction tasks arising in game data science. Typical prediction tasks in game data science are survival time prediction for regression or purchase prediction for classification. Insights from clustering or player level predictions can help to improve retention or optimize monetization models. To evaluate the effectiveness of this approach, I trained multiple models on a publicly available 100GB game log dataset containing over 175 million events from NCSOFT\u2019s MMORPG Blade and Soul. In addition to presenting qualitative results, I will compare the computational resources and hardware requirements of this method to those of a simple baseline algorithm. 
\r\n\r\nBy the end of the talk, attendees will gain actionable insights into building and training Transformers for log data, equipping them to tackle similar challenges in their own domains.\r\n\r\nTentative agenda of the talk:\r\n5  - Intro\r\n5  - Review of the Transformer architecture and its usage in GPT\r\n10 - Adjusting the architecture to game log data\r\n10 - Training of different models\r\n10 - Obtaining player representations from the models for clustering and prediction tasks\r\n5  - Outlook & Conclusion\r\n\r\n[1] \u201cAttention is all you need\u201d, Vaswani et al., 2017\r\n[2] \u201cplayer2vec: A Language Modeling Approach to Understand Player Behavior in Games\u201d, Wang et al., 2024\r\n[3] \u201cGame Data Mining Competition on Churn Prediction and Survival Analysis using Commercial Game Log Data\u201d, Lee et al., 2018", "recording_license": "", "do_not_record": false, "persons": [{"code": "RLAACQ", "name": "Fabian Hadiji", "avatar": "https://pretalx.com/media/avatars/RLAACQ_2HwxJlQ.jpg", "biography": "Fabian combines his passion for data, machine learning, and computer games with his professional activities. In addition to his role as Head of Business Intelligence at Lotum, a mobile game publisher, he also lectures at TH K\u00f6ln, where he leads a project group focused on game data science. 
Additionally, Fabian co-organizes the Cologne AI and Machine Learning Meetup (CAIML), hosting bi-monthly events that bring together the local AI and ML community.", "public_name": "Fabian Hadiji", "guid": "13e91212-ad2c-5653-8e91-177ba3cf38d3", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/RLAACQ/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/9NFHAS/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/9NFHAS/", "attachments": []}], "Ferrum": [{"guid": "dab4b13f-cc62-5edd-8819-131e72bcdc32", "code": "TMBTYH", "id": 65811, "logo": null, "date": "2025-04-24T10:15:00+02:00", "start": "10:15", "duration": "01:30", "room": "Ferrum", "slug": "pyconde-pydata-2025-65811-baybe-a-bayesian-back-end-for-experimental-planning-in-the-low-to-no-data-regime", "url": "https://pretalx.com/pyconde-pydata-2025/talk/TMBTYH/", "title": "BayBE: A Bayesian Back End for Experimental Planning in the Low-To-No-Data Regime", "subtitle": "", "track": "PyData: PyData & Scientific Libraries Stack", "type": "Sponsored Talk (Keystone)", "language": "en", "abstract": "From coffee machine settings to chemical reactions to website AB testing - iterative make-test-learn cycles are ubiquitous. The [Bayesian Back End](https://emdgroup.github.io/baybe/stable/) (BayBE) is an open-source experimental planner enabling users to smartly navigate such black-box optimization problems in iterative settings. This tutorial will i) introduce the core concepts enabled by combining Bayesian optimization and machine learning; ii) explain our software design choices, robust tests and open-source libraries this is built on; and iii) provide a short practical hands-on session.", "description": "In the evolving landscape of data science, advanced computational tools are crucial for driving innovation and efficiency. 
This tutorial introduces the [Bayesian Back End](https://emdgroup.github.io/baybe/stable/) (BayBE), an AI-assisted open-source experimental planner developed by [Merck KGaA](https://www.merckgroup.com/en), which utilizes Bayesian Optimization and machine learning to smartly streamline experimental workflows in the low-to-no-data regime. From chemical reactions to biological assays to coffee machine settings - with BayBE users can find optimal configurations in an iterative manner, which is the main working mode of many experimentalists anyway.\r\n\r\nWe will start the first part with a brief introduction to Bayesian Optimization, highlighting its principles and advantages in experimental design. Following this, we will showcase BayBE's unique features, including elegant categorical encodings and advanced capabilities like active learning, transfer learning or Pareto optimization.\r\n\r\nIn the second part, we explain some of our code and test design choices that went into the open-source Python package [`baybe`](https://github.com/emdgroup/baybe). This will include learnings about our built-in (de-)serialization engine, CI/CD, advanced hypothesis tests, autodocumentation and open-source tools BayBE is built on.\r\n\r\nThe final part will consist of a hands-on tutorial. We will look at representative problems and guide potential users from formalization of the problem to performing the iterative loop to analyzing the results including an assessment of parameter relevance. 
The tutorials can be accessed [here](https://github.com/emdgroup/baybe-resources).", "recording_license": "", "do_not_record": false, "persons": [{"code": "9AMQEB", "name": "Martin Fitzner", "avatar": "https://pretalx.com/media/avatars/9AMQEB_odMKCO7.jpg", "biography": "Principal Data Scientist at Merck KGaA Darmstadt, Germany\r\nInterested in combining machine learning, data science, computational natural science, and cheminformatics.", "public_name": "Martin Fitzner", "guid": "252391c1-4f04-5176-b1b5-fdbd557c7f51", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/9AMQEB/"}, {"code": "ZLLGER", "name": "Alexander Hopp", "avatar": "https://pretalx.com/media/avatars/ZLLGER_mAa8fYb.jpg", "biography": "Mathematician who got into coding and enjoys it way too much. One of the three core developers of BayBE, the Bayesian Optimization Package developed at Merck KGaA, Darmstadt. Also working on antibody and retrosynthesis projects.\r\n\r\nInterested in everything the intersection between mathematics and computer science has to offer, as well as in best practices for coding. 
Always curious to learn!", "public_name": "Alexander Hopp", "guid": "ae7dfcdc-a474-53df-8505-2836fbdbed11", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/ZLLGER/"}, {"code": "LSUVUD", "name": "Adrian \u0160o\u0161i\u0107", "avatar": "https://pretalx.com/media/avatars/LSUVUD_d1ZkK77.jpg", "biography": "Lead Data Scientist at Merck Life Science KGaA, Darmstadt, Germany\r\nMachine Learning and Probabilistic Modeling", "public_name": "Adrian \u0160o\u0161i\u0107", "guid": "371cd5a4-cd16-5fe6-94e9-93cbfd97f52d", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/LSUVUD/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/TMBTYH/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/TMBTYH/", "attachments": []}, {"guid": "53165332-3247-52ca-b9e0-86e76e469305", "code": "C3RVM3", "id": 61900, "logo": null, "date": "2025-04-24T14:20:00+02:00", "start": "14:20", "duration": "01:30", "room": "Ferrum", "slug": "pyconde-pydata-2025-61900-unlocking-the-predictive-power-of-relational-data-with-automated-feature-engineering", "url": "https://pretalx.com/pyconde-pydata-2025/talk/C3RVM3/", "title": "Unlocking the Predictive Power of Relational Data with Automated Feature Engineering", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Statistics", "type": "Tutorial", "language": "en", "abstract": "Relational data can be a goldmine for classical Machine Learning applications \u2014 yet extracting useful features from multiple tables, time windows, and primary-foreign key relationships is notoriously difficult. 
In this code tutorial, we\u2019ll use the H&M Fashion dataset to demonstrate how\u00a0getML\u00a0FastProp automates feature engineering for both classification (churn prediction) and regression (sales prediction) with minimal manual effort, outperforming both\u00a0Relational Deep Learning\u00a0and a skilled\u00a0human data scientist according to the RelBench leaderboard.\r\n\r\nThis code tutorial is perfect for data scientists looking to leverage their relational and time-series data effectively for any kind of predictive analytics application.", "description": "This tutorial tackles a common pain point in data science \u2013 extracting useful features from relational data spread across multiple interconnected tables. Manually crafting these features is often tedious, error-prone, and heavily reliant on domain expertise.\r\n\r\nWhy is this important? Relational data powers industries from e-commerce and healthcare to finance. Yet, building predictive models on such datasets often involves laborious feature engineering. 
getML FastProp \u2013 the fastest open-source algorithm for automated feature engineering \u2013 streamlines this process, helping data scientists move faster and build better models.\r\n\r\nIn this hands-on tutorial, we\u2019ll work through two tasks from Stanford\u2019s Relational Learning Benchmark (RelBench) using the H&M Fashion dataset: 1) Predict customer churn with a classification model, 2) Forecast item sales using a regression model.\r\n\r\nWe\u2019ll walk through the code and concepts needed to solve these tasks with getML FastProp, achieving state-of-the-art performance and outperforming both Relational Deep Learning models and an experienced human data scientist.\r\n\r\nBy the end of this tutorial, you'll learn how to:\r\n- Understand relational learning \u2013 Grasp the core challenges and concepts of working with multi-table datasets.\r\n- Reproduce results \u2013 Run the provided notebooks and code to reproduce the results at your own pace.\r\n- Automate feature engineering \u2013 Use getML\u2019s FastProp to extract features directly from relational data.\r\n- Build and optimize getML pipelines \u2013 Develop pipelines for both classification and regression tasks.\r\n- Integrate into MLOps workflows \u2013 Leverage getML alongside LightGBM and Optuna.\r\n\r\nThis tutorial provides a practical, reproducible framework for working with relational and time-series data, applicable across industries and domains.", "recording_license": "", "do_not_record": false, "persons": [{"code": "WP7K8V", "name": "Alexander Uhlig", "avatar": null, "biography": "Alexander Uhlig is the CEO of Code17, the company behind getML. 
With a background in Physics, he leads the development of getML and has worked hands-on with data teams to build prediction models across various domains, including healthcare, trading, and e-commerce.", "public_name": "Alexander Uhlig", "guid": "06015002-6fab-5deb-866f-6b7f977631be", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/WP7K8V/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/C3RVM3/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/C3RVM3/", "attachments": []}, {"guid": "a455d1e0-e320-5105-a24b-b636533dd391", "code": "PDBAXQ", "id": 59479, "logo": null, "date": "2025-04-24T16:15:00+02:00", "start": "16:15", "duration": "01:30", "room": "Ferrum", "slug": "pyconde-pydata-2025-59479-pytest-simple-rapid-and-fun-testing-with-python", "url": "https://pretalx.com/pyconde-pydata-2025/talk/PDBAXQ/", "title": "pytest - simple, rapid and fun testing with Python", "subtitle": "", "track": "PyCon: Testing", "type": "Tutorial", "language": "en", "abstract": "The pytest tool offers a rapid and simple way to write tests for your Python code. This training gives an introduction with exercises to some distinguishing features, such as its assertions, marks and fixtures.\r\n\r\nDespite its simplicity, pytest is incredibly flexible and configurable. We'll look at various configuration options as well as the plugin ecosystem around pytest.", "description": "# Preparation and Repository\r\n\r\nSee [The-Compiler/pytest-basics](https://github.com/The-Compiler/pytest-basics) on GitHub for exercise code and preparation steps. 
Please make sure you have at least a virtualenv with `pytest` (or the full `requirements.txt` in the repo) set up and the code cloned before the training starts, so that we don't lose any time with the boring setup parts.\r\n\r\nSee the README for detailed setup instructions.\r\n\r\n# Schedule\r\n\r\n- (25 minutes) **pytest feature walkthrough:**\r\n    * Automatic test discovery\r\n    * Assertions without boilerplate via the assert statement\r\n    * Configuration and commandline options\r\n    * Marking and skipping tests\r\n    * Data-driven tests via parametrization\r\n    * Exercises\r\n\r\n- (60 minutes) **pytest fixture mechanism:**\r\n    * Setup and teardown via dependency injection\r\n    * Declaring and using function/module/session scoped fixtures\r\n    * Using fixtures from fixture functions\r\n    * Parametrizing fixtures\r\n    * Looking at useful built-in fixtures (managing temporary files, patching, output capturing)\r\n    * Exercises\r\n\r\n- (5 minutes) **Where to go next:**\r\n    * Useful CLI arguments to deal with failing tests\r\n    * Overview of the plugin ecosystem around pytest", "recording_license": "", "do_not_record": false, "persons": [{"code": "BPA78X", "name": "Florian Bruhin", "avatar": "https://pretalx.com/media/avatars/BPA78X_7lU3SPR.jpg", "biography": null, "public_name": "Florian Bruhin", "guid": "97323304-0f6e-5496-a41f-38e84991e7ca", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/BPA78X/"}], "links": [{"title": "Slides", "url": "https://raw.githubusercontent.com/The-Compiler/pytest-basics/main/pytest-basics.pdf", "type": "related"}], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/PDBAXQ/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/PDBAXQ/", "attachments": []}], "Dynamicum": [{"guid": "e0323428-512f-549f-9518-4ace86018a9e", "code": "XB8VG7", "id": 67741, "logo": null, "date": "2025-04-24T10:15:00+02:00", "start": "10:15", "duration": "01:30", "room": "Dynamicum", "slug": 
"pyconde-pydata-2025-67741-career-path-experience-stories", "url": "https://pretalx.com/pyconde-pydata-2025/talk/XB8VG7/", "title": "Career Path Experience Stories", "subtitle": "", "track": "General: Education, Career & Life", "type": "Tutorial", "language": "en", "abstract": "As part of the PyConDE & PyData 2025 Conference, we would like to present an initiative aimed primarily at students and those just starting their careers in computer science. Our goal is to showcase the diverse career paths possible and break some myths about typical job skills and responsibilities relevant, so as to inspire and encourage their journey.", "description": "Join us for an interactive session where professionals share their diverse tech career journeys. This workshop aims to broaden students' perspectives on the many paths available in tech and beyond. Speakers will share honest insights about their career decisions, skills that proved most valuable, and advice they wish they'd received as students.\r\n\r\nWhy should you be there?\r\n- Hear honest stories about career twists, turns, and triumphs.\r\n- Discover roles you might not have even considered yet.\r\n- See how versatile your current skillset really is.\r\n- Ask YOUR questions to people who've been where you are.\r\n\r\nIf you're curious about the diverse opportunities waiting for you and want to hear firsthand accounts of building unique careers in tech, this workshop is for you.\r\nCome prepared with questions and leave with a clearer vision of the possibilities ahead.", "recording_license": "", "do_not_record": false, "persons": [{"code": "J7LMZG", "name": "Kristina Khvatova", "avatar": null, "biography": null, "public_name": "Kristina Khvatova", "guid": "3af33f27-b013-54de-901b-166665e3b2d8", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/J7LMZG/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/XB8VG7/feedback/", "origin_url": 
"https://pretalx.com/pyconde-pydata-2025/talk/XB8VG7/", "attachments": []}, {"guid": "f64683bb-05c1-5aa6-b678-c73203f1ba3e", "code": "PKZD8L", "id": 61867, "logo": null, "date": "2025-04-24T14:20:00+02:00", "start": "14:20", "duration": "01:30", "room": "Dynamicum", "slug": "pyconde-pydata-2025-61867-ai-agents-of-change-creating-reflecting-and-monetizing", "url": "https://pretalx.com/pyconde-pydata-2025/talk/PKZD8L/", "title": "AI Agents of Change: Creating, Reflecting, and Monetizing", "subtitle": "", "track": "PyData: Generative AI", "type": "Tutorial", "language": "en", "abstract": "Create, reflect, and earn\u2014with purpose. In this workshop, you\u2019ll not only build your own AI agent but also confront the ethical questions it raises, from its impact on jobs to its potential for social good. Together, we\u2019ll explore how to harness AI for empowerment while uncovering pathways to turn your skills into meaningful value.\r\n\r\nThis workshop is designed to equip Python enthusiasts with the tools to create their own AI agent while fostering a deeper understanding of the societal implications of this technology. Through hands-on learning, collaborative discussions, and practical monetization strategies, you\u2019ll leave with more than just code\u2014you\u2019ll gain a vision of how AI can be wielded responsibly and profitably.", "description": "The session unfolds in three engaging parts:\r\n\t1.\tBuild Your AI Agent\r\nStart with the fundamentals of AI by designing and implementing a functional agent. 
Using Python, we\u2019ll demystify the process and equip you with practical skills for creating an AI that responds to user needs and scenarios.\r\n\t2.\tReflect on Ethics and the Future of Work\r\nOnce your agent comes to life, we\u2019ll pause to examine the bigger picture:\r\n\t\u2022\tHow might the AI agent you have created reshape the job market?\r\n\t\u2022\tCan it democratize and decentralize opportunities, or does it risk amplifying inequalities?\r\n\t\u2022\tWhat collective vision do we want for the future of work?\r\nThis thought-provoking discussion will challenge you to think critically about the role of technology in fostering empowerment or exacerbating social challenges.\r\n\t3.\tEarn by Sharing Value\r\nFinally, we\u2019ll explore how your AI agent can create real-world value. You\u2019ll learn how to leverage marketplaces like OpenServ to turn your innovation into income. Whether you aim to solve practical problems, inspire creativity, or contribute to ethical AI development, this segment will connect your skills with opportunities for meaningful impact.\r\n\r\nBy the end of the workshop, you\u2019ll have built an AI agent, grappled with its ethical dimensions, and uncovered how to use your coding prowess to create and share value\u2014all while shaping a more inclusive, responsible AI ecosystem.", "recording_license": "", "do_not_record": false, "persons": [{"code": "TSXAJM", "name": "Paloma Oliveira", "avatar": "https://pretalx.com/media/avatars/TSXAJM_QLTKVNh.jpeg", "biography": "I\u2019m a wholehearted explorer and community-driven developer, advocating for FOSS while blending art, technology, and inclusion.", "public_name": "Paloma Oliveira", "guid": "a3b6ba2d-b6e2-536b-9d88-3363a2e6320f", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/TSXAJM/"}, {"code": "NMACLQ", "name": "Tereza Iofciu", "avatar": "https://pretalx.com/media/avatars/NMACLQ_M0SmHO9.jpeg", "biography": "Tereza Iofciu is a data leadership coach and a data practitioner. 
She has more than 15 years of experience in Data Science, Data Engineering, Product Management and Team Management. Alongside that she spent most of those years volunteering in the Python Community and wears many hats: PyLadies Hamburg organizer, Python Software Verband board member, Python Software Foundation Code of Conduct team member, Diversity & Inclusion working group member, PyConDE & PyData Berlin organizer, Python Pizza Hamburg organizer, and PyPodcats co-leader. In 2021 Tereza was awarded the Python Software Foundation community service award.", "public_name": "Tereza Iofciu", "guid": "9f1c4db3-3e40-5e40-a06d-ad540d3a75fc", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/NMACLQ/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/PKZD8L/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/PKZD8L/", "attachments": []}, {"guid": "0b81b7ce-9e9a-5549-a19c-5ed74ceb5427", "code": "9Y9DM8", "id": 61376, "logo": null, "date": "2025-04-24T16:15:00+02:00", "start": "16:15", "duration": "01:30", "room": "Dynamicum", "slug": "pyconde-pydata-2025-61376-the-future-of-ai-training-is-federated", "url": "https://pretalx.com/pyconde-pydata-2025/talk/9Y9DM8/", "title": "The future of AI training is federated", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Statistics", "type": "Tutorial", "language": "en", "abstract": "Since its introduction in 2016, Federated Learning (FL) has become a key paradigm for training AI models in scenarios where training data cannot leave its source. This applies in many industrial settings where centralizing data is challenging due to a combination of reasons, including but not limited to privacy, legal, and logistics.\r\n\r\nThe main focus of this tutorial is to introduce an alternative approach to training AI models that is straightforward and accessible. 
We\u2019ll walk you through the basics of an FL system, how to iterate on your workflow and code in a research setting, and finally deploy your code to a production environment. You will learn all of these approaches using a real-world application based on open-sourced datasets, and the open-source federated AI framework, [Flower](https://github.com/adap/flower), which is written in Python and designed for Python users. Throughout the tutorial, you\u2019ll have access to hands-on open-sourced code examples to follow along.", "description": "Federated Learning has quickly become the preferred form of training of AI models when the training data cannot leave their point of origin due to privacy regulations (e.g. GDPR), legal constraints (e.g. in different jurisdictions), and logistical challenges (e.g. large volumes of data, sparse connectivity), among other reasons. Furthermore, contracts and regulations establish boundaries for data sharing, particularly in industries like healthcare and finance, where misuse prevention is crucial. One could also argue that we are [running out of publicly and ethically sourced datasets](https://www.theverge.com/2024/12/13/24320811/what-ilya-sutskever-sees-openai-model-data-training), for instance to [scale large foundational models](https://arxiv.org/abs/2410.08892), and federated learning offers one way to train models on protected data.\r\n\r\nThe key point of this tutorial is to introduce an alternative approach to training AI models that is straightforward and accessible.\r\n\r\nThis tutorial is sequenced in 3 parts. We\u2019ll first introduce federated learning and its prototypical architecture. In part 2, we\u2019ll dive into a series of live Python code demos that showcase how to convert a classical centralized machine learning workflow into a federated workflow involving multiple federated clients. We\u2019ll demonstrate the similarities and differences of how the iteration of a federated research project is conducted. 
Finally, in part 3, we\u2019ll demonstrate how you can take your research code and deploy it in a production setting using a mixture of physical edge devices and VMs.\r\n\r\nThroughout the tutorial, we\u2019ll use [Flower](https://github.com/adap/flower), the fully open-sourced federated AI framework, which is written in Python and designed for Python users. With simplicity as one of its main goals, Flower provides multiple features and libraries to accelerate research, such as [Flower Baselines](https://flower.ai/docs/baselines/) (for reproducing federated learning benchmarks) and [Flower Datasets](https://flower.ai/docs/datasets/) (a standalone Python library for easily creating federated datasets). We\u2019ll showcase how to use the Flower CLI in both research and production settings.\r\n\r\nThis tutorial addresses people with fluency in Python, CLI, and basic knowledge of a machine learning project. It would help if you\u2019ve also used Docker before. Any data practitioner is encouraged to attend the tutorial to learn and discuss how to federate and distribute the training of an ML model. \r\n\r\nYou will learn:\r\n\r\n- What\u2019s Federated Learning?\r\n    - Basics and real-world examples\r\n- How to federate your existing ML training code, and more FL-specific steps such as how to:\r\n    - Configure the behaviours of each federated client\r\n    - Persist the state of each client across global rounds\r\n    - Evaluate both aggregated and local models\r\n    - Standardize your FL experiments\r\n    - Track your experiments\r\n- How to deploy your research code in a production setting, such as how to:\r\n    - Deploy Flower federated learning clients using Docker\r\n    - Set up secure connections and node authentication\r\n    - Run, monitor, and manage the federated learning runs.\r\n\r\nBring your own laptop if you\u2019d like to follow along. 
Some code examples will be executed in GitHub Codespaces, others can be locally executed on your favourite IDE. \r\n\r\n**Update: 24th April 2025** \r\nThe GitHub repo containing the code examples is available here \ud83d\udc49 [link](https://github.com/chongshenng/pyconde2025).\r\n\r\nThe tutorial session is structured in the following way:\r\n\r\n- 0:00 Introduction, and getting to know the audience.\r\n- 0:05 What\u2019s Federated Learning? Basics and real-world-examples.\r\n- 0:25 Overview of the Flower framework for federated learning\r\n- 0:30 Quickstart examples with PyTorch. Moving from a centralized training to federated.\r\n- 1:00 Deploying your research to production\r\n- 1:20 Feedback and Q&A", "recording_license": "", "do_not_record": false, "persons": [{"code": "YHPLBC", "name": "Chong Shen Ng", "avatar": "https://pretalx.com/media/avatars/YHPLBC_oCxnXnQ.png", "biography": "Dr. Chong Shen Ng is a Research Engineer at Flower Labs with over a decade of experience in both research and industry, specializing in federated learning, data science, and parallel computing. As a key developer, he focuses on scaling Flower to deploy privacy-enhanced distributed AI solutions for real-world applications. Chong Shen is passionate about contributing to the open-source community, developing trustworthy AI systems through federated learning, and advancing edge AI technologies. 
A dedicated advocate for open-source software, he has co-chaired PyData Global events and volunteered at SciPy and PyData London conferences.", "public_name": "Chong Shen Ng", "guid": "7238dab2-401b-5973-b670-f3418b34f88a", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/YHPLBC/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/9Y9DM8/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/9Y9DM8/", "attachments": []}], "Carbonium": [{"guid": "f098345b-834a-562b-aeed-36388f3923d5", "code": "VC3T39", "id": 64272, "logo": null, "date": "2025-04-24T09:00:00+02:00", "start": "09:00", "duration": "03:00", "room": "Carbonium", "slug": "pyconde-pydata-2025-64272-mini-pythonistas-coding-experimenting-and-exploring-with-zumi", "url": "https://pretalx.com/pyconde-pydata-2025/talk/VC3T39/", "title": "Mini-Pythonistas: Coding, Experimenting, and Exploring with Z\u00fcmi!", "subtitle": "", "track": "PyData: Embedded Systems & Robotics", "type": "Kids Workshop", "language": "en", "abstract": "Please note, this is a children's workshop. Recommended age 10-16 years. Experienced use of keyboard and mouse, first words in English (for programming) are required. // \r\n\r\nWelcome, mini-Pythonistas! In this workshop, we\u2019ll dive into the world of Z\u00fcmi, a programmable car that\u2019s much more than just wheels and motors. With built-in sensors, lights, and a camera, Z\u00fcmi can learn to recognize colors, respond to gestures, and even identify faces \u2014 all with your help!", "description": "## Summary\r\nWelcome, mini-Pythonistas! In this workshop, we\u2019ll dive into the world of Z\u00fcmi, a programmable car that\u2019s much more than just wheels and motors. 
With built-in sensors, lights, and a camera, Z\u00fcmi can learn to recognize colors, respond to gestures, and even identify faces \u2014 all with your help!\r\n\r\nNote, this is a kids workshop: All children and young people up to the age of 16 are welcome.\r\n\r\n\r\n## More Details\r\nWhether you\u2019re brand new to programming or a seasoned Python pro, there\u2019s something here for everyone:\r\n* Blockly: Perfect for beginners! Learn the basics of programming by snapping together colorful blocks.\r\n* Jupyter Notebooks: Already know about variables and loops? Take the next step and explore more advanced coding concepts.\r\n* Python Scripting: For our experienced coders, write your own Python scripts and push Z\u00fcmi to its limits.\r\n\r\nWhat can you teach Z\u00fcmi?\r\n* Drive and park autonomously: With infrared sensors, Z\u00fcmi can detect obstacles, stop, and adjust its course.\r\n* Recognize colors: Train a machine learning model to teach Z\u00fcmi to stop or react when it sees a specific color.\r\n* Identify faces: Using its camera, Z\u00fcmi can spot faces in photos and even recognize a smile!\r\n* and many more!\r\n\r\nJoin us for a fun-filled adventure where coding meets creativity and discovery. Let\u2019s see what you and Z\u00fcmi can achieve together! \ud83d\ude97\ud83d\udcbb\u2728", "recording_license": "", "do_not_record": false, "persons": [{"code": "8ZRD8E", "name": "Dr. Marisa Mohr", "avatar": "https://pretalx.com/media/avatars/8ZRD8E_x0Xwg2f.jpg", "biography": "I'm Marisa, I live in L\u00fcbeck, Germany, I'm a mathematician and Team Lead at inovex. With a passion for data, demystification, and people, I\u2019m on a mission to bridge the gap between complex mathematical concepts and real-world understanding. Whether it\u2019s through interpretability, explainability, or fostering equal opportunity and fairness, I believe that math can \u2013 and should \u2013 be accessible to everyone.", "public_name": "Dr. 
Marisa Mohr", "guid": "20cc6109-0502-551e-af07-5615f44da91b", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/8ZRD8E/"}, {"code": "9CX9CB", "name": "Anna-Lena Popkes", "avatar": "https://pretalx.com/media/avatars/9CX9CB_cJ2hJM2.jpg", "biography": "I'm Anna-Lena, a machine learning engineer living in Bonn, Germany. I'm very passionate about learning and love to share my knowledge with other people. Besides machine learning I love teaching Python and have been a regular guest on PyCon events and podcasts.", "public_name": "Anna-Lena Popkes", "guid": "9b697e10-d673-5739-8580-75f8612f8ff2", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/9CX9CB/"}, {"code": "MFBP3R", "name": "Hannah Hepke", "avatar": "https://pretalx.com/media/avatars/MFBP3R_CXBKsxJ.jpeg", "biography": "I'm Hannah, a data and machine learning engineer living in Karlsruhe, Germany. With a strong interest in Artificial Intelligence, I am excited to start my journey in this dynamic field. I am passionate about teaching and take great pleasure in breaking down complex concepts and making them accessible to others, fostering a collaborative learning environment.", "public_name": "Hannah Hepke", "guid": "223561c6-9718-5fbf-858c-27e523522e73", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/MFBP3R/"}, {"code": "CZVWXB", "name": "Daniel Hieber", "avatar": "https://pretalx.com/media/avatars/CZVWXB_jqVHirF.png", "biography": "Hi, I'm Daniel, a PhD student in digital neuropathology at Julius-Maximilians-University W\u00fcrzburg and a research associate at the University Hospital Augsburg as well as Neu-Ulm University of Applied Sciences. 
My work focuses on applying computer vision techniques to automate analysis processes in the pathological departments and provide physicians with the tools to conduct machine learning on their own.", "public_name": "Daniel Hieber", "guid": "c55366e6-e133-52a1-bb8c-3f759efc0a86", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/CZVWXB/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/VC3T39/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/VC3T39/", "attachments": []}], "OpenSpace": [{"guid": "f254e91e-bb5f-5fa3-b6c9-09df722cdd8a", "code": "WELCVS", "id": 61389, "logo": null, "date": "2025-04-24T10:15:00+02:00", "start": "10:15", "duration": "01:30", "room": "OpenSpace", "slug": "pyconde-pydata-2025-61389-probably-fun-board-games-to-teach-data-science", "url": "https://pretalx.com/pyconde-pydata-2025/talk/WELCVS/", "title": "Probably Fun: Board Games to teach Data Science", "subtitle": "", "track": "General: Education, Career & Life", "type": "Tutorial", "language": "en", "abstract": "In this tutorial, you will speed-date with board and card games that can be used to teach Data Science. You will play one game for 15 minutes, reflect on the Data Science concepts it involves, and then rotate to the next table. \r\n\r\nAs a result, you will experience multiple ideas that you can use to make complex ideas more understandable and enjoyable. We would like to demonstrate how gamification can be used not only to produce short puzzles and quizzes, but also as a tool to reason about complex problem-solving strategies.\r\n\r\nWe will bring a set of carefully selected games that have been proven effective in teaching statistics, programming, machine learning and other Data Science skills. We also believe that it is probably fun to participate in this tutorial.", "description": "Games encourage people to put their brains to work in a focused, constructive and peaceful way. This makes games a fantastic tool in the classroom. 
Many board games contain sophisticated algorithms and statistical models right under the surface. Therefore, Data Science education can be boosted by playing carefully selected games.\r\n\r\nWe have applied popular board and card games such as Memory, Wizard, Machi Koro, Pandemic and Sky Team (the 2024 Game of the Year in Germany) to teach Data Science concepts in our courses. Learners would first play a game, discuss the mechanisms and only after that get exposed to the theory. Finally, they would move to practical applications using computers.\r\n\r\nThis game-driven approach provides learners with an intrinsic motivation to solve a real practical problem (succeeding at the game).\r\nAnalyzing a game makes it easier to grasp the core mechanism or algorithmic model and ask qualified questions about the details later.\r\nIt also makes sure learners will want to come back for the next class. We have documented practical lessons and made them available under a CC license on https://www.academis.eu/probably_fun/ .\r\n\r\nIn this tutorial, you will speed-date with several short games that can be used to teach Data Science concepts and skills. You will play one game for 15 minutes, reflect on the Data Science concepts it involves, and then rotate to the next table. \r\nThis way, you will experience multiple ideas you can use to make complex methods and ideas more accessible. Also, the tutorial is probably fun to participate in.\r\n\r\nThe tutorial will be executed according to the following pseudocode (or lesson plan):\r\n\r\n1. The presenters give a short introduction on why games matter (5 min)\r\n2. The presenters group participants into teams of up to 6 people.\r\n3. Each team is assigned to a game table with a game and a cheat sheet with instructions. The presenters help with understanding the rules and removing other obstacles.\r\n4. The teams play the game for up to 15 minutes.\r\n5. 
The teams discuss 1-3 prepared reflection questions to make the transfer from the game to the data science concepts.\r\n6. Each team moves to the next table.\r\n7. Repeat for 3-4 rounds.\r\n8. Everybody gets together for a joint Q & A\r\n9. A QR-Code links to material with games that help with learning Data Science and lesson plans", "recording_license": "", "do_not_record": false, "persons": [{"code": "9EPNQG", "name": "Dr. Kristian Rother", "avatar": "https://pretalx.com/media/avatars/9EPNQG_TTM8mnl.jpg", "biography": "Kristian is a freelance Python trainer who wrote his first lines of Python in the year 11111001111. After a career writing software for life science research, he has been teaching Python, Data Analysis and Machine Learning throughout Europe since 2011. More recently, he has built data pipelines for the real estate and medical sector.\r\n\r\nKristian has translated 5 Python books and written 2 more himself, in addition to numerous teaching guides. Kristian has collected 364 stars on Advent of Code. His knowledge about async is, unfortunately, miserable. His favorite Python module is 're'. Kristian believes everybody can learn programming.\r\n\r\nYou can find Kristian's teaching materials on https://www.academis.eu", "public_name": "Dr. Kristian Rother", "guid": "1096b371-55c7-509f-a52d-73e66c5db09b", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/9EPNQG/"}, {"code": "WVNMPG", "name": "Paula Gonzalez Avalos", "avatar": "https://pretalx.com/media/avatars/WVNMPG_opng10Z.png", "biography": "Data Nerd & Python Pydata community lover. AI education specialist with five years of experience shaping data science and AI educational offers. 
Currently leading the AI Academy at the appliedAI Institute for Europe.", "public_name": "Paula Gonzalez Avalos", "guid": "50a454c2-0f8e-5d15-9a7a-b91319a30558", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/WVNMPG/"}], "links": [{"title": "Description of games + lesson plans", "url": "https://www.academis.eu/probably_fun/", "type": "related"}], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/WELCVS/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/WELCVS/", "attachments": []}]}}, {"index": 3, "date": "2025-04-25", "day_start": "2025-04-25T04:00:00+02:00", "day_end": "2025-04-26T03:59:00+02:00", "rooms": {"Zeiss Plenary (Spectrum)": [{"guid": "59c95c60-d3b4-52d9-990a-902dc7da0811", "code": "Z9ZTAH", "id": 65615, "logo": null, "date": "2025-04-25T09:05:00+02:00", "start": "09:05", "duration": "00:45", "room": "Zeiss Plenary (Spectrum)", "slug": "pyconde-pydata-2025-65615-the-future-of-ai-building-the-most-impactful-technology-together", "url": "https://pretalx.com/pyconde-pydata-2025/talk/Z9ZTAH/", "title": "The Future of AI: Building the Most Impactful Technology Together", "subtitle": "", "track": "Keynote", "type": "Keynote", "language": "en", "abstract": "In this talk, Leandro will examine the significant benefits of combining open source principles with artificial intelligence. He will walk through the need for openness in language models to build trust, maintain control, mitigate biases, and achieve true alignment and show how open models are rapidly gaining momentum in the AI landscape, challenging proprietary systems through community-driven innovation. Finally, he will then talk about emerging trends and what the community needs to build for the next generation of models.", "description": "In this talk, Leandro will examine the significant benefits of combining open source principles with artificial intelligence. 
He will walk through the need for openness in language models to build trust, maintain control, mitigate biases, and achieve true alignment and show how open models are rapidly gaining momentum in the AI landscape, challenging proprietary systems through community-driven innovation. Finally, he will then talk about emerging trends and what the community needs to build for the next generation of models.", "recording_license": "", "do_not_record": false, "persons": [{"code": "D7XC7V", "name": "Leandro von Werra", "avatar": null, "biography": "Leandro von Werra is the head of research at Hugging Face. He promotes open science and works on building large high-quality datasets and training of open LLMs. He led the BigCode project, is a co-author of the \u201cNatural Language Processing with Transformers\u201d book published by O\u2019Reilly and the creator of the popular Python library TRL, which combines transformers with reinforcement learning and other effective fine-tuning methods.", "public_name": "Leandro von Werra", "guid": "96320a4f-e4b0-5107-9517-0bafec9ac6e8", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/D7XC7V/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/Z9ZTAH/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/Z9ZTAH/", "attachments": []}, {"guid": "6c4cf40d-e631-5f38-80b2-31c08758290b", "code": "SXRVNU", "id": 60689, "logo": null, "date": "2025-04-25T10:15:00+02:00", "start": "10:15", "duration": "00:30", "room": "Zeiss Plenary (Spectrum)", "slug": "pyconde-pydata-2025-60689-data-as-python-code", "url": "https://pretalx.com/pyconde-pydata-2025/talk/SXRVNU/", "title": "Data as (Python) Code", "subtitle": "", "track": "PyCon: MLOps & DevOps", "type": "Talk", "language": "en", "abstract": "In contemporary data-driven environments, the seamless integration of data into automated workflows is paramount. 
The reliability of automation, however, is constantly threatened by breaking changes in the source data. The Data-as-Code (DaC) paradigm addresses this challenge by treating data as a first-class citizen within the software development lifecycle.", "description": "Data-as-Code (DaC) is a paradigm that streamlines data distribution by encapsulating dataset retrieval within Python packages, along with a data contract. This approach makes it easy to enforce data quality, effortlessly leverages semantic versioning to prevent errors in the data pipeline, and abstracts away from the Data Scientist all the boilerplate code to load the data needed by the ML models, improving efficiency and consistency. This presentation will delve into the implementation of DaC, demonstrate its practical applications, and discuss the benefits it offers in modern data workflows.\r\n\r\nThis session will cover:\r\n1. Introduction to Data-as-Code (DaC):\r\n   - What problems do we want to solve with DaC\r\n   - What is out of scope\r\n2. Implementing DaC:\r\n   - Packaging data as Python packages\r\n   - Defining data contracts\r\n3. Advantages of DaC:\r\n   - Application of semantic versioning to manage data changes effectively\r\n   - Breaking changes in data are automatically detected as part of the data distribution\r\n   - Abstraction of data loading mechanisms, allowing seamless transitions between data sources\r\n   - Elimination of hard-coded data field names, enhancing code maintainability\r\n   - Facilitation of unit testing through schema examples\r\n   - Inclusion of comprehensive data descriptions and metadata\r\n   - Centralized data distribution via the Python Package Index (PyPI)\r\n4. 
DaC in the real world:\r\n   - Step-by-step walkthrough of creating and distributing a DaC package\r\n   - Guidelines for data engineers on preparing data for DaC\r\n   - Instructions for data scientists on consuming DaC packages in their workflows\r\n   - Discussion on the scalability and adaptability of DaC\r\n5. Q&A Session:\r\n   - Addressing audience questions and remarks", "recording_license": "", "do_not_record": false, "persons": [{"code": "7BDJCR", "name": "Francesco Calcavecchia", "avatar": "https://pretalx.com/media/avatars/7BDJCR_P4DP4OI.jpeg", "biography": "Physicist, ML Engineer, Agile adept. I\u2019d rather have a taste of everything than specialize. Eager to learn, unlearn, try out, share, help.", "public_name": "Francesco Calcavecchia", "guid": "ab89ef83-7c80-542f-8fe0-5ace6d701363", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/7BDJCR/"}], "links": [{"title": "Data as Code - Homepage", "url": "https://data-as-code.github.io/docs/", "type": "related"}, {"title": "Tutorial repo - reproduce the examples used in the presentation", "url": "https://github.com/data-as-code/tutorials", "type": "related"}, {"title": "Slides", "url": "https://www.dropbox.com/scl/fi/w9lxdyxpcjeag0f5obppe/pycon-dac.pdf?rlkey=hwo96c1wndk4918qini97ijls&st=cbpxkj5c&dl=0", "type": "related"}], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/SXRVNU/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/SXRVNU/", "attachments": []}, {"guid": "b95a62aa-9104-578f-ae72-3528e8d50dc6", "code": "CPCNRZ", "id": 61175, "logo": null, "date": "2025-04-25T10:55:00+02:00", "start": "10:55", "duration": "00:30", "room": "Zeiss Plenary (Spectrum)", "slug": "pyconde-pydata-2025-61175-how-narwhals-is-silently-bringing-pandas-polars-duckdb-pyarrow-and-more-together", "url": "https://pretalx.com/pyconde-pydata-2025/talk/CPCNRZ/", "title": "How Narwhals is silently bringing pandas, Polars, DuckDB, PyArrow, and more together", "subtitle": "", "track": "PyData: 
PyData & Scientific Libraries Stack", "type": "Talk", "language": "en", "abstract": "If you were writing a data science tool in 2015, you'd have ensured it supported pandas and then called it a day.\r\n\r\nBut it's not 2015 anymore, we've fast-forwarded to 2025. If you write a tool which only supports pandas, users will demand support for Polars, PyArrow, DuckDB, and so many other libraries that you'll feel like giving up.\r\n\r\nLearn about how Narwhals allows you to write dataframe-agnostic tools which can support all of the above, with zero dependencies, low overhead, static typing, and strong backwards-compatibility promises!", "description": "Suppose you want to write a data science tool to do feature engineering. Your experience may go like this:\r\n- Expectation: you can focus on state-of-the-art techniques for feature engineering.\r\n- Reality: you keep having to make your codebase more complex because a new dataframe library has come out and users are demanding support for it.\r\n\r\nOr rather, it might have gone like that in the pre-Narwhals era. Because now, you can focus on solving the problems which your tool set out to do, and let Narwhals handle the subtle differences between different kinds of dataframe inputs!\r\n\r\nNarwhals is a lightweight and extensible compatibility layer between dataframe libraries. It is already used by several open source libraries including Altair, Marimo, Plotly, Scikit-lego, Vegafusion, and more. You will learn how to use Narwhals to build dataframe-agnostic tools.\r\n\r\nThis is a technical talk aimed at tool-builders. You'll be expected to be familiar with Python and dataframes. We will cover:\r\n- 2-3 minutes: motivation. 
Why are there so many dataframe libraries?\r\n- 2-3 minutes: life before vs after Narwhals - real-world examples of how the data landscape is changing\r\n- 7-8 minutes: basics of Narwhals, wrapping native objects, expressions vs Series, lazy vs eager\r\n- 7-8 minutes: advanced Narwhals concepts: row order, non-elementary group-by aggregations, multi-indices, null values, backwards-compatibility promises\r\n- 2-3 minutes: what comes next?\r\n- 5 minutes: engaging Q&A / awkward silence\r\n\r\nTool builders will benefit from the talk by learning how to build tools for modern dataframe libraries without sacrificing support for foundational classic libraries such as pandas.", "recording_license": "", "do_not_record": false, "persons": [{"code": "KEUJ9U", "name": "Marco Gorelli", "avatar": "https://pretalx.com/media/avatars/KEUJ9U_SQaluhL.jpg", "biography": null, "public_name": "Marco Gorelli", "guid": "6fa6c8b8-7b91-5fe9-8b55-ae35ecc17ec8", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/KEUJ9U/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/CPCNRZ/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/CPCNRZ/", "attachments": []}, {"guid": "4f78a170-f3ab-5903-8d7e-d1d7f76b3f6a", "code": "HQWAYP", "id": 61318, "logo": null, "date": "2025-04-25T11:35:00+02:00", "start": "11:35", "duration": "00:45", "room": "Zeiss Plenary (Spectrum)", "slug": "pyconde-pydata-2025-61318-topological-data-analysis-how-to-quantify-holes-in-your-data-and-why", "url": "https://pretalx.com/pyconde-pydata-2025/talk/HQWAYP/", "title": "Topological data analysis: How to quantify \"holes\" in your data and why?", "subtitle": "", "track": "PyData: PyData & Scientific Libraries Stack", "type": "Talk (long)", "language": "en", "abstract": "Do you need to compare sets of points in a plane? Identify a potential cyclic event in high-dimensional time series data? Find the second or the third highest peak of a noisily sampled function? 
Topological data analysis (TDA) is not a universal hammer, but it might just be the 16 mm wrench for your 16 mm hex head bolt. There is no shortage of Python libraries implementing TDA methods for various settings, but navigating the options can be challenging without prior familiarity with the topic. In my talk I will demonstrate the utility of the tool with several simple examples, list various libraries used by the TDA community, and dive a bit deeper into the methods to explain what the libraries implement and how to interpret and work with the outputs.", "description": "For specific tasks, topological data analysis can be a more rigid, straightforward and interpretable alternative to complicated machine learning pipelines. However, it is not so widely known and can be intimidating to get into when starting from zero. The goal of this talk is to introduce persistent homology, the main tool of topological data analysis, show concrete examples of how to apply it using available Python libraries, and reveal more details about what is going on \"under the hood\", which is important to correctly utilize the methods. I will start with several examples showcasing the possible uses of persistent homology and how to establish an analysis pipeline in Python. Then I will describe more about different variants within such a pipeline, like a choice of a filtered complex or vectorization, and their advantages and disadvantages.", "recording_license": "", "do_not_record": false, "persons": [{"code": "ZTVM8H", "name": "Ondrej Draganov", "avatar": "https://pretalx.com/media/avatars/ZTVM8H_Vj3Mbrf.jpeg", "biography": "A researcher in Topological Data Analysis (TDA) working on both its theoretical aspects and applications. 
I have completed my PhD at ISTA in Austria and then moved to Inria in France to apply the TDA methods to spatial transcriptomics data.", "public_name": "Ondrej Draganov", "guid": "ea5e1e6a-d3b8-5341-a1d9-8c8926fa5ea8", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/ZTVM8H/"}], "links": [{"title": "Slides", "url": "https://drive.google.com/file/d/1Up_0mnvRk8deOhbMKtnhyTez8rM9_PzC/view?usp=sharing", "type": "related"}], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/HQWAYP/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/HQWAYP/", "attachments": []}, {"guid": "25e94e0b-968a-5532-a1ff-687ab7881344", "code": "QN3BTA", "id": 61905, "logo": null, "date": "2025-04-25T13:20:00+02:00", "start": "13:20", "duration": "00:30", "room": "Zeiss Plenary (Spectrum)", "slug": "pyconde-pydata-2025-61905-from-stockouts-to-happy-customers-proven-solutions-for-time-series-forecasting-in-retail", "url": "https://pretalx.com/pyconde-pydata-2025/talk/QN3BTA/", "title": "From stockouts to happy customers: Proven solutions for time series forecasting in retail", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Statistics", "type": "Talk", "language": "en", "abstract": "Time series forecasting in the retail industry is uniquely challenging: Datasets often include stockouts that censor actual demand, promotional events cause irregular demand spikes, new product launches face cold-start issues, and diverse demand patterns within an imbalanced product portfolio create modeling challenges.\r\nIn this talk, we\u2019ll explore proven, real-world strategies and examples to address these problems. Learn how to successfully handle censored demand caused by stockouts, effectively incorporate promotional effects, and tackle the variability of diverse products using clustering and ensembling strategies. 
Whether you\u2019re a seasoned data scientist or a Python developer exploring forecasting, the goal of this session is to introduce you to the key challenges in retail forecasting and equip you with actionable insights to successfully overcome them in real-life scenarios.", "description": "Retail time series forecasting is uniquely challenging: stockouts censor true demand, promotions cause irregular demand spikes, cold-start products lack historical data, and diverse product portfolios introduce modeling complexities. These challenges can lead to inefficiencies such as over- or understocking in the warehouses and therefore also to dissatisfied customers. This talk explores proven strategies to tackle these issues and deliver actionable insights.\r\n\r\nLearn how to handle constrained demand caused by stockouts both with adequate imputation as well as machine learning strategies, incorporate promotional effects with suitable feature engineering techniques that also help in cases of incomplete promotional data, predict demand for new products using transfer learning, and discover how ensembling strategies and clustering can simplify forecasting for diverse, imbalanced datasets.\r\n\r\nWe\u2019ll also highlight tools like statsforecast, neuralforecast, scikit-learn and our AutoML framework with a strong stacking ensembling mechanism at its core. Whether you\u2019re a seasoned data scientist or a Python developer exploring forecasting, the goal of this session is to introduce you to the key challenges in retail forecasting and equip you with actionable insights to successfully overcome them in real-life scenarios.", "recording_license": "", "do_not_record": false, "persons": [{"code": "CAH7QT", "name": "Robert Haase", "avatar": "https://pretalx.com/media/avatars/CAH7QT_Qpcha4S.jpeg", "biography": "I earned both my Bachelor's and Master's degrees in Physics from the University of Heidelberg, specializing in Condensed Matter Physics and Computational Physics. 
During my Master's thesis in 2020, I advanced existing NLP Transformer architectures for time series applications, where I worked extensively with uncertainty quantification and normalizing flows. Since the beginning of 2021, I have been employed at Paretos, where the primary focus of my work lies in time series forecasting, specifically demand forecasting. Since 2023, I have been leading the AI team at Paretos, which gives me a good opportunity to combine my leadership skills with our research in scalable time series forecasting & optimization applications.", "public_name": "Robert Haase", "guid": "ac3694c6-c97f-56fd-b87a-bcaecd96077b", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/CAH7QT/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/QN3BTA/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/QN3BTA/", "attachments": []}, {"guid": "da0970f3-147a-5c67-a4f2-beede56650a0", "code": "RAHBEP", "id": 59643, "logo": null, "date": "2025-04-25T14:00:00+02:00", "start": "14:00", "duration": "00:30", "room": "Zeiss Plenary (Spectrum)", "slug": "pyconde-pydata-2025-59643-forecast-of-hourly-train-counts-on-rail-routes-affected-by-construction-work", "url": "https://pretalx.com/pyconde-pydata-2025/talk/RAHBEP/", "title": "Forecast of Hourly Train Counts on Rail Routes Affected by Construction Work", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Statistics", "type": "Talk", "language": "en", "abstract": "Construction work in national railroad networks often disrupts train traffic, making it vital to estimate hourly train numbers for effective re-routing. Traditionally managed by humans, this process has been automated due to staff shortages and demographic changes. DB Systel GmbH, Deutsche Bahn's IT provider, leveraged machine learning and artificial intelligence to estimate train traffic during construction. 
Using Python and frameworks like Pandas, scikit-learn, NumPy, PyTorch and Polars, their solution demonstrated significant benefits in performance and efficiency.", "description": "Within a national railroad network, construction work for maintenance and modernization is unavoidable - as is train traffic on the affected sections under certain circumstances. Although there are fixed timetables for passenger rail transport that are planned well in advance and are set very early, there are still many freight transports and special trains that are registered at short notice and cause a dynamic traffic situation on the rail network. Therefore, the capacity utilisation of the rail routes is unknown until shortly before the journey takes place. It is therefore important to estimate the number of trains that will run over the affected tracks in order to establish a sensible re-routing strategy. Until now, this process has been in the hands of human decision-makers for decades or even more than a century.\r\n\r\nDemographic change and staff shortages are increasingly forcing companies to automate activities intelligently. This is where machine learning and artificial intelligence come into play.\r\n\r\nAs Deutsche Bahn's IT service provider, DB Systel GmbH was able to successfully implement an example of intelligent automation of this process and estimate train numbers on sections of tracks affected by construction using modern ML and AI methods. Python as well as various established frameworks (Pandas, scikit-learn, NumPy, PyTorch) and new frameworks (Polars, Ruff) were used in this project. A success and performance measurement clearly demonstrated the benefits of ML automation.", "recording_license": "", "do_not_record": false, "persons": [{"code": "DCTDRA", "name": "Sebastian Folz", "avatar": "https://pretalx.com/media/avatars/DCTDRA_K4QdCQP.JPG", "biography": "Sebastian Folz works as a machine learning engineer at DB Systel GmbH. 
His tasks also include project management and contributing to working groups on AI topics. He has also contributed to open source projects in the past. Sebastian can often be found at meet-ups around Python and machine learning in Karlsruhe.", "public_name": "Sebastian Folz", "guid": "6c25e2e9-9589-5e76-945c-c40ce9e9f187", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/DCTDRA/"}, {"code": "DAPDTE", "name": "Dr Maren Westermann", "avatar": "https://pretalx.com/media/avatars/DAPDTE_on44jIn.jpg", "biography": "Dr Maren Westermann works as a machine learning engineer at DB Systel GmbH and holds a PhD in environmental science. She is a self-taught Pythonista, a member of both the documentation team and the contributor experience team of the open source machine learning library scikit-learn, and a team member of the open source library Narwhals. She is also a co-organiser of PyLadies Berlin where she mainly hosts open source hack nights.", "public_name": "Dr Maren Westermann", "guid": "2ea19cf1-6743-5308-aad4-2121e7b25ff9", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/DAPDTE/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/RAHBEP/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/RAHBEP/", "attachments": []}, {"guid": "2448dbf5-4f31-5f02-8a1b-a60a753aa9c3", "code": "8PFFPS", "id": 61311, "logo": null, "date": "2025-04-25T14:40:00+02:00", "start": "14:40", "duration": "00:30", "room": "Zeiss Plenary (Spectrum)", "slug": "pyconde-pydata-2025-61311-demystifying-design-patterns-a-practical-guide-for-developers", "url": "https://pretalx.com/pyconde-pydata-2025/talk/8PFFPS/", "title": "Demystifying Design Patterns: A Practical Guide for Developers", "subtitle": "", "track": "PyCon: Programming & Software Engineering", "type": "Talk", "language": "en", "abstract": "Do you ever worry about your code becoming spaghetti-like and difficult to maintain?\r\nMaster the art of crafting clean, maintainable, and 
adaptable software by harnessing the power of design patterns. This presentation will empower you with a clear, structured understanding of these reusable solutions to address common programming challenges.\r\n\r\nWe'll delve into design patterns\u2019 key categories: Behavioral, Structural, and Creational, as well as explore their functionality and how they can be applied in your daily development workflow. For each category, we'll also explore a practical design pattern in detail and showcase real-world applications of these patterns, along with small-scale code examples that illustrate their practical implementation.\r\n\r\nYou'll gain valuable insight into how these patterns can translate into real-world development scenarios, such as facilitating communication between objects (Behavioral), separating interfaces from implementation for flexibility (Structural), and enabling dynamic algorithm selection at runtime (Creational).", "description": "Do you ever worry about your code becoming spaghetti-like and difficult to maintain?\r\nMaster the art of crafting clean, maintainable, and adaptable software by harnessing the power of design patterns. This presentation will empower you with a clear, structured understanding of these reusable solutions to address common programming challenges.\r\n\r\nWe'll delve into design patterns\u2019 key categories: Behavioral, Structural, and Creational, as well as explore their functionality and how they can be applied in your daily development workflow. 
For each category, we'll also explore a practical design pattern in detail and showcase real-world applications of these patterns, along with small-scale code examples that illustrate their practical implementation.\r\n\r\nYou'll gain valuable insight into how these patterns can translate into real-world development scenarios, such as facilitating communication between objects (Behavioral), separating interfaces from implementation for flexibility (Structural), and enabling dynamic algorithm selection at runtime (Creational).", "recording_license": "", "do_not_record": false, "persons": [{"code": "MQ7JUZ", "name": "Tanu", "avatar": "https://pretalx.com/media/avatars/MQ7JUZ_aLFD3YZ.jpg", "biography": "Tanu is a Software Engineer at Bloomberg on the BQL (Bloomberg Query Language) team. BQL provides intelligent query suggestions to empower users for efficient data exploration.  A passion for crafting clean, maintainable, and efficient software solutions fuels her work in this role and throughout her career. She has a Master's degree in Distributed Systems and 6 years of industry experience building scalable systems. She is a tech writer for Medium, has organized hands-on workshops and delivered technical presentations internally for 100+ people  . She is passionate about staying on top of tech and sharing knowledge at conferences. 
In her free time, Tanu enjoys traveling and playing music.", "public_name": "Tanu", "guid": "5cda1458-4fce-5335-9a02-df2adffc361c", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/MQ7JUZ/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/8PFFPS/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/8PFFPS/", "attachments": []}], "Titanium3": [{"guid": "6df162f1-096d-5abc-9adc-4c7eed3de50f", "code": "FSK3PE", "id": 61104, "logo": null, "date": "2025-04-25T10:15:00+02:00", "start": "10:15", "duration": "00:30", "room": "Titanium3", "slug": "pyconde-pydata-2025-61104-from-queries-to-confidence-ensuring-sql-reliability-with-python", "url": "https://pretalx.com/pyconde-pydata-2025/talk/FSK3PE/", "title": "From Queries to Confidence: Ensuring SQL Reliability with Python", "subtitle": "", "track": "PyCon: Testing", "type": "Talk", "language": "en", "abstract": "SQL remains a foundational component of data-driven applications, but ensuring the accuracy and reliability of SQL logic is often challenging. SQL testing can be cumbersome, time-consuming, and error-prone. However, these challenges can be addressed by leveraging the simplicity of Python's testing framework such as pytest, enabling clean, robust, and automated SQL testing.", "description": "SQL is an essential part of data-driven applications, powering everything from simple queries to complex data transformations. However, ensuring the accuracy and reliability of SQL code is often challenging, particularly when dealing with intricate logic or large-scale datasets. Also, deploying changes in SQL code to production is another complex task, as it requires careful validation to avoid breaking the query logic.\r\n\r\nFortunately, integrating Python\u2019s testing framework such as pytest into SQL workflows provides a streamlined solution for these challenges. 
Such an approach enables creating clean, efficient, and automated testing processes for SQL code and database logic. Therefore, we can validate query results, enforce schema consistency, and simulate complex data scenarios, all while reducing manual effort and improving test coverage.\r\n\r\nThis talk will address:\r\n- configuring lightweight database fixtures\r\n- verifying SQL query results and testing scripts seamlessly\r\n- data mocking\r\n- schema validation\r\n- testing non-deterministic queries\r\n- handling large datasets\r\n\r\nAttendees will gain insights into improving SQL code quality, identifying issues early in the development process, and ensuring the reliability of data-driven products. This presentation is particularly beneficial for Data Scientists, Engineers, and Analysts seeking to enhance the efficiency and precision of their testing practices.", "recording_license": "", "do_not_record": false, "persons": [{"code": "EKLLGL", "name": "Anna Varzina", "avatar": "https://pretalx.com/media/avatars/EKLLGL_lpNJ1YY.PNG", "biography": "Anna Varzina is a Data Science Engineer at Lighthouse, where she has been developing data-driven solutions for the hospitality industry since 2021. 
She specialises in working with large datasets and performing complex data transformations using Python and SQL to extract meaningful insights.\r\nThis is Anna's first time speaking at PyCon/PyData, and she is excited to share her experiences in overcoming the challenges of building reliable and scalable data workflows.", "public_name": "Anna Varzina", "guid": "20c3a244-2f41-544b-ae6e-c68aaaeb43ee", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/EKLLGL/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/FSK3PE/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/FSK3PE/", "attachments": [{"title": "Presentation slides", "url": "/media/pyconde-pydata-2025/submissions/FSK3PE/resources/Lighth_l8R7Iqa.pdf", "type": "related"}]}, {"guid": "b305a168-621d-5ab4-a0ef-ad7580abe1df", "code": "DSHASE", "id": 61858, "logo": null, "date": "2025-04-25T10:55:00+02:00", "start": "10:55", "duration": "00:30", "room": "Titanium3", "slug": "pyconde-pydata-2025-61858-using-python-to-enter-the-world-of-microcontrollers", "url": "https://pretalx.com/pyconde-pydata-2025/talk/DSHASE/", "title": "Using Python to enter the world of Microcontrollers", "subtitle": "", "track": "PyData: Embedded Systems & Robotics", "type": "Talk", "language": "en", "abstract": "So you've happily used the Raspberry Pi for your homelab projects, of course with Python based solutions as we all do. You've been down the rabbit hole with everything about temperature and humidity measurements, energy and solar tracking, video recording and time-lapse photography, object detection and security surveillance.\r\n\r\nYou don't just buy these things off the shelf. You want to deeply understand what it takes to create such a thing, and you've been quite happy with your results so far, learned a lot. \r\n\r\nBut for many simple applications ... the power draw! Yes, it's just 5 Watts you say for using a Raspberry Pi. Not a big deal in terms of cost. 
But you'll always need a power adapter and a free socket.\r\n\r\nYou've heard of these guys using microcontrollers that run on batteries or even solar, for days, weeks, even months.\r\n\r\nThat's exciting, but there's also a catch. These people write code in C-like languages, they build firmware to make their projects run. And it's all bare metal! That seems very different. That'll be a steep learning curve to take ... Or is it?\r\n\r\nWell, there's MicroPython to the rescue. Let me take you with me on a journey to make a simple microcontroller based application to read a Power Meter and send the readings over WiFi for more in depth processing somewhere else.", "description": "Over the past years Python became available on more and more platforms, both software and hardware ones. From MacOS and Linux to Windows. From Desktop Computers and SoC Platforms such as the Raspberry Pis to Data Centers. And even on the smallest side Python is available today.\r\n\r\nMicroPython implements our beloved language for direct use on embedded platforms built on top of popular microcontrollers, such as the original PyBoard using an STM32 microcontroller, the  ESP32 platform and the Raspberry Pi Picos.\r\n\r\nIn this talk we'll have a look at how MicroPython feels compared to the fully fledged Python implementations, by \"porting\" a simple application that initially was built to run on a Raspberry Pi to an ESP32 based Microcontroller. 
\r\n\r\nThe application was used to retrieve Power Meter Readings via its internal Infrared LED using a small photo transistor based circuit connected to the Raspberry Pi and calculate current power draw from these readings to send them somewhere else for further processing.\r\n\r\nWe'll see what it takes to make such an application work on a Microcontroller running just on batteries.", "recording_license": "", "do_not_record": false, "persons": [{"code": "DUSVE9", "name": "Jens Nie", "avatar": "https://pretalx.com/media/avatars/DUSVE9_x6pgG1p.jpg", "biography": "A physicist currently tackling the development of embedded devices at Rosenxt for various use cases. My journey with Python began a long, long time ago, when the interpreter's version string said 1.4.\r\n\r\nBesides my current efforts I can rely on great experience from various other roles in my prior career as a scientist, technology manager and department head.", "public_name": "Jens Nie", "guid": "df23dc13-20d6-55c5-a620-5d4717e311aa", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/DUSVE9/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/DSHASE/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/DSHASE/", "attachments": []}, {"guid": "e857086d-db5a-5a1a-9aae-e4a42fc49f70", "code": "QXSQKL", "id": 61270, "logo": null, "date": "2025-04-25T11:35:00+02:00", "start": "11:35", "duration": "00:45", "room": "Titanium3", "slug": "pyconde-pydata-2025-61270-rustifying-python-a-practical-guide-to-achieving-high-performance-while-maintaining-observability", "url": "https://pretalx.com/pyconde-pydata-2025/talk/QXSQKL/", "title": "Rustifying Python: A Practical Guide to Achieving High Performance While Maintaining Observability", "subtitle": "", "track": "PyCon: Programming & Software Engineering", "type": "Talk (long)", "language": "en", "abstract": "In this session, I\u2019ll share our journey of migrating key parts of a Python application to Rust, resulting 
in over 200% performance improvement.\r\nRather than focusing on quick Rust-to-Python integration with PyO3, this talk dives into the complexities of implementing such a migration in an enterprise environment, where reliability, scalability, and observability are crucial.\r\nYou\u2019ll learn from our mistakes, how we identified suitable areas for Rust integration, and how we extended our observability tools to cover Rust components.\r\nThis session offers practical insights for improving performance and reliability in Python applications using Rust.", "description": "For performance-critical sections of code, especially those that are I/O-bound or CPU-heavy, Python\u2019s Global Interpreter Lock (GIL) can create significant bottlenecks.\r\nTo improve performance, our team explored integrating Rust, taking advantage of its speed and concurrency features while maintaining Python\u2019s ease of use and flexibility.\r\n\r\nThis session will focus on overcoming common hurdles when migrating to Rust and optimizing performance in a real-world, production environment which orchestrates workload across 2000 compute nodes in various data centers and cloud provider regions.\r\nThis talk covers practical aspects such as observability, scalability, and deployment in a production setting.\r\n\r\nWe\u2019ll begin by discussing how to identify the parts of your Python code that would benefit most from a Rust migration, particularly those where the GIL is a limiting factor.\r\nWe\u2019ll also share insights into our migration process, including the challenges we faced and how we overcame them.\r\nYou\u2019ll learn how we refactored Python code and used PyO3 to integrate Rust, achieving over 200% performance improvements.\r\n\r\nA key challenge when adding Rust to a Python codebase is maintaining robust observability.\r\nWe\u2019ll explain how we extended our OpenTelemetry and Sentry observability stack to include Rust components, ensuring seamless monitoring, tracing, and 
debugging across the entire stack.\r\n\r\nThroughout the session, we\u2019ll illustrate the process with a practical example: a simplified version of our own application, which includes both I/O-heavy and compute-heavy tasks.\r\nYou\u2019ll see how to break down business logic and decide which parts to migrate to Rust for maximum performance benefit.\r\n\r\nBy the end of this session, you will be equipped with the knowledge to assess where Rust can improve your Python application\u2019s performance, and how to integrate it in a reliable and observable way.\r\nThis session is ideal for anyone looking to optimize Python performance with Rust, while keeping applications running.", "recording_license": "", "do_not_record": false, "persons": [{"code": "ZYGLV3", "name": "Max H\u00f6hl", "avatar": "https://pretalx.com/media/avatars/ZYGLV3_ApfJYZi.jpg", "biography": "I\u2019m a Senior Software Developer in the team behind SAP\u2019s huge CI/CD infrastructure for SAP HANA.\r\nWe design, implement, operate and maintain its cloud-native graph-based task execution framework leveraging 2000 compute nodes in multiple data centers and cloud provider regions.\r\nIn my spare time, I like to play Dungeons & Dragons.", "public_name": "Max H\u00f6hl", "guid": "4917aac4-7ee0-5799-8a5e-01cef77b5bfe", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/ZYGLV3/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/QXSQKL/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/QXSQKL/", "attachments": []}, {"guid": "30beb0d4-0738-5b20-9245-36cf3ba3f82a", "code": "P9VKRV", "id": 61215, "logo": null, "date": "2025-04-25T13:20:00+02:00", "start": "13:20", "duration": "00:30", "room": "Titanium3", "slug": "pyconde-pydata-2025-61215-extending-python-with-rust-mojo-cuda-and-c-and-building-packages", "url": "https://pretalx.com/pyconde-pydata-2025/talk/P9VKRV/", "title": "Extending Python with Rust, Mojo, Cuda and C and building packages", "subtitle": 
"", "track": "PyData: PyData & Scientific Libraries Stack", "type": "Talk", "language": "en", "abstract": "We all love Python - but we especially love it for its unique ability as a glue language.\r\n\r\nIn this talk we will show a number of ways of extending Python: using Rust, C and Cython, C++, CUDA and Mojo! We will use the pixi package manager and the open source conda-forge distribution to demonstrate how to easily build custom Python extensions with these languages.\r\n\r\nThe main challenge with custom extensions is about distributing them. The new pixi build feature makes it easy to build a Python extension into a conda package as well as wheel file for PyPI.\r\n\r\nPixi will manage not only Python, but also the compilers and other system-level dependencies.", "description": "Extending Python with native code is a common way of speeding up the execution. There are a number of traditional ways of writing Python extensions (Fortran, C, C++) but lately some modern languages have also entered the game (Rust, Mojo, and let\u2019s count CUDA as modern, too).\r\n\r\nAll of these have slightly different ways of writing Python extensions, and they require the installation of a compiler and compilation tool chain, as well as possibly other system dependencies. Installing, updating and managing the system dependencies is usually a bit of a hassle, and it is where pixi comes in. Pixi is a new package manager that builds on top of the Conda ecosystem. 
The community distribution \u201cconda-forge\u201d already has tools like C and Rust compilers, and thus it\u2019s easy to maintain the compiler + Python tool chain in a single project.\r\n\r\nThe new \u201cpixi build\u201d feature makes it even easier to build complex multi-language workspaces that combine different Python versions, compilers, and languages.\r\n\r\nIn the talk we will show lots of live demos going over simple numerical examples that highlight the different ways of extending Python (using pybind11, nanobind, PyO3, Mojo, \u2026) glued together in a single workspace with pixi build compiling the extensions from source. We will also demonstrate how pixi can help not only depending on other packages from source, but also by building the packages into Conda and Wheel (PyPI) packages that can be shared.\r\n\r\nAfter the talk, the listeners will have seen a number of ways how to extend Python code (easily!) with native languages and will have an understanding of benefits and drawbacks of the different approaches.", "recording_license": "", "do_not_record": false, "persons": [{"code": "8ZWEGR", "name": "Wolf Vollprecht", "avatar": "https://pretalx.com/media/avatars/8ZWEGR_3X2GOK0.jpeg", "biography": "Wolf Vollprecht has been active in the Python open source community for the past 5 years. He is a core member of conda-forge and the conda steering council, and the original author of the mamba package manager. He also has extensive experience in high-performance C++ and Rust. 
2 years ago he started prefix.dev where the team is focusing all efforts on making cross-platform, language independent package management great (on top of the conda ecosystem).", "public_name": "Wolf Vollprecht", "guid": "d7d80768-ffbe-56af-b55c-5b65c9fb8d09", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/8ZWEGR/"}, {"code": "7YTCGE", "name": "Ruben Arts", "avatar": "https://pretalx.com/media/avatars/7YTCGE_F24UFc1.jpg", "biography": "After studying robotics and working in the field for some years, I've noticed that package management was one of the bigger unsolved issues in Robotics, so I joined prefix.dev to solve that issue!\r\nCurrently, I'm a core maintainer of [`pixi`](pixi.sh) and take on a big part of the community management.", "public_name": "Ruben Arts", "guid": "6acdee43-32b2-5373-a617-982120266323", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/7YTCGE/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/P9VKRV/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/P9VKRV/", "attachments": []}, {"guid": "4d8d0e3f-5bc8-5de0-9360-a07344bdcac5", "code": "GUEAHT", "id": 59909, "logo": null, "date": "2025-04-25T14:00:00+02:00", "start": "14:00", "duration": "00:30", "room": "Titanium3", "slug": "pyconde-pydata-2025-59909-offline-disaster-relief-coordination-with-openstreetmap-and-fastapi", "url": "https://pretalx.com/pyconde-pydata-2025/talk/GUEAHT/", "title": "Offline Disaster Relief Coordination with OpenStreetMap and FastAPI", "subtitle": "", "track": "General: Infrastructure - Hardware & Cloud", "type": "Talk", "language": "en", "abstract": "In natural disaster scenarios, reliable communication is crucial. This talk presents a solution for disaster relief coordination using OpenStreetMap vector maps hosted on a local device in the emergency vehicle with FastAPI, ensuring functionality without an internet connection. 
By integrating a database of post codes and street names, and leveraging a LORAWAN gateway to receive positional data and water levels, this system ensures access to critical information even in blackout situations.", "description": "In natural disaster scenarios, effective coordination of relief efforts is essential, especially when traditional communication networks are compromised or overloaded. This involves organizing various aspects of disaster response without internet connectivity, such as dike defense, sandbag logistics, power supply, distribution of relief goods, clearing and repairing roadways, searching for missing persons, securing structurally unstable buildings and salvaging property. Typically, emergency responders are mobilized from different regions and may not be familiar with the affected area. Since it is unlikely that responders will have pre-downloaded offline maps of the target region on their devices, a vehicle-hosted Wi-Fi hotspot providing nationwide maps would be invaluable. 
This presentation introduces a practical solution for offline disaster relief coordination using OpenStreetMap vector maps hosted on a local device in the emergency vehicle with FastAPI, allowing responders to use their existing devices to access critical geographical information and enhancing the efficiency and effectiveness of disaster relief operations.\r\n\r\nThe solution includes:\r\n\r\nFastAPI Server Setup: Configuring and deploying a FastAPI server to host OpenStreetMap vector maps offline on a Raspberry Pi, which also offers a WiFi hotspot for existing end devices, ensuring accessibility without an internet connection.\r\nDatabase Integration: Integrating a comprehensive database of post codes and street names to facilitate quick and accurate location searches.\r\nLORAWAN Gateway Integration: Implementing a LORAWAN gateway to receive real-time positional data and water level measurements, providing up-to-date situational awareness for disaster relief workers.", "recording_license": "", "do_not_record": false, "persons": [{"code": "KSM737", "name": "Jannis L\u00fcbbe", "avatar": null, "biography": "2008 \r\nM.Sc. 
Physics and Computer Science at Osnabr\u00fcck University\r\n\r\n2012 \r\nPhD in Physics at Osnabr\u00fcck University\r\n\r\n2013 - now \r\nSensor Developer at ROSEN Group\r\n\r\n2020 - now \r\nVolunteer operative in the Federal Agency for Technical Relief (THW, Germany)", "public_name": "Jannis L\u00fcbbe", "guid": "5f0a198b-8135-5bd0-a9c6-203cc457e65f", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/KSM737/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/GUEAHT/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/GUEAHT/", "attachments": []}, {"guid": "dbf983d2-3da8-54a9-9ab8-e11ac30cd23c", "code": "ECNDQM", "id": 61855, "logo": null, "date": "2025-04-25T14:40:00+02:00", "start": "14:40", "duration": "00:30", "room": "Titanium3", "slug": "pyconde-pydata-2025-61855-3-ways-to-speed-up-your-regression-modeling-in-python", "url": "https://pretalx.com/pyconde-pydata-2025/talk/ECNDQM/", "title": "3 Ways to Speed up Your Regression Modeling in Python", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Statistics", "type": "Talk", "language": "en", "abstract": "Linear Regression is the workhorse of statistics and data science. Some data scientists even go as far and argue that \"linear regression is all you need\". \r\n\r\nIn this talk, we will introduce three ways to run regression models faster by using smarter algorithms, implemented in the scikit-learn & fastreg (sparse solvers), pyfixest (Frisch-Waugh-Lovell), and duckreg (regression compression via duckdb) libraries.", "description": "We introduce three different ways to make regressions run faster. \r\n\r\nWe first introduce sparse solvers and show how to run regressions on sparse matrices via scikit-learn and the fastreg libraries. 
\r\n\r\nWe then lay out the Frisch-Waugh-Lovell theorem and the alternating projections algorithm and show how to speed it up on the CPU (via numba) and on the GPU (via JAX) as implemented in the pyfixest library. \r\n\r\nFinally, we demonstrate how to drastically speed up regression estimation by first preprocessing the data in duckdb and then fitting a regression via weighted least squares in memory. \r\n\r\nReferences: \r\n- fastreg: https://github.com/iamlemec/fastreg\r\n- scikit-learn: https://github.com/scikit-learn/scikit-learn\r\n- pyfixest: https://github.com/py-econometrics/pyfixest\r\n- duckreg: https://github.com/py-econometrics/duckreg", "recording_license": "", "do_not_record": false, "persons": [{"code": "KFGF9T", "name": "Alexander Fischer", "avatar": "https://pretalx.com/media/avatars/KFGF9T_0TgysAR.JPG", "biography": "Economist and Data Scientist. I spend most of my week working on online auctions at Trivago. In the evenings and weekend, I work on open source packages for regression modeling and inference in R and Python.", "public_name": "Alexander Fischer", "guid": "95b3c7eb-3207-52e4-b2d6-97866f31d445", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/KFGF9T/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/ECNDQM/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/ECNDQM/", "attachments": []}], "Helium3": [{"guid": "b721f5bb-c103-5c9f-a931-e3de88e32e5d", "code": "WCDPLP", "id": 59664, "logo": null, "date": "2025-04-25T10:15:00+02:00", "start": "10:15", "duration": "00:30", "room": "Helium3", "slug": "pyconde-pydata-2025-59664-fasthtml-vs-streamlit-the-dashboarding-face-off", "url": "https://pretalx.com/pyconde-pydata-2025/talk/WCDPLP/", "title": "FastHTML vs. 
Streamlit - The Dashboarding Face Off", "subtitle": "", "track": "PyCon: Django & Web", "type": "Talk", "language": "en", "abstract": "In the right corner, we have the go-to dashboarding solution for showcasing ML models or visualizing data, **STREAMLIT** (\\*crowd cheers\\*). Simple yet powerful, it defends the throne of Python dashboarding, but have you ever tried to create complex interactions with it? Things like drill-downs or logins can make your control flow messy really quickly (\\*crowd nods knowingly\\*).\r\n\r\nAnd in the left corner, the new contender in the arena of Python web frameworks which, according to its docs, \"*excels at building dashboards*\", **FastHTML** (\\*crowd whoops\\*). We will see if this is true, in the **ultimate dashboarding face off** (\\*crowd gasps\\*). By building the same dashboard, step by step, in both frameworks and investigating their strengths and weaknesses, we will see which framework can claim the crown.", "description": "Streamlit is the go-to dashboarding solution for showcasing ML models or visualizing data. It has a vibrant community, multiple years of development under its belt, and tons of third-party integrations. On the other hand, everyone who has tried to create complex interactions, like drill-downs or logins, knows that control flow can get messy really quickly. Initially simple dashboards often evolve into something bigger and the simple-but-powerful Streamlit formula may not always be up to the task.\r\n\r\nFastHTML is a new contender in the arena of Python web frameworks and, according to its docs, \"it excels at building dashboards.\" FastHTML stands on the shoulders of giants, giving you a smooth Python experience for authoring web pages, while allowing access to the foundations of the web, like CSS and JS, at any time. 
We will see if FastHTML can put code where its mouth is, by building the same dashboard, step by step, in both frameworks and investigate their strengths and weaknesses.\r\n\r\nThis is a talk for data enthusiasts that dabble in web technologies for the sake of showcasing their work or building internal tooling. Do not expect a course on building customer-facing web apps. We will build a dashboard that features:\r\n\r\n- an interactive Plotly chart\r\n- a drill-down with detailed information shown in  a second plot\r\n- a login\r\n- multiple pages and navigation\r\n\r\nWe will examine how hard or easy it is to implement each of these features and how interacting with them in the browser feels. At the end we will see if the reigning champion can defend their crown or if the ambitious contender takes the win.", "recording_license": "", "do_not_record": false, "persons": [{"code": "WLLGPM", "name": "Tilman Krokotsch", "avatar": "https://pretalx.com/media/avatars/WLLGPM_MUTJ0a9.jpeg", "biography": "I'm a data scientist, machine learning engineer, AI developer, or whatever else you want to call it. 
After finishing my PhD I am now working as a consultant at Dataciders ixto, where I'm helping our customers to never make wrong decisions again.", "public_name": "Tilman Krokotsch", "guid": "45803626-1598-58be-a493-78e4dcc30f3b", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/WLLGPM/"}], "links": [{"title": "Talk Repository with Live Demo", "url": "https://github.com/tilman151/streamlit-vs-fasthtml", "type": "related"}], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/WCDPLP/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/WCDPLP/", "attachments": []}, {"guid": "d479df5d-e747-5152-a311-07b750c5b1ce", "code": "ZMKJAY", "id": 61236, "logo": null, "date": "2025-04-25T10:55:00+02:00", "start": "10:55", "duration": "00:30", "room": "Helium3", "slug": "pyconde-pydata-2025-61236-death-by-a-thousand-api-versions", "url": "https://pretalx.com/pyconde-pydata-2025/talk/ZMKJAY/", "title": "Death by a Thousand API Versions", "subtitle": "", "track": "PyCon: Django & Web", "type": "Talk", "language": "en", "abstract": "API versioning is tough, really tough. We tried multiple approaches to versioning in production and eventually ended up with a solution we love. During this talk you will look into the tradeoffs of the most popular ways to do API versioning, and I will recommend which ones are fit for which products and companies. I will also present my framework, Cadwyn, that allows you to support hundreds of API versions with ease -- based on FastAPI and inspired by Stripe's approach to API versioning.\r\n\r\nAfter this session, you will understand which approach to pick for your company to make your versioning cost effective and maintainable without investing too much into it.", "description": "Web API Versioning is a way to allow your developers to move quickly and break things while your clients enjoy the stable API in long cycles. It is best practice for any API-first company to have API Versioning in one way or another. 
Otherwise, the company will either be unable to improve their API or their clients will have their integrations broken every few months.\r\n\r\nI'll cover all sorts of approaches you can pick to add incompatible features to your API: extremely stable and expensive, easy-looking but horrible in practice, and even completely version-less yet viable. I will provide you with the best practices of how you could find or implement a modern API versioning solution and will discuss the versioning at Stripe in great detail.\r\n\r\nWhen you leave, you'll have enough information to make your API Versioning user-friendly without overburdening your developers.", "recording_license": "", "do_not_record": false, "persons": [{"code": "UDHJSP", "name": "Stanislav Zmiev", "avatar": "https://pretalx.com/media/avatars/UDHJSP_JIPG5fp.JPG", "biography": "Experienced platform engineer and architect with a passion for open source and developer tools. The author of Cadwyn -- a sophisticated API Versioning framework based on FastAPI. A contributor to numerous projects such as CPython and tortoise-orm. 
Currently building the future of finance at Monite.", "public_name": "Stanislav Zmiev", "guid": "d6f56f46-5226-5e46-b9be-1e542c1df1b3", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/UDHJSP/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/ZMKJAY/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/ZMKJAY/", "attachments": []}, {"guid": "c1ee31cf-0fad-58e5-b395-4d36f19e2bb6", "code": "3DSU8V", "id": 61112, "logo": null, "date": "2025-04-25T11:35:00+02:00", "start": "11:35", "duration": "00:45", "room": "Helium3", "slug": "pyconde-pydata-2025-61112-hands-on-llm-security-attacks-and-countermeasures-you-need-to-know", "url": "https://pretalx.com/pyconde-pydata-2025/talk/3DSU8V/", "title": "Hands-On LLM Security: Attacks and Countermeasures You Need to Know!", "subtitle": "", "track": "PyCon: Security", "type": "Talk (long)", "language": "en", "abstract": "Dive into the vulnerabilities of LLMs and learn how to prevent them.\r\nFrom prompt injection to data poisoning, we\u2019ll demonstrate real-world attack scenarios and reveal essential countermeasures to safeguard your applications.", "description": "The rapid increase in usage of large language models (LLMs) in recent years makes it necessary to address the specific security risks of LLMs.\r\nIn this presentation, we will examine typical vulnerabilities in LLMs from a practical perspective. Starting with a systematic overview, we will use a specific demo app to illustrate the various attack scenarios. Vulnerabilities like prompt injection, data poisoning and system prompt leakage will be explained and demonstrated, as well as attacks on RAG and agent implementations.\r\nIn addition to a basic introduction and a presentation of specific vulnerabilities, the talk also presents suitable countermeasures and general best practices for the use of LLMs in production applications.\r\n\r\nWhat to expect? 
\r\nBy attending this talk, you will learn which vulnerabilities need to be considered when using and integrating LLMs. You will see how specific attacks work and what risks are associated with them. You will also learn which countermeasures are suitable and how these can be implemented technically.", "recording_license": "", "do_not_record": false, "persons": [{"code": "DS3TQU", "name": "Clemens H\u00fcbner", "avatar": "https://pretalx.com/media/avatars/DS3TQU_q2Okxq8.jpg", "biography": "For more than ten years, Clemens H\u00fcbner has been working at the interface between software and security. After roles as a software developer and in penetration testing, he joined inovex in 2018 as a software security engineer. Today, he supports development projects at the conception and implementation level and is a trainer both in-house and for clients. He advises on secure development processes and DevSecOps. As a speaker, he is invited to national and international conferences.", "public_name": "Clemens H\u00fcbner", "guid": "2211c8c4-18d2-5bc3-8719-afbfb0213fe8", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/DS3TQU/"}, {"code": "BZNSWG", "name": "Florian Teutsch", "avatar": "https://pretalx.com/media/avatars/BZNSWG_E8n9UpR.jpg", "biography": "Florian Teutsch possesses extensive knowledge in the field of generative AI and works as a Machine Learning Engineer at inovex. After successfully completing his studies in Information Systems at the University of Cologne in 2020, he worked for two years as a Data Scientist on an innovative AI-based image search. 
Since joining inovex, he has been able to continuously expand his practical experience in the field of generative AI.", "public_name": "Florian Teutsch", "guid": "fc679cfa-aca5-5e99-9493-89bdac9a589b", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/BZNSWG/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/3DSU8V/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/3DSU8V/", "attachments": []}, {"guid": "ecd6569c-bdc0-50bd-8201-398e6acfe94b", "code": "DNVCEY", "id": 61790, "logo": null, "date": "2025-04-25T13:20:00+02:00", "start": "13:20", "duration": "00:30", "room": "Helium3", "slug": "pyconde-pydata-2025-61790-electify-retrieval-augmented-generation-for-voter-information-in-the-2024-european-election", "url": "https://pretalx.com/pyconde-pydata-2025/talk/DNVCEY/", "title": "Electify - Retrieval-Augmented Generation for Voter Information in the 2024 European Election", "subtitle": "", "track": "PyData: Natural Language Processing & Audio (incl. Generative AI NLP)", "type": "Talk", "language": "en", "abstract": "In general elections, voters often face the challenge of navigating complex political landscapes and extensive party manifestos. To address this, we developed Electify, an interactive application that utilizes Retrieval-Augmented Generation (RAG) to provide concise summaries of political party positions based on individual user queries. During its first roll-out for the European Election 2024, Electify attracted more than 6,000 active users. This talk will explore its development and deployment. It will focus on its technical architecture, the integration of data from party manifestos and parliamentary speeches, and the challenges of ensuring political neutrality and providing accurate replies. 
Additionally, we will discuss user feedback and ethical considerations, focusing on how generative AI can enhance voter information systems.", "description": "In general elections, voters often face the  challenge of navigating complex political landscapes. These challenges include understanding the differences between nuanced policy positions, comparing extensive party manifestos, and reconciling conflicting information from various sources. The sheer volume of information and the high frequency of elections can lead to voter fatigue and disengagement [1]. Existing tools like Wahlomat are helpful for voters but don\u2019t adapt well to individual preferences or specific questions. To address these issues, we developed Electify\u2014an interactive application designed to empower voters by addressing these pain points. \r\n\r\nUsing Retrieval-Augmented Generation (RAG), Electify simplifies the decision-making process by enabling users to access concise and relevant summaries of political party positions tailored to their individual queries. Our user interface provides the possibility to fact-check the generated responses by directly showing the original sources. Additionally, we included a blinding feature to combat confirmation bias: users can hide party names and read summaries of their positions before unblinding. This talk will explore the technical development and deployment of Electify, covering its  architecture, integration of data from party manifestos and parliamentary speeches, and strategies to maintain political neutrality and accuracy in responses. In particular, we will discuss our efforts to use reranking to improve context relevancy and LLM-as-a-judge evaluation for parameter optimization. 
We identify a trade-off between factual accuracy and the frequency of denied responses, which we think is highly relevant for generative AI systems that operate within sensitive areas like voter information [2].\r\n\r\nDuring its first roll-out for the European Election 2024, Electify received significant attention, attracting 6,000 active users who leveraged the platform to make more informed and confident voting decisions. We will address the lessons learned from user feedback and discuss the ethical considerations involved, emphasizing the potential of generative AI to enhance voter information systems and promote political engagement.\r\n\r\nContributors: Christian Liedl, Anna Neifer, Joshua Nowak\r\n[GitHub Repository](https://github.com/electify-eu/europarl-ai)\r\n\r\n[1] Kostelka et al. \"Election frequency and voter turnout.\" Comparative Political Studies 56.14 (2023)\r\n[2] Cao, Lang. \"Learn to Refuse: Making Large Language Models More Controllable and Reliable through Knowledge Scope Limitation and Refusal Mechanism.\" arXiv:2311.01041 (2023).", "recording_license": "", "do_not_record": false, "persons": [{"code": "GMGGWH", "name": "Christian Liedl", "avatar": null, "biography": "Christian spent several years as a physicist doing research in experimental quantum optics and has specialized in data science and artificial intelligence since 2024.", "public_name": "Christian Liedl", "guid": "adf99943-1728-5059-8dde-201e5de3600a", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/GMGGWH/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/DNVCEY/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/DNVCEY/", "attachments": []}, {"guid": "38a77727-f27a-56ac-8f19-a3892be8e743", "code": "CUJMCD", "id": 61860, "logo": null, "date": "2025-04-25T14:00:00+02:00", "start": "14:00", "duration": "00:30", "room": "Helium3", "slug": 
"pyconde-pydata-2025-61860-practical-python-rust-building-and-maintaining-dual-language-libraries", "url": "https://pretalx.com/pyconde-pydata-2025/talk/CUJMCD/", "title": "Practical Python/Rust: Building and Maintaining Dual-Language Libraries", "subtitle": "", "track": "General: Rust", "type": "Talk", "language": "en", "abstract": "Building performant Python often means reaching for C extensions. This talk explores an alternative: leveraging Rust to create blazing-fast Python modules that also benefit the Rust ecosystem. I will share practical strategies from building `semantic-text-splitter`, a library for fast and accurate text segmentation used in both Python and Rust, demonstrating how to bridge the gap between these two languages and unlock new possibilities for performance and cross-language collaboration.", "description": "Building performant Python often means reaching for C extensions. But what if you could achieve similar performance with Rust, while also creating a library usable directly within the Rust ecosystem? This talk explores how Rust can be a powerful ally, creating blazing-fast Python modules that benefit both communities. I will share the strategies I use while building and maintaining my package, `semantic-text-splitter`, used for fast and accurate text segmentation, which sees significant usage in both Python and Rust ecosystems.\r\n\r\nSome key challenges arise when integrating these two languages, such as bridging the gap between Rust's generics and Python's dynamic typing, managing data representation and memory across the Python/Rust boundary, and maintaining type hints and documentation across both languages.\r\n\r\nBut with practical maintenance strategies, these challenges can be overcome. Moreover, you contribute to a growing ecosystem of high-performance Python tools powered by Rust. 
Join me to learn how to build and maintain dual-language Python/Rust libraries, and discover how this approach can unlock new possibilities for performance and cross-language collaboration.", "recording_license": "", "do_not_record": false, "persons": [{"code": "KZEU3C", "name": "Ben Brandt", "avatar": "https://pretalx.com/media/avatars/KZEU3C_yuSp9yK.jpg", "biography": "Ben has been identifying as a Rustacean since 2018. With a background in UI/UX, he's excited to use Rust to make products that are faster, more resilient, and delightful for his users. He's currently a Staff Engineer at Aleph Alpha, where he uses Rust to make AI applications easier to build and operate.", "public_name": "Ben Brandt", "guid": "301450b6-fc76-5a8e-b30a-dcdb26427ba6", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/KZEU3C/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/CUJMCD/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/CUJMCD/", "attachments": []}, {"guid": "ab49df39-ac2d-5784-8cb9-cb2e45ca0d59", "code": "KDGZ8K", "id": 68948, "logo": null, "date": "2025-04-25T14:40:00+02:00", "start": "14:40", "duration": "00:30", "room": "Helium3", "slug": "pyconde-pydata-2025-68948-switching-from-data-scientist-to-manager", "url": "https://pretalx.com/pyconde-pydata-2025/talk/KDGZ8K/", "title": "Switching from Data Scientist to Manager", "subtitle": "", "track": "General: Education, Career & Life", "type": "Talk", "language": "en", "abstract": "In this presentation, I will discuss my transition from a Data Scientist to a management role, covering key managerial responsibilities, preparation tips, and the pros and cons of the switch. The talk is particularly relevant for engineers who have recently moved into management or are considering the change, as well as those interested in understanding the challenges managers face. 
The session will include brief presentations followed by interactive discussions with the audience.", "description": "In this presentation, I will discuss my transition from a Data Scientist to a management position. After a brief introduction, I will provide an overview of the key aspects of managerial responsibilities. Then I will share some advice on how to prepare for the transition, what you cannot prepare for, and how to start as a new manager. Ultimately, I will share my perspective on the transition and outline the pros and cons of the new role. \r\nThis talk is particularly relevant to engineers who have recently transitioned to management or are considering a change in roles. It will also be of value for those who are keen to understand the challenges their manager is (probably) facing and how to help them. This session will consist of a few rounds of a brief presentation, followed by an interactive session with the audience.", "recording_license": "", "do_not_record": false, "persons": [{"code": "EZMJWT", "name": "Theodore Meynard", "avatar": "https://pretalx.com/media/avatars/EZMJWT_XWSm1xx.jpg", "biography": "Theodore Meynard is a data science manager at GetYourGuide. He leads the evolution of their ranking algorithm, helping customers to find the best activities to book and locations to explore. Beyond work, he is one of the co-organizers of the PyData Berlin meetup and the conference. 
\r\nWhen he is not programming, he loves riding his bike, looking for the best bakery-patisserie in town.", "public_name": "Theodore Meynard", "guid": "86973d97-e18a-5002-9b99-7690509f6220", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/EZMJWT/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/KDGZ8K/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/KDGZ8K/", "attachments": [{"title": "Slides", "url": "/media/pyconde-pydata-2025/submissions/KDGZ8K/resources/202504_pCclkkX.pdf", "type": "related"}]}], "Platinum3": [{"guid": "95f668d4-28c9-53ee-9434-2dc7305ed16f", "code": "ADSXCA", "id": 61064, "logo": null, "date": "2025-04-25T10:15:00+02:00", "start": "10:15", "duration": "00:30", "room": "Platinum3", "slug": "pyconde-pydata-2025-61064-where-have-all-the-post-offices-gone-discovering-neighborhood-facilities-with-python-and-osm", "url": "https://pretalx.com/pyconde-pydata-2025/talk/ADSXCA/", "title": "Where have all the post offices gone? Discovering neighborhood facilities with Python and OSM", "subtitle": "", "track": "PyData: Data Handling & Engineering", "type": "Talk", "language": "en", "abstract": "When it comes to open geographic data, OpenStreetMap is an awesome resource. 
Getting started and figuring out how to make the most out of the data available can be challenging.\r\n\r\nUsing a personal example: frustration at the apparent lack of post offices in my neighborhood, we'll walk through examples of how to parse, filter, process, and visualize geospatial data with Python.\r\n\r\nAt the end of this talk, you will know how to process geographic data from OpenStreetMap using Python and find out some surprising info that I learned while answering the question: Where have all the post offices gone?", "description": "**Problem statement**\r\n\r\nNeeding an international postcard stamp, I headed to my nearest post office only to find out that it was permanently closed, the latest closure among others in recent memory. Was this just in my neighborhood or was this happening all over the state? To answer these questions, I turned to open data and Python.\r\n- What is OpenStreetMap?\r\n\r\n**How can we identify types of places, like post offices and districts, in OpenStreetMap?**\r\n\r\n- Types of data in OSM\r\n- Tags\r\n- Tools for diving into the data to get an idea of how it is structured and how to construct queries: Overpass API, overpass turbo\r\n\r\n**How can we access the raw OSM data and work with it in Python?**\r\n\r\n- How many post offices are there in each neighborhood? What about by area or population?\r\n- Working with PBF files: parsing and filtering with the PyOsmium library\r\n- Using GeoPandas to store the data in a GeoDataFrame and apply transformations\r\n\r\n**What are some tools for visualizing the data?**\r\n\r\n- How can we make an interactive plot of post offices in each neighborhood? 
What about other facilities and resources?\r\n- Plot directly from a GeoDataFrame\r\n- Interactive plotting\r\n\r\nWhile this talk is aimed at those beginning with geographic data, it would be helpful to have some background knowledge about Python and data handling.", "recording_license": "", "do_not_record": false, "persons": [{"code": "ST3QCB", "name": "Katie Richardson", "avatar": "https://pretalx.com/media/avatars/ST3QCB_Duk0V2b.jpg", "biography": "Katie Richardson is a Staff Data Scientist at Blue Yonder, where she currently works on demand forecasting. Before joining Blue Yonder, she was primarily focused on the domain of search, ranking, and recommendation. With a background in Anthropology and several years experience working with geographical data, she's passionate about exploring spatial inequalities using open data. In her free time, she's an avid tap dancer.", "public_name": "Katie Richardson", "guid": "fac4f959-d4f8-56da-9e8e-b8874b8a6bbd", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/ST3QCB/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/ADSXCA/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/ADSXCA/", "attachments": [{"title": "Slides", "url": "/media/pyconde-pydata-2025/submissions/ADSXCA/resources/slides_qWPX2o0.pdf", "type": "related"}]}, {"guid": "8b9b4a76-dbb3-523f-a476-1acf3e9229db", "code": "XRHEYZ", "id": 61121, "logo": null, "date": "2025-04-25T10:55:00+02:00", "start": "10:55", "duration": "00:30", "room": "Platinum3", "slug": "pyconde-pydata-2025-61121-the-foundation-model-revolution-for-tabular-data", "url": "https://pretalx.com/pyconde-pydata-2025/talk/XRHEYZ/", "title": "The Foundation Model Revolution for Tabular Data", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Statistics", "type": "Talk", "language": "en", "abstract": "What if we could make the same revolutionary leap for tables that ChatGPT made for text? 
While foundation models have transformed how we work with text and images, tabular / structured data (spreadsheets and databases) - the backbone of economic and scientific analysis - has been left behind. TabPFN changes this. It's a foundation model that achieves in 2.8 seconds what traditional methods need 4 hours of hyperparameter tuning for - while delivering better results. On datasets of up to 10,000 samples, it outperforms every existing Python library, from XGBoost to CatBoost to AutoGluon.\r\n\r\nBeyond raw performance, TabPFN brings foundation model capabilities to tables: native handling of messy data without preprocessing, built-in uncertainty estimation, synthetic data generation, and transfer learning - all in a few lines of Python code. Whether you're building risk models, accelerating scientific research, or optimizing business decisions, TabPFN represents the next major transformation in how we analyze data. Join us to explore and learn how to leverage these new capabilities in your work.", "description": "TabPFN shows how foundation model concepts can advance tabular data analysis in Python. Published in Nature in January 2025, it has found strong community adoption with 3,000+ GitHub stars and 1,000,000+ downloads.\r\n\r\n**Detailed Outline:**\r\n\r\n1. **Motivation** \r\n- Why tabular data: examples of tabular prediction tasks and time series forecasting \r\n- Why foundation models for tabular data\r\n- Learning from the foundation model revolution in text and vision\r\n\r\n2. **Technical Insights**\r\n- How we adapted transformers for tabular data\r\n- Making in-context learning work for structured data\r\n- Performance characteristics and resource requirements\r\n- How to apply TabPFN to time series\r\n\r\n3. **Practical Applications**\r\n- When to choose TabPFN vs traditional methods\r\n- Resource requirements and scalability limits  \r\n- What's next for TabPFN\r\n\r\n4. 
**Colab Demo**\r\n\r\n- Q&A\r\n\r\n**Key Takeaways:**\r\n- Practical understanding of TabPFN's capabilities and limitations\r\n- Hands-on experience integrating with Python data science workflows\r\n- Best practices for working with foundation models on tabular data\r\n- Insight into emerging approaches for structured data analysis", "recording_license": "", "do_not_record": false, "persons": [{"code": "WEJ8W7", "name": "Noah Hollmann", "avatar": null, "biography": null, "public_name": "Noah Hollmann", "guid": "a37cae43-ce60-5a3d-bacc-77e5bcf505e1", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/WEJ8W7/"}, {"code": "79EEXA", "name": "Frank Hutter", "avatar": "https://pretalx.com/media/avatars/79EEXA_1PzAEHn.jpg", "biography": "Frank is a Hector-Endowed Fellow and PI at the ELLIS Institute T\u00fcbingen and has been a full professor for Machine Learning at the University of Freiburg (Germany) since 2016. Previously, he has been an Emmy Noether Research Group Lead at the University of Freiburg since 2013. Before that, he did a PhD (2004-2009) and postdoc (2009-2013) at the University of British Columbia (UBC) in Canada. He received the 2010 CAIAC doctoral dissertation award for the best thesis in AI in Canada, as well as several best paper awards and prizes in international ML competitions. He is a Fellow of ELLIS and EurAI, Director of the ELLIS unit Freiburg, and the recipient of 3 ERC grants. Frank is best known for his research on automated machine learning (AutoML), including neural architecture search, efficient hyperparameter optimization, and meta-learning. He co-authored the first book on AutoML and the prominent AutoML tools Auto-WEKA, Auto-sklearn and Auto-PyTorch, won the first two AutoML challenges with his team, is co-teaching the first MOOC on AutoML, co-organized 15 AutoML-related workshops at ICML, NeurIPS and ICLR, and founded the AutoML conference as general chair in 2022. 
In recent years, his focus has been on the intersection of foundation models and AutoML, prominently including the first foundation model for tabular data, TabPFN.", "public_name": "Frank Hutter", "guid": "77892c7e-dbbc-520e-b54b-d477aa894431", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/79EEXA/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/XRHEYZ/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/XRHEYZ/", "attachments": []}, {"guid": "277dbbec-18e6-5c75-bd73-299969ca6379", "code": "JABVHK", "id": 61251, "logo": null, "date": "2025-04-25T11:35:00+02:00", "start": "11:35", "duration": "00:30", "room": "Platinum3", "slug": "pyconde-pydata-2025-61251-enhancing-rag-with-fast-graphrag-and-instructlab-a-scalable-interpretable-and-efficient-framework", "url": "https://pretalx.com/pyconde-pydata-2025/talk/JABVHK/", "title": "Enhancing RAG with Fast GraphRAG and InstructLab: A Scalable, Interpretable, and Efficient Framework", "subtitle": "", "track": "PyData: Generative AI", "type": "Talk", "language": "en", "abstract": "Retrieval Augmented Generation (RAG) has become a cornerstone in enriching GenAI outputs with external data, yet traditional frameworks struggle with challenges like data noise, domain specialization, and scalability. In this talk, Tuhin will dive into the open-source frameworks Fast GraphRAG and InstructLab, which address these limitations by combining knowledge graphs with the classical PageRank algorithm and fine-tuning, delivering a precision-focused, scalable, and interpretable solution. By leveraging the structured context of knowledge graphs, Fast GraphRAG enhances data adaptability, handles dynamic datasets efficiently, and provides traceable, explainable outputs, while InstructLab adds domain depth to the LLM through fine-tuning. 
Designed for real-world applications, it bridges the gap between raw data and actionable insights, redefining intelligent retrieval for developers, researchers, and enterprises. This talk will showcase Fast GraphRAG\u2019s transformative features coupled with domain specific Fine-tuning leveraging InstructLab and demonstrate its potential to elevate RAG\u2019s capabilities in handling the evolving demands of large language models (LLMs) for developers, researchers, and businesses.", "description": "Retrieval Augmented Generation (RAG) has changed the way AI systems incorporate external knowledge, but it often falls short when faced with real-world challenges like adapting to new data, managing complexity, or delivering reliable answers. Fast GraphRAG steps in to address these gaps with a refreshing approach that blends the structure of knowledge graphs with the proven efficiency of algorithms like PageRank. By focusing on interpretability, scalability, and adaptability, Fast GraphRAG creates a pathway for building AI systems that don\u2019t just retrieve data but leverage it in a meaningful way. 
\r\n\r\nThe agenda for the talk is as follows\r\n\r\nChallenges in Traditional RAG\r\n    - Lack of interpretability leads to untrustworthy outputs.\r\n    - High computational costs limit scalability.\r\n    - Inflexibility makes adapting to evolving data cumbersome.\r\nFast GraphRAG\u2019s Core Innovations\r\n    - Interpretability: Knowledge graphs provide clear, traceable reasoning.\r\n    - Scalability: Efficient query resolution with minimal overhead.\r\n    - Adaptability: Dynamic updates ensure relevance in changing domains.\r\n    - Precision: PageRank sharpens focus on high-value information.\r\n    - Robust Workflows: Typed and asynchronous handling for complex scenarios.\r\nHow Fast GraphRAG Works\r\n    - Architecture and algorithmic innovations.\r\n    - Knowledge graphs for intelligent reasoning.\r\n    - PageRank for multi-hop exploration and precise retrieval.\r\n    - Entity extraction, incremental updates, and graph exploration.\r\n    - Role of InstructLab and Fine-tuning.\r\nDemo and Practical Takeaways\r\n    - Building a knowledge graph and resolving queries.\r\n    - Open-source tools for scaling Fast GraphRAG.\r\n    - Real-World applications\r\n\r\nFast GraphRAG isn\u2019t just another tool. It's a game-changer for anyone frustrated by the limitations of traditional RAG systems. By combining the structured clarity of knowledge graphs with the power of algorithms like PageRank and fine-tuning by InstructLab, it makes retrieval smarter, faster, and the LLM more adaptable. This session will leave you with a clear understanding of how to build/train AI systems that deliver meaningful results while being transparent and trustworthy. 
Whether you\u2019re a developer, researcher, or just someone passionate about AI, Fast GraphRAG is a framework that sparks possibilities and redefines what intelligent retrieval can achieve.", "recording_license": "", "do_not_record": false, "persons": [{"code": "8BDEZN", "name": "Tuhin Sharma", "avatar": "https://pretalx.com/media/avatars/8BDEZN_hfr9PYu.jpg", "biography": "Tuhin Sharma is Senior Principal Data Scientist at Red Hat in the Data Development Insights & Strategy AI team. Prior to that, he worked at Hypersonix as an AI architect. He also co-founded and has been CEO of Binaize (backed by Techstars), a website conversion intelligence product for e-commerce SMBs. Previously, he was part of IBM Watson, where he worked on NLP and ML projects featured on Star Sports and CNN-IBN. He received a master's degree from IIT Roorkee and a bachelor's degree from IIEST Shibpur in Computer Science. He loves to code and collaborate on open-source projects. He is one of the top 20 contributors to pandas. He has 4 research papers and 5 patents in the fields of AI and NLP. He is a reviewer for the IEEE MASS conference, Springer Nature and Packt publication in the AI track. He writes deep learning articles for O'Reilly in collaboration with the AWS MXNET team. 
He is a regular speaker at prominent AI conferences like O'Reilly Strata & AI, ODSC, GIDS, Devconf, Datahack Summit etc.", "public_name": "Tuhin Sharma", "guid": "53e39e6f-b5ae-5c35-8bd2-1dda4a8637ff", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/8BDEZN/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/JABVHK/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/JABVHK/", "attachments": []}, {"guid": "e3547b4a-fb8b-5225-a87f-fad55effe4d4", "code": "P8GUWG", "id": 61840, "logo": null, "date": "2025-04-25T13:20:00+02:00", "start": "13:20", "duration": "00:30", "room": "Platinum3", "slug": "pyconde-pydata-2025-61840-is-your-llm-any-good-at-writing-benchmarking-on-creative-writing-and-editing-tasks", "url": "https://pretalx.com/pyconde-pydata-2025/talk/P8GUWG/", "title": "Is your LLM any good at writing? Benchmarking on creative writing and editing tasks", "subtitle": "", "track": "PyData: Natural Language Processing & Audio (incl. Generative AI NLP)", "type": "Talk", "language": "en", "abstract": "Many LLM benchmarks focus on reasoning and coding tasks. These are exciting tasks! But the majority of LLM usage is still in writing and editing related tasks, and there's a surprising lack of benchmarks on these. \r\n\r\nIn this talk you'll learn what it took to create a writing benchmark, and which model performs best!", "description": "Large Language Models (LLMs) have demonstrated impressive capabilities in generating human-quality text, but how do we objectively measure their performance on complex writing and editing tasks? This talk explores the challenges of benchmarking LLMs for these tasks and presents a novel framework for evaluating their effectiveness. \r\n\r\nThe talk will provide practical guidance on how to evaluate and compare the performance of different LLMs. Basic familiarity with language models is required for this talk. 
\r\n\r\n**Outline:**\r\n\r\nIntroduction\r\n\r\n- Briefly introduce LLMs and their growing role in writing and editing.\r\n- Highlight the need for standardized benchmarks to compare and improve LLM performance. Majority of LLM usage is still on writing tasks*! \r\n\r\n*Source: https://arxiv.org/pdf/2405.01470\r\n\r\nChallenges in benchmarking LLMs for writing and editing:\r\n\r\n- Defining objective metrics for subjective tasks like writing quality and editing accuracy.\r\n- Addressing the issue of bias in training data and its impact on evaluation.\r\n- Accounting for the diverse range of writing and editing tasks.\r\n\r\nA framework for evaluating LLM performance:\r\n\r\n- Proposing a set of key metrics that encompass fluency, coherence, accuracy, and style.\r\n- Introducing a methodology for constructing diverse and representative test datasets.\r\n\r\nResults:\r\n\r\n- Showcasing examples of how the proposed framework can be applied to evaluate different LLMs.\r\n- Presenting findings from recent benchmarking studies and discussing their implications.\r\n\r\nFuture directions:\r\n\r\n- Exploring the potential of LLMs to assist with increasingly complex writing and editing tasks.\r\n- Identifying areas for future research and development in LLM benchmarking.", "recording_license": "", "do_not_record": false, "persons": [{"code": "MUAVLE", "name": "Azamat Omuraliev", "avatar": "https://pretalx.com/media/avatars/MUAVLE_GBfWBIN.jpeg", "biography": "Incoming Solutions Architect at Databricks. \r\n\r\nBefore April 2025, AI engineer at Typetone, where I'm taming LLMs to automate end-to-end marketing. 
\r\n\r\nWe help unburden SMEs and solopreneurs from doing their content marketing, and this task is surprisingly hard for LLMs to solve yet!\r\n\r\nIn past lives personalized marketing at ING as a data scientist and ran a non-profit in Kyrgyzstan.", "public_name": "Azamat Omuraliev", "guid": "39ee263a-ec8c-51b1-aa05-eae4fa73c24e", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/MUAVLE/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/P8GUWG/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/P8GUWG/", "attachments": []}, {"guid": "a8ae160c-b570-5a46-9dec-211c1287ac4e", "code": "MNTFRG", "id": 59374, "logo": null, "date": "2025-04-25T14:00:00+02:00", "start": "14:00", "duration": "00:30", "room": "Platinum3", "slug": "pyconde-pydata-2025-59374-using-causal-thinking-to-make-media-mix-modeling", "url": "https://pretalx.com/pyconde-pydata-2025/talk/MNTFRG/", "title": "Using Causal thinking to make Media Mix Modeling", "subtitle": "", "track": "PyData: PyData & Scientific Libraries Stack", "type": "Talk", "language": "en", "abstract": "In today's data-driven landscape, understanding causal relationships is essential for effective marketing strategies. This talk will explore the link between Bayesian causal thinking and media mix modeling, utilizing Directed Acyclic Graphs (DAGs), Structural Causal Models (SCMs), and the Data Generation Process (DGP).\r\n\r\nWe will examine how DAGs represent causal assumptions, how SCMs define relationships in media mix models, and how to implement these models within a Bayesian framework. By using media mix models as causal inference tools, we can estimate counterfactuals and causal effects, offering insights into the effectiveness of media investments.", "description": "In the era of data-driven decision-making, understanding causal relationships is crucial for effective marketing strategies. 
This talk delves into the underexplored connection between Bayesian causal thinking and media mix modeling, linking Directed Acyclic Graphs (DAGs), Structural Causal Models (SCMs), and the Data Generation Process (DGP). By navigating through these key concepts, we will demonstrate how we can build models that not only predict outcomes but also represent causal mechanisms within the marketing ecosystem.\r\n\r\nStarting from foundational principles, we will explore how DAGs serve as a formal language for encoding causal assumptions, how Structural Causal Models define relationships in media mix models, and how we implement those in the Bayesian framework through the famous DGP. We will further illustrate how media mix models can be employed as causal inference tools to estimate counterfactuals and causal effects, providing actionable insights into the effectiveness of media investments.\r\n\r\nFinally, we\u2019ll show how Bayesian inference enables us to update these causal beliefs in light of data. This synthesis of causal reasoning and probabilistic modeling is not only theoretically rich but practically powerful\u2014offering a robust framework for constructing media mix models that more accurately reflect the complexities of real-world marketing dynamics.\r\n\r\nAttendees will leave with an understanding of how to apply Bayesian causal discovery (guided by an example in an IPython notebook) to develop causally valid models that can be applied to real-world marketing data. They will learn how to use Media Mix Models as causal inference tools to estimate counterfactual scenarios and causal effects, unlocking deeper insights into the effectiveness of media investments. 
This presentation aims to reveal a new pathway for marketers, data scientists, and researchers to harness the potential of these powerful methodologies together, empowering them to drive more informed, causally grounded decisions.", "recording_license": "", "do_not_record": false, "persons": [{"code": "FNLJZD", "name": "Carlos Trujillo", "avatar": null, "biography": "Six years ago, I discovered that my passion was in the field of data and artificial intelligence. I decided to move from Venezuela to Chile in search of new challenges and currently found one of them in Estonia, where I have had the opportunity to work on teams in Latin America, Europe and Africa.\r\n\r\nI have been able to work on projects related to artificial intelligence, machine learning and deep learning, especially in the field of marketing, which has allowed me to help traditional companies adopt a data-driven approach. However, my latest challenge has been working at the fastest mobility company in Europe, where I have been able to apply all my knowledge and skills in a highly dynamic and constantly evolving environment.\r\n\r\nI have had to develop different programs in Python, using SQL and No-SQL to build cloud structures that can handle large volumes of information. I have learned to work with DataBricks and DBT, and I am familiar with Google Cloud and AWS. I have also explored tools such as Airflow, CloudRun, App Engine, BigQuery, S3, DynamoDB, MongoDB, among others.\r\n\r\nMy focus has always been on the areas of statistics and mathematics, seeking to solve recurrent problems in business through techniques of computer vision, natural language processing, regression or classification algorithms, and neural networks. 
I have generated dashboards in Looker, Looker Studio, Tableau and Power BI, also custom reports, alerts and complex artificial intelligence models, leading teams of four to ten people, being the bridge between marketers and technical teams.\r\n\r\nI still have a long way to go to become the \"Marketing Scientist\" I want to be, but I am grateful for all the opportunities and challenges I have faced so far. I am certainly eager to continue learning and growing in my career, looking for new challenges that will take me to spaces that I have not yet explored.", "public_name": "Carlos Trujillo", "guid": "4731478e-1b00-5874-b5b8-ce234d9c4e3a", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/FNLJZD/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/MNTFRG/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/MNTFRG/", "attachments": []}, {"guid": "815c6d3d-853a-5b6d-9acd-1db041f6602f", "code": "9CRNU3", "id": 61788, "logo": null, "date": "2025-04-25T14:40:00+02:00", "start": "14:40", "duration": "00:30", "room": "Platinum3", "slug": "pyconde-pydata-2025-61788-building-a-hybridrag-document-question-answering-system", "url": "https://pretalx.com/pyconde-pydata-2025/talk/9CRNU3/", "title": "Building a HybridRAG Document Question-Answering System", "subtitle": "", "track": "PyData: Natural Language Processing & Audio (incl. Generative AI NLP)", "type": "Talk", "language": "en", "abstract": "Retrieval Augmented Generation (RAG) is a powerful technique for searching across unstructured documents, but it often falls short when the task demands an understanding of intricate relationships between entities. GraphRAG addresses this by leveraging knowledge graphs to capture these relationships, but it struggles with scalability and handling diverse unstructured formats. 
In this talk, we\u2019ll explore how HybridRAG combines the strengths of both approaches - RAG for scalable unstructured data retrieval and GraphRAG for semantic richness- to deliver accurate and contextually relevant answers. We\u2019ll dive into its application, challenges, and the significant improvements it offers for question-answering systems across various domains.", "description": "#### Outline:\r\n\r\n1. Introduction  \r\n   - The challenge of extracting information from unstructured and domain-specific text (e.g., legal documents).  \r\n   - Overview of traditional RAG techniques and their limitations:  \r\n     - Scalability and unstructured data handling.  \r\n     - Lack of semantic depth to capture intricate relationships.  \r\n   - Why HybridRAG is a game-changer.  \r\n\r\n2. What is RAG? \r\n   - Explanation of vector-based retrieval using embeddings and databases.  \r\n   - Advantages of RAG:  \r\n     - Scalable search across diverse unstructured formats.  \r\n     - Domain-agnostic retrieval capabilities.  \r\n   - Limitations:  \r\n     - Inability to capture relationships between entities.  \r\n     - Difficulty handling domain-specific or complex queries.  \r\n\r\n3. What is GraphRAG?  \r\n   - Explanation of GraphRAG: How knowledge graphs enhance retrieval by mapping relationships between entities.  \r\n   - Benefits of GraphRAG:  \r\n     - Semantic richness and contextual understanding.  \r\n     - Effective for domains requiring deep relational reasoning (e.g., finance, healthcare).  \r\n   - Challenges of GraphRAG:  \r\n     - Building high-quality knowledge graphs from unstructured data.  \r\n     - Scalability and integration with generative models.  \r\n\r\n4. Introducing HybridRAG: Combining RAG and GraphRAG \r\n   - The HybridRAG architecture:  \r\n     - RAG for scalable retrieval of unstructured data.  \r\n     - GraphRAG for refining answers with relational and semantic context.  
\r\n   - Benefits of HybridRAG:  \r\n     - Combining scalability with semantic depth.  \r\n     - Improved retrieval accuracy and contextual relevance.  \r\n   - Use case: Legal documents processing (e.g., extracting Q&A insights).  \r\n     - How RAG retrieves general context.  \r\n     - How GraphRAG captures relationships (e.g., between companies, documents, events).  \r\n\r\n5. Challenges in Building HybridRAG Systems\r\n   - Creating high-quality knowledge graphs from diverse and unstructured data.  \r\n   - Balancing computational overhead from combining RAG and GraphRAG.  \r\n   - Addressing domain-specific terminology and ensuring generalizability to other domains.  \r\n\r\n6. Key Takeaways\r\n   - HybridRAG effectively combines the strengths of RAG and GraphRAG.  \r\n   - It\u2019s particularly powerful for domains requiring both scalability and semantic depth.  \r\n   - Practical advice for building HybridRAG systems in your projects.  \r\n\r\n#### What You\u2019ll Learn:\r\n- The strengths and limitations of RAG and GraphRAG techniques for question-answering systems.  \r\n- How HybridRAG bridges the gap by combining scalable retrieval with semantic richness.  \r\n- Practical challenges and solutions for building HybridRAG systems, including knowledge graph creation and integration.  \r\n- Insights into real-world applications where HybridRAG delivers superior results.", "recording_license": "", "do_not_record": false, "persons": [{"code": "37RKBL", "name": "Darya Petrashka", "avatar": "https://pretalx.com/media/avatars/37RKBL_ZRMXTOD.jpg", "biography": "Darya Petrashka is a Data Scientist at SLB with 5 years of experience, focusing on supply chain projects in data analysis, NLP, and generative AI. She is passionate about using data for problem-solving, with a strong interest in classical machine learning, NLP, and AWS services. 
An AWS Community Builder and Authorized Instructor, Darya actively shares her expertise through public speaking at various industry events, including AWS Community Days, AWS Cloud Day, and PyCon. A dedicated learner, Darya continually hones her skills by participating in workshops, courses, and tech schools.", "public_name": "Darya Petrashka", "guid": "0915c6fa-61c8-5ce2-86bc-7fa6780fae0a", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/37RKBL/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/9CRNU3/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/9CRNU3/", "attachments": []}], "Europium2": [{"guid": "5592ba0d-671b-594f-b050-4962493d4c6b", "code": "VKYDBD", "id": 59310, "logo": null, "date": "2025-04-25T10:15:00+02:00", "start": "10:15", "duration": "00:30", "room": "Europium2", "slug": "pyconde-pydata-2025-59310-building-bare-bones-game-physics-in-rust-with-python-integration", "url": "https://pretalx.com/pyconde-pydata-2025/talk/VKYDBD/", "title": "Building Bare-Bones Game Physics in Rust with Python Integration", "subtitle": "", "track": "General: Rust", "type": "Talk", "language": "en", "abstract": "Learn how to build a minimalist game physics engine in Rust and make it accessible to Python developers using PyO3. This talk explores fundamental concepts like collision detection and motion dynamics while focusing on Python integration for scripting and testing. Ideal for developers interested in combining Rust\u2019s performance with Python\u2019s ease of use to create lightweight and efficient tools for games or simulations.", "description": "Python\u2019s simplicity makes it the go-to choice for scripting, while Rust excels in performance-critical tasks like game physics. 
This talk demonstrates how to build a minimalist physics engine in Rust, focusing on core concepts like collision detection, basic rigid body dynamics, and force application, while providing seamless Python integration using PyO3.\r\n\r\nWe\u2019ll explore how PyO3 allows developers to expose Rust functionality as native Python modules, enabling Python developers to easily script and interact with the physics engine. Through practical examples, attendees will see how Python can be used for rapid prototyping and gameplay scripting, while Rust handles the heavy lifting of physics calculations.\r\n\r\nBy the end of this session, participants will not only understand the basics of implementing physics in Rust but also how to use PyO3 to bridge the gap between Rust\u2019s performance and Python\u2019s flexibility. This talk is perfect for Python enthusiasts curious about Rust or Rustaceans looking to make their libraries accessible to the Python ecosystem.", "recording_license": "", "do_not_record": false, "persons": [{"code": "SS7YVE", "name": "Sam Kaveh", "avatar": "https://pretalx.com/media/avatars/SS7YVE_NbiW0Kr.jpeg", "biography": "Born in Iran, I have embraced diverse roles throughout my career, ranging from founding a startup and software development to consulting companies on cloud migrations and integrating machine learning technologies into their operations. My professional journey has been shaped by a passion for problem-solving and innovation across various domains.\r\n\r\nAcademically, I hold a Ph.D. in particle physics, specializing in Higgs boson precision measurements as part of the CMS experiment at CERN's Large Hadron Collider. 
This experience honed my analytical skills and gave me a deep appreciation for collaboration in high-stakes, cutting-edge environments.\r\n\r\nToday, I draw on my multidisciplinary background to create solutions at the intersection of software, data science, and high-performance computing, continually seeking to bridge theory and practice in impactful ways.", "public_name": "Sam Kaveh", "guid": "aa6fe836-434b-5dc4-8348-9543d28fc144", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/SS7YVE/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/VKYDBD/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/VKYDBD/", "attachments": [{"title": "Slides for the \u201cBuilding Bare-Bones Game Physics in Rust with Python Integration\u201d talk", "url": "/media/pyconde-pydata-2025/submissions/VKYDBD/resources/sam_ka_qpMzRov.pdf", "type": "related"}]}, {"guid": "e2fda11c-5027-525f-ad62-809a08080332", "code": "JUQ9JJ", "id": 61810, "logo": null, "date": "2025-04-25T10:55:00+02:00", "start": "10:55", "duration": "00:30", "room": "Europium2", "slug": "pyconde-pydata-2025-61810-high-performance-dataframe-agnostic-glms-with-glum", "url": "https://pretalx.com/pyconde-pydata-2025/talk/JUQ9JJ/", "title": "High-performance dataframe-agnostic GLMs with glum", "subtitle": "", "track": "PyData: PyData & Scientific Libraries Stack", "type": "Talk", "language": "en", "abstract": "Generalized linear models (GLMs) are interpretable, relatively quick to train, and specifying them helps the modeler understand the main effects in the data. This makes them a popular choice today to complement other machine-learning approaches. `glum` was conceived with the aim of offering the community an efficient, feature-rich, and Python-first GLM library with a scikit-learn-style API. 
More recently, we have been striving to keep up with the PyData community's ongoing push for dataframe-agnosticism.\r\nWhile `glum` was originally heavily based on `pandas`, with the help of `narwhals`, we are close to being able to fit models on any dataset that the latter supports. This talk presents our experiences with achieving this goal.", "description": "Arguably, `glum`'s standout feature is its ability to efficiently handle datasets consisting of a mix of dense, sparse and categorical features. To facilitate this, it relies on our (similarly open-source) `tabmat` library, which provides classes and useful methods for mixed-sparsity data. `glum` fits models by first converting input data to `tabmat` matrices, and then using those matrices to do the necessary computations.\r\n\r\nTherefore, dataframe-agnosticism in our case mostly boils down to handling the conversion of different dataframes to `tabmat` matrices (which themselves store data in `numpy` arrays and sparse `scipy` matrices) in an efficient manner. Most of it is rather smooth and straightforward due to `narwhals` providing a convenient compatibility layer for a wide range of dataframe functionality. However, we have encountered a couple of pain points that might be of interest to other package maintainers and the PyData community. In particular,\r\n\r\n- We heavily rely on manipulating the category order and encoding of categorical variables, for which there is somewhat limited support. This is due to various dataframe libraries handling categorical columns somewhat differently.\r\n- Most dataframe libraries do not support sparse columns, while for us, it is important to be able to accept sparse inputs.\r\n\r\nIn this talk I demonstrate how we used `narwhals` to easily accept multiple types of dataframes. I will go into details about categorical and sparse columns, and present the challenges we encountered with those. 
I will also examine the benefits and challenges of supporting sparse columns in dataframe libraries and the Arrow standard. These points are meant to facilitate discussion among the participants and in the PyData community.\r\n\r\nAt the end of the talk I will also briefly mention potential future plans for `glum` and `tabmat`, including the possibility to do computations directly on Arrow objects without converting them to `numpy` and `scipy` arrays. \r\n\r\n### Outline\r\n\r\n1. A short intro to `glum`, its backend library `tabmat`, and the main ideas that make them performant.\r\n2. Making `glum` dataframe-agnostic.\r\n    - Showcase how `narwhals` simplifies handling a wide variety of dataframes.\r\n    - Discuss handling categorical (and enum/dictionary) columns.\r\n    - Talk about representing sparse columns in dataframes.\r\n3. Concluding remarks and potential future plans.\r\n\r\n### Target audience\r\n- Basic understanding of the scientific Python ecosystem (with a focus on dataframe libraries) is recommended.\r\n- While some familiarity with linear models might be useful to get the most out of this talk, it is by no means required.\r\n\r\n### Main takeaways\r\n\r\n- How `glum` efficiently handles mixed-sparsity data\r\n- How `narwhals` helps to achieve dataframe-agnosticism with little effort\r\n- Differences between categorical types in various packages and the Apache Arrow specification.\r\n- How support for sparse columns could be incorporated into dataframe libraries and the Arrow Columnar Format", "recording_license": "", "do_not_record": false, "persons": [{"code": "TP9PQN", "name": "Martin Stancsics", "avatar": null, "biography": "Martin is a data scientist/engineer at QuantCo. He is mainly working on developing the software packages that QuantCo uses for insurance risk modeling and pricing. 
This includes QuantCo's open-source generalized linear modeling package, glum.\r\n\r\nHe has a background in economics, and has previously worked at the Central Bank of Hungary as an applied researcher. He also taught a number of 'Programming for Economists' courses for college and PhD students.", "public_name": "Martin Stancsics", "guid": "96c9b3e7-76ab-5d32-ad7a-77286c209005", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/TP9PQN/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/JUQ9JJ/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/JUQ9JJ/", "attachments": []}, {"guid": "38e5dd2b-571e-52cc-a31f-7a6320339a65", "code": "DPAPUA", "id": 60315, "logo": null, "date": "2025-04-25T11:35:00+02:00", "start": "11:35", "duration": "00:30", "room": "Europium2", "slug": "pyconde-pydata-2025-60315-gitmlops-how-we-are-managing-100-ml-pipelines-in-aws-sagemaker", "url": "https://pretalx.com/pyconde-pydata-2025/talk/DPAPUA/", "title": "GitMLOps \u2013 How we are managing 100+ ML pipelines in AWS SageMaker", "subtitle": "", "track": "PyCon: MLOps & DevOps", "type": "Talk", "language": "en", "abstract": "Scaling machine learning pipelines is no small feat - especially when you\u2019re managing over 100 of them on AWS SageMaker. In this talk, I\u2019ll take you behind the scenes of how our team at idealo built a Git-based MLOps framework that powers millions of real-time recommendations every minute.\r\n\r\nI\u2019ll share the challenges we faced, the solutions we implemented, and the lessons we learned while streamlining model versioning, deployment, and monitoring. 
This session is packed with actionable takeaways for ML engineers, data scientists, and DevOps professionals looking to simplify their MLOps workflows and operate efficiently at scale.\r\n\r\nWhether you\u2019re running a handful of pipelines or preparing to scale up, this talk will equip you with the tools and strategies to tackle MLOps with confidence.", "description": "In 2022, idealo\u2019s Machine Learning Engineering (MLE) team took on a bold mission: to transform and scale the recommendation systems powering the idealo website. Fast forward to today, we\u2019re delivering over 1 million recommendations per minute across 20 key user touchpoints - driving seamless, personalized experiences at scale.\r\n\r\nBut how do you manage over 100 machine learning pipelines without breaking a sweat? In this talk, I\u2019ll reveal the three core principles that helped us build a sustainable and efficient MLOps workflow in AWS SageMaker:\r\n\r\n* Decoupling pipeline releases from deployments for ultimate flexibility\r\n* Testing pipelines to ensure seamless performance\r\n* Centrally managing infrastructure as code for full control and scalability\r\nIf you\u2019re ready to supercharge your MLOps game, this session will leave you with practical strategies and battle-tested solutions for running ML pipelines like a pro.", "recording_license": "", "do_not_record": false, "persons": [{"code": "HECXGZ", "name": "Bogdan Girman", "avatar": "https://pretalx.com/media/avatars/HECXGZ_ZxTZ22y.png", "biography": "Bogdan Girman is an expert in Machine Learning and DevOps, with extensive experience in implementing scalable, reproducible ML systems. 
He is passionate about bridging the gap between development and operations in AI.", "public_name": "Bogdan Girman", "guid": "0bff1cc0-1eca-5253-95e8-5e7fe9ee82a1", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/HECXGZ/"}], "links": [{"title": "Article on Medium URL", "url": "https://medium.com/p/b00f9451abaf", "type": "related"}], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/DPAPUA/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/DPAPUA/", "attachments": [{"title": "Slides", "url": "/media/pyconde-pydata-2025/submissions/DPAPUA/resources/GitMLO_kGDOqix.pdf", "type": "related"}]}, {"guid": "dd6bb741-b570-5d84-a1ac-6b13ca6b886f", "code": "KTJY9V", "id": 61839, "logo": null, "date": "2025-04-25T13:20:00+02:00", "start": "13:20", "duration": "00:30", "room": "Europium2", "slug": "pyconde-pydata-2025-61839-responsible-ai-with-fmeval-an-open-source-library-to-evaluate-llms", "url": "https://pretalx.com/pyconde-pydata-2025/talk/KTJY9V/", "title": "Responsible AI with fmeval - an open source library to evaluate LLMs", "subtitle": "", "track": "PyData: PyData & Scientific Libraries Stack", "type": "Talk", "language": "en", "abstract": "The term \"Responsible AI\" has seen a threefold increase in search interest compared to 2020 across the globe. As developers, we encounter questions like \"How can we build large language model-enabled applications that are responsible and accountable to their users?\" more often than before. The discussion is further compounded by concerns surrounding uncertainty, bias, explainability, and other ethical considerations.\r\n\r\nIn this session, the speaker will guide you through fmeval, an open-source library designed to evaluate Large Language Models (LLMs) across a range of tasks. 
The library provides notebooks that you can integrate into your daily development process, enabling you to identify, measure, and mitigate potential responsible AI issues throughout your system development lifecycle.", "description": "The term \"Responsible AI\" has seen a threefold increase in search interest compared to 2020 across the globe. As developers, we encounter questions like \"How can we build large language model-enabled applications that are responsible and accountable to their users?\" more often than before. The discussion is further compounded by concerns surrounding uncertainty, bias, explainability, and other ethical considerations.\r\n\r\nIn this session, the speaker will guide you through fmeval, an open-source library designed to evaluate Large Language Models (LLMs) across a range of tasks. The library provides notebooks that you can integrate into your daily development process, enabling you to identify, measure, and mitigate potential responsible AI issues throughout your system development lifecycle.\r\n\r\nTarget Audience: Machine Learning Engineers/Data Scientists, AI/ML Researchers, Software Developers, AI/ML Project Managers, Solutions Architects.", "recording_license": "", "do_not_record": false, "persons": [{"code": "3MKT9M", "name": "Mia Chang", "avatar": "https://pretalx.com/media/avatars/3MKT9M_dhPVHGQ.png", "biography": "Mia Chang is a GenAI/ML Specialist Solutions Architect for Amazon Web Services. She shares best practices for running GenAI/ML workloads through customer engagements, public speaking, blog posts, and authoring books. She works in a multicultural environment with customers in EMEA, which lets her see technology through different cultural lenses. 
In her free time, Mia spends time mentoring aspiring data scientists, and she enjoys traveling, hiking, board games, and meditation.", "public_name": "Mia Chang", "guid": "efe47bf9-3c0a-55ae-bade-c95e89d56caf", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/3MKT9M/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/KTJY9V/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/KTJY9V/", "attachments": []}, {"guid": "c89e9260-0943-5b48-b600-1161838cd87d", "code": "3VYSMS", "id": 61786, "logo": null, "date": "2025-04-25T14:00:00+02:00", "start": "14:00", "duration": "00:30", "room": "Europium2", "slug": "pyconde-pydata-2025-61786-you-don-t-think-about-your-streamlit-app-optimization-until-you-try-to-deploy-it-to-the-cloud", "url": "https://pretalx.com/pyconde-pydata-2025/talk/3VYSMS/", "title": "You don\u2019t think about your Streamlit app optimization until you try to deploy it to the cloud", "subtitle": "", "track": "PyCon: MLOps & DevOps", "type": "Talk", "language": "en", "abstract": "Building Streamlit apps is easy for Data Scientists - but when it\u2019s time to deploy them to the cloud, challenges like slow model loading, scalability, and security can become major hurdles. This talk bridges two perspectives: the Data Scientist who builds the app and the MLOps engineer who deploys it. We'll dive into optimizing model loading from Hugging Face Hub, implementing features like autoscaling and authentication, and securing your app against potential threats. By the end of this talk, you\u2019ll be ready to design Streamlit apps that are functional and deployment-ready for the cloud.", "description": "#### Talk Outline:\r\n\r\n1. Introduction\r\n   - The disconnect: challenges when transitioning a Streamlit app from development to deployment.  \r\n   - Why deployment considerations should influence app design.  \r\n\r\n2. 
Optimizing model loading from HuggingFace hub \r\n   - Challenges:  \r\n     - Large model sizes slowing down app performance.  \r\n     - Inefficient loading processes increasing costs and user wait times.  \r\n   - Solutions:  \r\n     - Using Streamlit caching to reuse loaded models across sessions.  \r\n     - Preloading models during image build.\r\n     - Deploying models and calling them as APIs\r\n   - MLOps Perspective: How optimized model loading reduces deployment complexity and cloud costs.  \r\n\r\n3. AWS deployment considerations: autoscaling, authentication, and security  \r\n   - Autoscaling:  \r\n     - Challenges: Handling variable user traffic without incurring unnecessary costs.  \r\n     - Solutions:  \r\n       - Using Fargate with ECS for containerized apps with auto-scaling policies.  \r\n       - Setting thresholds to scale instances based on traffic and resource utilization.  \r\n       - Optimizing cost-performance balance with reserved vs. spot instances.  \r\n\r\n   - Authentication:  \r\n     - Challenges: Providing a secure and user-friendly authentication mechanism.  \r\n     - Solutions:  \r\n       - Integrating AWS Cognito for user management.  \r\n       - Adding role-based access control to limit app functionality based on user roles.  \r\n\r\n   - Security:  \r\n     - Challenges: Protecting the app from attacks and unauthorized access.  \r\n     - Solutions:  \r\n       - Using AWS Web Application Firewall (WAF) to block malicious traffic.  \r\n       - Configuring CloudFront to protect against DDoS attacks and improve performance.  \r\n       - Setting up HTTPS with Route 53 and TLS certificates for secure connections.  \r\n   - MLOps Perspective: Balancing simplicity and scalability in app deployment.  \r\n\r\n4. Secrets Storage  \r\n   - Challenges: Hardcoding sensitive credentials into the app.  \r\n   - Solutions:  \r\n     - Using AWS Secrets Manager or Parameter Store for secure secrets management.  
\r\n     - Employing environment variables for flexible app configuration.  \r\n   - MLOps Perspective: How to ensure security without complicating deployment workflows.  \r\n\r\n5. Key Takeaways \r\n   - Data Scientist\u2019s Perspective:  \r\n     - Why it\u2019s critical to consider performance, scalability, authentication, and security during app development.  \r\n   - MLOps Perspective:  \r\n     - How to simplify deployment while ensuring performance and security.  \r\n   - Encouraging collaboration between Data Scientists and MLOps engineers for smoother deployment processes.  \r\n\r\n#### What you will learn:  \r\n- How to efficiently load Hugging Face models in Streamlit apps to reduce costs and improve performance.  \r\n- How to design apps with AWS autoscaling to handle variable traffic seamlessly.  \r\n- Best practices for implementing user authentication with AWS Cognito.  \r\n- How to secure your Streamlit app using cloud services.\r\n- Best practices for secure secrets management in Streamlit apps.\r\n- How to approach Streamlit app development with deployment in mind.", "recording_license": "", "do_not_record": false, "persons": [{"code": "37RKBL", "name": "Darya Petrashka", "avatar": "https://pretalx.com/media/avatars/37RKBL_ZRMXTOD.jpg", "biography": "Darya Petrashka is a Data Scientist at SLB with 5 years of experience, focusing on supply chain projects in data analysis, NLP, and generative AI. She is passionate about using data for problem-solving, with a strong interest in classical machine learning, NLP, and AWS services. An AWS Community Builder and Authorized Instructor, Darya actively shares her expertise through public speaking at various industry events, including AWS Community Days, AWS Cloud Day, and PyCon. 
A dedicated learner, Darya continually hones her skills by participating in workshops, courses, and tech schools.", "public_name": "Darya Petrashka", "guid": "0915c6fa-61c8-5ce2-86bc-7fa6780fae0a", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/37RKBL/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/3VYSMS/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/3VYSMS/", "attachments": []}, {"guid": "f80a4eab-456c-5b24-b567-33bca992d972", "code": "MJD7TG", "id": 61098, "logo": null, "date": "2025-04-25T14:40:00+02:00", "start": "14:40", "duration": "00:30", "room": "Europium2", "slug": "pyconde-pydata-2025-61098-what-do-a-tree-and-the-human-brain-have-in-common-a-not-so-serious-introduction-to-digital-pathology", "url": "https://pretalx.com/pyconde-pydata-2025/talk/MJD7TG/", "title": "What do a tree and the human brain have in common-a not so serious introduction to digital pathology", "subtitle": "", "track": "PyData: Computer Vision (incl. Generative AI CV)", "type": "Talk", "language": "en", "abstract": "While trees and human brains don't share that many properties regarding their domain, the analysis of the height of a tree and cancer in human brains does.\r\nThis talk provides a not-so-serious introduction to the domain of computer vision for pathological use cases. \r\nBesides a general introduction to (digital) pathology and the technical similarities between satellite images (GeoTIFs) and pathological images (Whole-Slide Images), we will take a look at computer vision for medical tasks using Python.\r\nWhether you have never done image processing in Python, are an expert (ready to share some tricks with me), or are just curious to see pictures of a human brain, this talk is for you.\r\nWarning: this talk contains quite abstract pink-ish pictures of human tissue (and trees^^). 
If you are unsure this is something you are comfortable with (have a friend), do a quick search for \"HE-stained whole-slide image\".", "description": "Inspired by last year's talk about the height of a tree [\ud83c\udf33 The taller the tree, the harder the fall. Determining tree height from space using Deep Learning and very high resolution satellite imagery \ud83d\udef0\ufe0f] and the strong similarities between optical high resolution satellite images and pathological images, this talk will give a not-so-serious introduction to a quite serious topic: Python for digital pathology.\r\nThe main content is:\r\n- \"Cancer detection\"\r\n- An introduction to (digital) pathology (know your domain)\r\n- The similarities between a tree and your brain (technically speaking, there are a lot)\r\n- A shallow view of ML-based and conventional computer vision in Python with some practical use cases\r\n- Why we can steal (nearly) everything from radiology and get away with it\r\n- What potential pitfalls could be\r\n- How you can start doing medical computer vision on your own\r\n\r\nWarning: this talk contains quite abstract pink-ish pictures of human tissue (and trees^^). If you are unsure this is something you are comfortable with (have a friend), do a quick search for \"HE-stained whole-slide image\".", "recording_license": "", "do_not_record": false, "persons": [{"code": "CZVWXB", "name": "Daniel Hieber", "avatar": "https://pretalx.com/media/avatars/CZVWXB_jqVHirF.png", "biography": "Hi, I'm Daniel, a PhD student in digital neuropathology at Julius-Maximilians-University W\u00fcrzburg and a research associate at the University Hospital Augsburg as well as Neu-Ulm University of Applied Sciences. 
My work focuses on applying computer vision techniques to automate analysis processes in the pathological departments and provide physicians with the tools to conduct machine learning on their own.", "public_name": "Daniel Hieber", "guid": "c55366e6-e133-52a1-bb8c-3f759efc0a86", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/CZVWXB/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/MJD7TG/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/MJD7TG/", "attachments": [{"title": "Handout", "url": "/media/pyconde-pydata-2025/submissions/MJD7TG/resources/BrainT_g7Qig22.pdf", "type": "related"}]}], "Hassium": [{"guid": "171ec60d-52db-5306-ba6b-3970299083a4", "code": "LRUKZQ", "id": 61259, "logo": null, "date": "2025-04-25T10:15:00+02:00", "start": "10:15", "duration": "00:30", "room": "Hassium", "slug": "pyconde-pydata-2025-61259-beyond-dall-e-advanced-image-generation-workflows-with-comfyui", "url": "https://pretalx.com/pyconde-pydata-2025/talk/LRUKZQ/", "title": "Beyond DALL-E: Advanced Image Generation Workflows with ComfyUI", "subtitle": "", "track": "PyData: Computer Vision (incl. Generative AI CV)", "type": "Talk", "language": "en", "abstract": "Image generation using AI has made huge progress over the last years, and many people still think that DALL-E with a text prompt is the best way to generate images. There are well-known models like Stable Diffusion and Flux, which can be used with easy-to-use frontends like A1111 or Invoke AI, but if you want to do more complex or bleeding-edge workflows, you need something else. In this talk, I want to show you ComfyUI, an open-source node-based GUI written in Python where you can build complex pipelines that are otherwise only possible using plain code.", "description": "Image generation using AI has made huge progress over the last years, and many people still think that DALL-E with a text prompt is the best way to generate images. 
But thanks to Stable Diffusion, Flux, and many supplementary models like ControlNet or an Image Prompt Model, we have much more control over the images we want to create. There are frontends for that, like A1111 or Invoke AI, but if you want to try bleeding-edge models or do something more complex, you will have a hard time implementing such a pipeline in code yourself, and it requires a steep learning curve. In this talk, I want to show you ComfyUI, an open-source node-based GUI written in Python where you can build workflows as a DAG. Thanks to many other contributors, there are a lot of plugins available which bring in new functionality. This talk shows the capabilities and power of this tool using practical examples and how you can combine many things together to create a complex workflow much faster than coding it yourself.\r\n\r\nI want to cover the following topics:\r\n - What are the limits of a simple text-to-image workflow?\r\n - What is ComfyUI?\r\n - What are the requirements to use ComfyUI? (Resources, OS, etc.)\r\n - What can you do with ComfyUI that you can't do with a simple text-to-image interface?\r\n   - Pre- and post-processing of images in a single workflow\r\n   - Advanced conditioning using images, bounding boxes, depth maps, etc., all together\r\n - The examples shown as a demonstration:\r\n   - Integrating existing objects from a photo into a generated scenery\r\n   - Creating optical illusions and surreal images", "recording_license": "", "do_not_record": false, "persons": [{"code": "NZNFP8", "name": "Ren\u00e9 Fa", "avatar": "https://pretalx.com/media/avatars/NZNFP8_T1r5soK.JPG", "biography": "Just another Python nerd with a freshly gained enthusiasm for image gen AI. 
I have been working as a Data Engineer for nearly three years, with a focus on computer vision topics.", "public_name": "Ren\u00e9 Fa", "guid": "8b5bc6c7-ec81-57fa-8bd2-51672dd4c455", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/NZNFP8/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/LRUKZQ/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/LRUKZQ/", "attachments": []}, {"guid": "398c4798-53bd-532a-a9ea-a58cbe4c9952", "code": "AEUZGX", "id": 61171, "logo": null, "date": "2025-04-25T10:55:00+02:00", "start": "10:55", "duration": "00:30", "room": "Hassium", "slug": "pyconde-pydata-2025-61171-posepie-replace-your-keyboard-and-mouse-with-ai-driven-gesture-control", "url": "https://pretalx.com/pyconde-pydata-2025/talk/AEUZGX/", "title": "PosePIE: Replace Your Keyboard and Mouse With AI-Driven Gesture Control", "subtitle": "", "track": "PyData: Computer Vision (incl. Generative AI CV)", "type": "Talk", "language": "en", "abstract": "In this talk, we show how to leverage publicly available tools to control any game or program using hand or body movements. To achieve this, we introduce PosePIE, an open-source programmable input emulator that generates input events on virtual gamepads, keyboards and mice based on gestures recognized by using AI-driven pose estimation. PosePIE is fully configurable by the user through Python scripts, making it easily adaptable to new applications.", "description": "Recent advancements in machine learning and AI hardware acceleration have enabled the use of complex models for solving computer vision problems in real-time applications. Pose estimation is one such problem, involving the detection of keypoints of the human body within an image.\r\n\r\nIn this talk, we show how PosePIE uses pose estimation to control any game or program using hand or body movements. 
By using state-of-the-art models, PosePIE does not require expensive specialized sensors but works entirely on the monocular image from an off-the-shelf webcam. By leveraging readily available Graphics Processing Unit (GPU) hardware, it is able to do all processing at a high frame rate to support interactive applications.\r\n\r\nAs PosePIE is fully configurable by the user through Python scripts, it can be easily adapted to new applications. This lowers the barrier to use pose estimation and gesture recognition in creative ways and for novel applications.\r\n\r\nThe source code of PosePIE is available on GitHub under the GNU GPLv3+ license: https://github.com/tegtmeier-inkubator/PosePIE", "recording_license": "", "do_not_record": false, "persons": [{"code": "V3UMNK", "name": "Daniel Stolpmann", "avatar": "https://pretalx.com/media/avatars/V3UMNK_RdbfXrc.jpg", "biography": "Daniel Stolpmann received the B.Sc. and M.Sc. degrees in computer science and engineering from Hamburg University of Technology (TUHH), Germany, in 2017 and 2019. During his master studies, he started working at the Institute of Communication Networks (ComNets) as a student assistant and became a research fellow after his graduation. At ComNets, he conducted research on machine learning for communication networks, network coding and network emulation. 
In 2024, he joined Tegtmeier Inkubator as a senior software developer and started working on AI-enabled smart home systems.", "public_name": "Daniel Stolpmann", "guid": "1662136e-1d46-57a5-8220-c32091968792", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/V3UMNK/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/AEUZGX/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/AEUZGX/", "attachments": []}, {"guid": "47c803af-c4f1-5154-b3cd-6a6738b039d8", "code": "CVMPVG", "id": 59620, "logo": null, "date": "2025-04-25T11:35:00+02:00", "start": "11:35", "duration": "00:30", "room": "Hassium", "slug": "pyconde-pydata-2025-59620-guardians-of-the-code-safeguarding-machine-learning-models-in-a-climate-tech-world", "url": "https://pretalx.com/pyconde-pydata-2025/talk/CVMPVG/", "title": "Guardians of the Code: Safeguarding Machine Learning Models in a Climate Tech World", "subtitle": "", "track": "PyCon: MLOps & DevOps", "type": "Talk", "language": "en", "abstract": "LLMs, Machine learning and AI are everywhere, yet their security is often overlooked, leaving your systems vulnerable to serious attacks. What happens when someone tampers with your model\u2019s input, poisons your training data, or steals your model?\r\n\r\nIn this talk, I\u2019ll explore these risks through the lens of the OWASP Machine Learning Security Top 10 using relatable, real-world examples from the climate tech world. I\u2019ll explain how these attacks happen, their impact, and why they matter to you as a Python developer, data scientist, or data engineer.\r\n\r\nYou\u2019ll learn practical ways to defend your models and pipelines, ensuring they\u2019re robust against adversarial forces. Bridging theory and practice, you'll leave equipped with insights and strategies to secure your machine learning systems, whether you\u2019re training models or deploying them in production. 
By the end, you\u2019ll have a solid understanding of the risks, a toolkit of best practices, and maybe even a new perspective on how important security is everywhere.", "description": "Machine learning is applied to a variety of challenges in climate tech, from optimising renewable energy to forecasting energy demands or predicting solar production. We rely more on these models, but we often forget a critical piece: their security. What happens if someone tampers with your model\u2019s inputs, poisons your training data, or sneaks malicious code into an open-source package you\u2019re using? These attacks can throw off predictions and disrupt energy systems or even the grid itself.\r\n\r\nIn this talk, I\u2019ll walk you through the OWASP Machine Learning Security Top 10, using real-world examples from climate tech to show how these attacks can happen. I'll show you cases like manipulating energy consumption forecasts, poisoning datasets, or sneaking malware into open-source libraries used for climate modelling. It\u2019s not just a hypothetical threat, these risks are real and the consequences can be serious.\r\n\r\nI\u2019ll also share practical solutions you can use as a Python developer, data scientist, or data engineer to protect your models and systems. I\u2019ll talk about securing your ML supply chain, validating data, and monitoring your pipelines for suspicious activity. You'll leave with strategies to defend your work so you can build systems that are not only smart but also safe and reliable.\r\n\r\nWhy does this matter? Because in climate tech, the stakes are incredibly high. 
The predictions we make and the systems we build influence the grid, energy policies, resource allocation, and consumers trust.\r\n\r\nDuring the talk, we'll cover:\r\n\r\n- How attacks on machine learning models can disrupt climate tech applications.\r\n- Examples of adversarial attacks, poisoned datasets, and supply chain vulnerabilities in renewable energy systems.\r\n- Practical steps to protect your machine learning pipelines.\r\n- Why security should be at the core of any ML project, especially in mission-critical fields like climate tech.\r\n\r\nOutline of the Talk:\r\n\r\n1. Why Security in Climate Tech Machine Learning Matters\r\n    - How machine learning is powering renewable energy and climate solutions.\r\n    - What can go wrong when systems are vulnerable.\r\n2. Breaking Down the OWASP ML Security Top 10\r\n    - Input manipulation: How attackers trick models with tampered data.\r\n    - Data poisoning: Real-life example of skewing optimization models with bad data.\r\n    - Supply chain attacks: How a hacked library could disrupt energy demand predictions.\r\n3. Real-World Impact of Attacks\r\n    - Manipulated energy consumption forecasts causing grid instability.\r\n    - Corrupted solar panel efficiency datasets leading to poor resource allocation.\r\n4. How to Protect Your Models\r\n    - How to spot tampered inputs.\r\n    - Data validation, cleaning and checking datasets.\r\n    - Best practices for safe use of open-source libraries.\r\n    - Monitoring and auditing: Setting up checks for unusual activity in your pipelines.\r\n\r\nKey Takeaways\r\n\r\n- Recap of risks and defences.\r\n- Practical steps you can take today to secure your ML systems.\r\n- A call to prioritize security as a core part of building trustworthy ML.\r\n\r\nClimate tech is one of the most exciting and meaningful areas to work in. The systems we\u2019re building have the potential to shape a more sustainable future. 
But if we don\u2019t make security a priority, we risk undermining the customer's trust. This talk will give you the tools and confidence to keep your machine learning models safe and ensure they\u2019re as reliable and impactful as they need to be.\r\n#ThereIsNoPlanetB", "recording_license": "", "do_not_record": false, "persons": [{"code": "HQ9CRY", "name": "Doreen Sacker", "avatar": "https://pretalx.com/media/avatars/HQ9CRY_mO2JeIu.jpg", "biography": "I'm an MLOps Engineer from Berlin working at the start-up 1KOMMA5\u00b0, and I'm part of the women's tech podcast Unmute IT. I aim to empower underrepresented groups to have a say in shaping the algorithms that impact our world today. Also, I\u2019m always on the lookout for the best coffee shop in town \u2615\ufe0f", "public_name": "Doreen Sacker", "guid": "0a51cd69-f3f4-5ebf-8dc9-529b86e4f7e4", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/HQ9CRY/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/CVMPVG/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/CVMPVG/", "attachments": []}, {"guid": "8be899bf-a368-5f4b-9da0-10d98452a6f9", "code": "SFDRTR", "id": 61257, "logo": null, "date": "2025-04-25T13:20:00+02:00", "start": "13:20", "duration": "00:30", "room": "Hassium", "slug": "pyconde-pydata-2025-61257-vector-streaming-the-memory-efficient-indexing-for-vector-databases", "url": "https://pretalx.com/pyconde-pydata-2025/talk/SFDRTR/", "title": "Vector Streaming: The Memory Efficient Indexing for Vector Databases", "subtitle": "", "track": "General: Rust", "type": "Talk", "language": "en", "abstract": "Vector databases are everywhere, powering LLMs. But indexing embeddings, especially multivector embeddings like ColPali and Colbert, at a bulk is memory intensive. Vector streaming solves this problem by parallelizing the tasks of parsing, chunking, and embedding generation and indexing it continuously chunk by chunk instead of bulk. 
This not only increases the speed but also makes the whole task more optimized and memory efficient.\r\n\r\nThe library supports many vector databases, such as Pinecone, Weaviate, and Elastic.", "description": "Embedding creation is mostly done synchronously; a lot of time is wasted while the chunks are being created, as chunking is not a compute-heavy operation. As the chunks are being made, passing them to the embedding model would be efficient. This problem further intensifies with late interaction embeddings like ColBERT or ColPali.\r\n\r\nThe solution is to create an asynchronous chunking and embedding task. We can effectively spawn threads to handle this task using Rust's concurrency patterns and thread safety. This is done using Rust's MPSC (Multi-producer Single Consumer) module, which passes messages between threads. Thus, this creates a stream of chunks passed into the embedding thread with a buffer. Once the buffer is complete, it embeds the chunks and sends the embeddings back to the main thread, where they are sent to the vector database. This ensures that no time is wasted on a single operation and that no bottlenecks arise. Moreover, only the chunks and embeddings in the buffer are stored in the system memory. They are erased from the memory once moved to the vector database.\r\n\r\nAll this is then bound into Python using pyo3 and maturin, so it's easily accessible from Python, but the core is still asynchronous and written in Rust.", "recording_license": "", "do_not_record": false, "persons": [{"code": "FTZPWC", "name": "Sonam Pankaj", "avatar": "https://pretalx.com/media/avatars/FTZPWC_fwShrFm.jpg", "biography": "Sonam is the creator of the open-source library called Embed-Anything, which helps to create local and multimodal embeddings and index them efficiently to vector databases. It\u2019s built in Rust and is thus greener and more efficient. 
She works as the Generative AI Evangelist at Articul8, a spin-off of Intel; Articul8 is the go-to generative AI platform for enterprise.", "public_name": "Sonam Pankaj", "guid": "baefcab1-2ffc-5372-97d8-749afc1fff0b", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/FTZPWC/"}, {"code": "L8YCSG", "name": "Akshay Ballal", "avatar": "https://pretalx.com/media/avatars/L8YCSG_M4zlZk9.jpg", "biography": "I am an AI developer who loves Rust. I want to bring Rust to the AI Community. I am the author of EmbedAnything and Lumo.", "public_name": "Akshay Ballal", "guid": "8aaaf852-62e2-55f3-b2f3-c8fbfe7b9ad8", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/L8YCSG/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/SFDRTR/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/SFDRTR/", "attachments": []}, {"guid": "b775172f-5b72-5417-88ef-4d5d06534b30", "code": "JH97CL", "id": 61137, "logo": null, "date": "2025-04-25T14:00:00+02:00", "start": "14:00", "duration": "00:30", "room": "Hassium", "slug": "pyconde-pydata-2025-61137-pipeline-level-differentiable-programming-for-the-real-world", "url": "https://pretalx.com/pyconde-pydata-2025/talk/JH97CL/", "title": "Pipeline-level differentiable programming for the real world", "subtitle": "", "track": "PyData: Research Software Engineering", "type": "Talk", "language": "en", "abstract": "Automatic Differentiation (AD) is not only the backbone of modern deep learning but also a transformative tool across various domains such as control systems, materials science, weather prediction, 3D rendering, data-driven scientific discovery, and so on. Thanks to a mature ML framework ecosystem, powered by libraries like PyTorch and JAX, AD performs remarkably well at a component level; however, integrating these components into differentiable pipelines still remains a significant challenge. 
In this talk, we will provide an accessible introduction to (pipeline-level) AD, demonstrate some cool applications you can build with it, and see how to build differentiable pipelines that hold up in the real world.", "description": "The tools enabling automatic differentiation (AD), like JAX and PyTorch, are increasingly being adopted beyond machine learning to tackle optimization problems in various scientific and engineering contexts. These tools have catalyzed the development of differentiable simulators, solvers, 3D renderers, and other powerful components, under the umbrella of differentiable programming (DP).\r\n\r\nHowever, building pipelines that propagate gradients effortlessly across components introduces unique challenges. Real-world pipelines often span diverse technologies, frameworks (e.g., JAX, TensorFlow, PyTorch, Julia), computing environments (local vs. distributed clusters; CPU vs. GPU), and teams with varying expertise. Additionally, legacy systems and non-differentiable components often need to coexist with modern AD-enabled frameworks.\r\n\r\nThis talk will provide an overview of differentiable pipelines: why they matter and the types of optimization problems they address. We will revisit foundational concepts of automatic differentiation to set the stage for understanding the intricacies of orchestrating differentiable pipelines in Python.\r\n\r\nThen, using our open-source project, Tesseract, as a case study, we will share lessons learned and best practices for designing AD-friendly APIs with tools like Pydantic and FastAPI, achieving seamless integration with JAX, packaging scientific software, and enabling end-to-end systems-level optimization. 
\r\n\r\nAttendees will leave with practical insights on why they should care about differentiable programming, and how to overcome the challenges of building real-world differentiable pipelines.", "recording_license": "", "do_not_record": false, "persons": [{"code": "YHXMHL", "name": "Alessandro Angioi", "avatar": "https://pretalx.com/media/avatars/YHXMHL_4sWs7z0.jpg", "biography": "I work at the boundary between physical simulations and machine learning. I have 5+ years experience in machine learning and data science, and my background is in theoretical physics. Born in Sardinia, but I've been living in the Rhein-Neckar region for the past 10 years. Cat person.", "public_name": "Alessandro Angioi", "guid": "3adf5390-b120-5162-b324-8e7803d2182d", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/YHXMHL/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/JH97CL/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/JH97CL/", "attachments": []}, {"guid": "f949509f-3ede-5fb1-a0e9-15e04f6efb5c", "code": "ZT3MGL", "id": 60403, "logo": null, "date": "2025-04-25T14:40:00+02:00", "start": "14:40", "duration": "00:30", "room": "Hassium", "slug": "pyconde-pydata-2025-60403-from-rules-to-reality-python-s-role-in-shaping-roundnet", "url": "https://pretalx.com/pyconde-pydata-2025/talk/ZT3MGL/", "title": "From Rules to Reality: Python's Role in Shaping Roundnet", "subtitle": "", "track": "PyData: Data Handling & Engineering", "type": "Talk", "language": "en", "abstract": "Roundnet is a dynamic and fast-growing sport that combines quick reaction, athleticism, and strong community. However, like many emerging sports, it faces challenges in balancing competition, optimizing rules, and increasing accessibility for both players and spectators. 
This is where Python and data analysis come into play.\r\n\r\nIn this talk, I'll share insights from my role as Data Lead on the International Roundnet rule committee, where we use Python-powered data analysis to make informed decisions about the future of the sport. We'll explore how analyzing gameplay patterns and testing rule changes with simulation can lead to fairer, more exciting games and attract a broader audience.", "description": "Roundnet is a dynamic and fast-growing sport that combines quick reaction, athleticism, and strong community.  But what's truly unique about Roundnet is the opportunity it offers: as a new and emerging sport, we have the rare chance to shape its global rule changes entirely through data analysis. This is a groundbreaking approach \u2013 a first for any sport in the modern era.\r\n\r\nIn this talk, I\u2019ll share insights from my role as Data Lead on the International Roundnet Rule Committee and take you through how we are leveraging Python and data analysis to guide these changes. Over the past year, we\u2019ve collected rule proposals and have set up a series of experiments designed to test their effects. Using Python and statistical modeling, we\u2019re planning to select key tournaments worldwide this year to observe how rule adjustments impact gameplay data.\r\n\r\nOur ultimate goal is to discover if specific combinations of rule changes can make Roundnet fairer, more exciting, and accessible for players and spectators alike. This journey is an exploration of how data-driven decision-making can transform a sport from the ground up, using real-world insights and experimentation.\r\n\r\nThis talk will take you on the journey of how we set up our testing framework, what tools we\u2019re using, and how we\u2019ve employed Python-powered analysis to bring empirical evidence into the decision-making process. 
It will equip you with a new perspective on data's role in shaping real-world change \u2013 especially in grassroots movements, community building, and sports.", "recording_license": "", "do_not_record": false, "persons": [{"code": "HVSNHD", "name": "Larissa Haas", "avatar": "https://pretalx.com/media/avatars/HVSNHD_qOTHnaa.jpg", "biography": "I'm Squad Lead for Automation & Analytics, coordinating Process Automation projects and drafting solutions for intelligent enterprises. Within projects, I work as a Senior Data Scientist and Cloud Solution Architect, combining various SAP BTP services with Artificial Intelligence. \r\n\r\nIn our sovanta Innovation Factory I drive innovation and automation with traditional Artificial Intelligence and fancy Generative AI services, to facilitate business processes on SAP BTP and make SAP as easy as it never was before.\r\n\r\nSo if you like to chat about Artificial Intelligence, Science Fiction, bots gone rogue and seeking for world domination, or the Art of Python you're more than welcome to contact me!", "public_name": "Larissa Haas", "guid": "8076675d-8b92-5d5f-a648-f14bdf8a61d8", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/HVSNHD/"}], "links": [{"title": "Slides", "url": "https://github.com/LarissaHa/talks/blob/master/pyconde-2025/2025-04-From-Rules-to-Reality.pdf", "type": "related"}], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/ZT3MGL/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/ZT3MGL/", "attachments": []}], "Palladium": [{"guid": "5a08d292-898b-50c0-b596-1b45d96efd4b", "code": "89BX8V", "id": 61203, "logo": null, "date": "2025-04-25T10:15:00+02:00", "start": "10:15", "duration": "00:30", "room": "Palladium", "slug": "pyconde-pydata-2025-61203-towards-intelligent-monitoring-detecting-degraded-flame-torch-nozzles", "url": "https://pretalx.com/pyconde-pydata-2025/talk/89BX8V/", "title": "Towards Intelligent Monitoring: Detecting Degraded Flame Torch Nozzles", 
"subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Statistics", "type": "Talk", "language": "en", "abstract": "Flame cutting is a method where metals are efficiently cut using precise control of the oxygen jet and consistent mixing of fuel gas. The condition of the nozzle is changing over time: deposits formed during the cutting process can degrade the flame quality, reducing the precision of the cut. Traditionally, nozzles suspected of wear are sent back for manual inspection, where experts evaluated the flame visually and audibly to determine whether repair or replacement is needed. This project leverages machine learning to optimize this process by analyzing acoustic emission data.", "description": "Flame cutting is a technique that enables efficient metal cutting by precisely controlling the oxygen jet and maintaining a consistent mix of fuel gas. Over time, the nozzle\u2019s condition deteriorates as deposits accumulate during the cutting process, leading to a decline in flame quality and cutting precision. Currently, nozzle testing is performed manually, with experts assessing the flame based on its appearance and sound. This approach is risky because worn nozzles can remain in use, increasing the danger of high-temperature material being ejected. Moreover, it is a costly process, particularly when damage to industrial equipment occurs.\r\n\r\nLaboratory Evaluation: This section outlines the preliminary experiments aimed at assessing whether this sensor is suitable for distinguishing different machine states. The experiments focus on identifying the optimal sensor placement and analyzing how various machine states impact sensor readings. The design process for the laboratory experiments and the subsequent systematic data collection is shown. 
The results suggest that while detecting every machine state may not be feasible, the sensor shows promise in identifying degraded nozzles.\r\n\r\nData Preprocessing & Annotation: For a proof of concept, the raw acoustic emission data required manual labeling, as prior assessments depended on expert evaluations. Here, we utilized Label Studio, an annotation tool that streamlines the labeling process. \r\nModelling  & Feature Engineering: We extract features using statistical methods and transform the acoustic emission signals into the frequency domain through scipy, focusing on features in  the frequency domain.\r\n\r\nEvaluation: We discuss the approach for splitting the data, considering that multiple observations from the same nozzle are present. In a computational study, we evaluate the feature sets developed in the previous step using two different classification models: Support Vector Classifier and Multilayer Perceptron. This section explains how the experiments are computed and parallelized including the time required for execution.\r\n\r\nLastly, we discuss the dataset's limitations and the challenges faced during development. We also highlight steps taken to improve generalization and provide an outlook on future objectives, mostly aimed at a broader applicability of the models.", "recording_license": "", "do_not_record": false, "persons": [{"code": "LJLEJV", "name": "Dominik Falkner", "avatar": "https://pretalx.com/media/avatars/LJLEJV_FI3vXdO.png", "biography": "Dominik Falkner completed his bachelor's degree in Software Engineering in 2018 and his master's degree in Data Science and Engineering with a specialization in Data Analysis in Production and Marketing at Hagenberg University of Applied Sciences in 2020. \r\n\r\nDuring his studies, he already worked on various software systems, including some for collecting and storing data. 
Since 2019, he has been employed by the RISC Software GmbH as a Data Scientist, working in customer and research projects. His interests and focus lie in the following disciplines:\r\n\r\n    Employing machine learning techniques.\r\n    The fusion of expert knowledge and machine learning methods\r\n    Predictive and prescriptive analytics\r\n    Design and architecture of software systems\r\n\r\nIn the course of his work as a Data Scientist, he mainly deals with time series analyses and classification settings from various industries. In 2022 he will start his PhD studies at the Institute for Formal Models and Verification at Johannes Kepler University in Linz.", "public_name": "Dominik Falkner", "guid": "6ae11229-31b9-5ef6-8444-3158045a56bb", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/LJLEJV/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/89BX8V/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/89BX8V/", "attachments": [{"title": "Slides", "url": "/media/pyconde-pydata-2025/submissions/89BX8V/resources/degra_hu9cLTE.pptx", "type": "related"}]}, {"guid": "55ac1e00-f462-57f4-8013-221615e104b7", "code": "CZXBEP", "id": 61281, "logo": null, "date": "2025-04-25T10:55:00+02:00", "start": "10:55", "duration": "00:30", "room": "Palladium", "slug": "pyconde-pydata-2025-61281-filling-in-the-gaps-when-terraform-falls-short-python-and-typer-step-in", "url": "https://pretalx.com/pyconde-pydata-2025/talk/CZXBEP/", "title": "Filling in the Gaps: When Terraform Falls Short, Python and Typer Step In", "subtitle": "", "track": "General: Infrastructure - Hardware & Cloud", "type": "Talk", "language": "en", "abstract": "Not all resources in today\u2019s cloud environments have native Terraform providers. That\u2019s where Python\u2019s Typer library can step in, offering a flexible, production-ready command-line interface (CLI) framework to help fill in the gaps. 
In this session, we\u2019ll explore how to integrate Typer with Terraform to manage resources that fall outside Terraform\u2019s direct purview. We\u2019ll share a real-life example of how Typer was used alongside Terraform to automate and streamline the management of an otherwise unsupported API. You\u2019ll learn how Terraform can invoke Python scripts\u2014passing arguments and parameters to control complex operations\u2014while still benefiting from Terraform\u2019s declarative model and lifecycle management. We\u2019ll also discuss best practices for defining resource lifecycles to ensure easy maintainability and consistency across deployments. By the end, participants will see how combining Terraform\u2019s robust infrastructure-as-code approach with Python\u2019s versatility and Typer\u2019s user-friendly CLI can create a powerful, cohesive strategy for managing even the trickiest resources in production environments.", "description": "In this session, we\u2019ll address a common challenge in managing resources and APIs that lack native Terraform providers but still need to integrate seamlessly into your CI/CD pipeline. I\u2019ll demonstrate how Python\u2019s Typer library can help bridge this gap by offering a straightforward yet powerful command-line interface (CLI). I\u2019ll explain how to create and configure Typer applications, pass parameters, and integrate these scripts with Terraform. \r\n\r\n1. Problem Statement (Managing APIs or resources with incomplete Terraform provider support) - 5 mins\r\n2. Typer (Key components, advantages, and how to use it in a production environment) - 10 mins\r\n3. Terraform resources that can execute CLI and how to work with them - 10 mins\r\n4. 
Conclusion - 2 mins", "recording_license": "", "do_not_record": false, "persons": [{"code": "Z8HXML", "name": "Yuliia Barabash", "avatar": "https://pretalx.com/media/avatars/Z8HXML_xn1gnuK.png", "biography": "Over the last five years living in Germany, I have gained a diverse range of experience in the tech industry. My expertise ranges from developing web applications in Python to constructing AWS cloud solutions. I have a good understanding of design patterns, Object-Oriented Programming (OOP), event-driven and microservices architectures, REST API design, and database technologies. I have hands-on experience creating a web application as part of a Cloud Foundation framework to manage and secure AWS accounts, and creating a lightweight web application to quickly generate and provide results to users.", "public_name": "Yuliia Barabash", "guid": "8aca423b-d47d-545b-8d66-ce05184220d1", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/Z8HXML/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/CZXBEP/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/CZXBEP/", "attachments": []}, {"guid": "e4a0862b-b86b-57e4-a836-26c7682a4f06", "code": "PLMJZ8", "id": 60381, "logo": null, "date": "2025-04-25T11:35:00+02:00", "start": "11:35", "duration": "00:30", "room": "Palladium", "slug": "pyconde-pydata-2025-60381-code-community-the-synergy-of-community-building-and-task-automation", "url": "https://pretalx.com/pyconde-pydata-2025/talk/PLMJZ8/", "title": "Code & Community: The Synergy of Community Building and Task Automation", "subtitle": "", "track": "PyData: Natural Language Processing & Audio (incl. Generative AI NLP)", "type": "Talk", "language": "en", "abstract": "The Python community is built on a culture of support, inclusion, and collaboration. Sustaining this welcoming environment requires intentional community-building efforts, which often involve repetitive or time-consuming tasks. 
These tasks, however, can be automated without compromising their value\u2014freeing up time for meaningful human engagement.\r\n\r\nThis talk showcases my project aimed at supporting underrepresented groups in tech, specifically through building Python communities on Mastodon and Bluesky. A key part of this initiative is the \"Awesome PyLadies\" repository, a curated collection of PyLadies blogs and YouTube channels that celebrates their work. To enhance visibility, I created a PyLadies bot for social media. This bot automates regular posts and reposts tagged content, significantly extending their reach and fostering an engaged community.\r\n\r\nIn this session, I\u2019ll cover:\r\n- The role of automation in community building\r\n- The technical architecture behind the bot\r\n- A hands-on demo on integrating Google\u2019s Gemini into community tools\r\n- Upcoming features and opportunities for collaboration\r\n\r\nBy combining Python, automation, and modern AI capabilities, we can create thriving, inclusive communities that scale impact while staying true to the human-centered ethos of open source.", "description": "My planned outline for the talk is as follows:\r\n\r\n- **Introduction**: A brief overview of the project and its goals, focusing on community building and inclusivity within the Python ecosystem (3 minutes)\r\n- **The Importance of Visibility**: Explain the background of the project and why visibility is important (3 minutes)\r\n- **Bot Architecture and Setup**: A technical walkthrough of the bot, its architecture, and how it operates to extend the reach of community content on platforms like Mastodon or Bluesky (5 minutes)\r\n- **Hands-On Demo: Task Automation with Google\u2019s Gemini and GitHub Actions**: A step-by-step guide to integrating Google\u2019s Gemini and GitHub Actions for creating low-barrier, automated workflows tailored for community-building tasks (12 minutes)\r\n- **Looking Ahead**: Provide a forward-looking perspective (upcoming 
features of the project and future developments) (2 minutes)\r\n- **Q&A and Buffer** (5 minutes)\r\n\r\nI hope that the talk will inspire more Pythonistas to automate their tasks, and also more PyLadies to share material publicly and make the public perception of experts in the field more diverse.", "recording_license": "", "do_not_record": false, "persons": [{"code": "7GY7XH", "name": "Cosima Meyer", "avatar": "https://pretalx.com/media/avatars/7GY7XH_Js5SOKe.jpg", "biography": "Cosima Meyer is a data scientist with a strong focus on making machine learning models explainable and accessible. Passionate about trustworthy AI, she is committed to building systems that are not only technically robust but also transparent and ethical. As a Google's Women Techmakers Ambassador and an active member of PyLadies, Cosima is dedicated to fostering inclusive and collaborative communities, working to bridge the two groups and create spaces for knowledge-sharing and growth.\r\n\r\nDuring her PhD studies at the University of Mannheim, Cosima discovered her enthusiasm for sharing knowledge through technical blog posts and developing open-source software. 
Her work reflects a blend of technical expertise and a passion for community building, inspiring others to explore, learn, and contribute to the fields of AI and data science.", "public_name": "Cosima Meyer", "guid": "5b698689-8687-50a1-9c8a-9ec5ba8810b7", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/7GY7XH/"}], "links": [{"title": "Slides - Code & Community", "url": "https://drive.google.com/file/d/1vMlaJ3vbV7ONJg24dqBFIUp6sJUjYL_u/view?usp=share_link", "type": "related"}], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/PLMJZ8/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/PLMJZ8/", "attachments": []}, {"guid": "2da5b38c-8f2f-50ae-89cd-ae180c8934a5", "code": "98FQDY", "id": 61889, "logo": null, "date": "2025-04-25T13:20:00+02:00", "start": "13:20", "duration": "00:30", "room": "Palladium", "slug": "pyconde-pydata-2025-61889-what-we-talk-about-when-we-talk-about-ai-skills", "url": "https://pretalx.com/pyconde-pydata-2025/talk/98FQDY/", "title": "What we talk about when we talk about AI skills.", "subtitle": "", "track": "General: Education, Career & Life", "type": "Talk", "language": "en", "abstract": "Defining what constitutes AI skills has always been ambiguous. As AI adoption accelerates across industries and the European AI Act mandates companies to ensure AI literacy among their staff, organizations face even greater challenges in defining and developing AI competencies. In this talk, we'll present a comprehensive framework developed by the appliedAI Institute's experts that categorizes AI skills across technical, regulatory, strategic, and innovation domains. 
We'll also share initial data on current AI skills levels and upskilling needs and provide practical strategies for organizations to assess, develop, and acquire the AI capabilities required for their specific needs.", "description": "What it means to \"work in/with AI\" and the corresponding roles, tasks, and required skills have been ambiguous since the emergence of AI professionals in industry. And while the demand for AI-skilled professionals continues to grow, both organizations and individuals seeking to work in the AI field often struggle with two challenges: 1) clearly defining the competencies and responsibilities that positions and projects require, and 2) identifying appropriate upskilling opportunities to match these needs.\r\nThe urgency to upskill professionals in AI topics has not only become more nuanced since the emergence of generative AI but is also growing rapidly. This trend is further amplified by the upcoming European AI Act, which will soon require companies to \"ensure, to their best extent, a sufficient level of AI literacy among their staff.\" This regulation has created an urgent need to define and understand what constitutes AI literacy and AI skills in practical terms.\r\n\r\nTo help organizations and professionals navigate this landscape, we have developed a comprehensive framework categorizing AI skills into distinct domains spanning technical competencies, regulatory knowledge, AI strategy, and ecosystem understanding. Our framework, developed by the multidisciplinary team of AI experts in the appliedAI Institute for Europe, provides a structured approach to defining skill requirements, guiding career development, identifying training gaps, and helping educational providers align their offerings with market demands.\r\n\r\nIn this presentation, we will introduce our framework and share initial data reflecting the current state of AI skills levels and upskilling needs across a sample of companies. 
We will also discuss practical strategies for implementing this AI Skills framework within organizations, enabling them to better assess, develop, and acquire the AI capabilities they need to fulfill their specific needs.", "recording_license": "", "do_not_record": false, "persons": [{"code": "WVNMPG", "name": "Paula Gonzalez Avalos", "avatar": "https://pretalx.com/media/avatars/WVNMPG_opng10Z.png", "biography": "Data Nerd & Python Pydata community lover. AI education specialist with five years of experience shaping data science and AI educational offers. Currently leading the AI Academy at the appliedAI Institute for Europe.", "public_name": "Paula Gonzalez Avalos", "guid": "50a454c2-0f8e-5d15-9a7a-b91319a30558", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/WVNMPG/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/98FQDY/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/98FQDY/", "attachments": []}, {"guid": "06ce3f5a-aa19-5757-a0fd-45bbfe73f851", "code": "B8TUR9", "id": 60771, "logo": null, "date": "2025-04-25T14:00:00+02:00", "start": "14:00", "duration": "00:30", "room": "Palladium", "slug": "pyconde-pydata-2025-60771-optimizing-energy-tariffing-system-with-formal-concept-analysis-and-dash", "url": "https://pretalx.com/pyconde-pydata-2025/talk/B8TUR9/", "title": "Optimizing Energy Tariffing System with Formal Concept Analysis and Dash", "subtitle": "", "track": "PyData: Visualisation & Jupyter", "type": "Talk", "language": "en", "abstract": "As a data scientist, I value the power of insightful visualizations to unlock unique interpretations of complex data. In my talk, I will introduce an elegant mathematical framework called Formal Concept Analysis (FCA), developed in the 1980s in Darmstadt.\r\n\r\nFCA transforms binary data into concepts that can be visualized as a hierarchical graph, offering a fresh perspective on multidimensional data analysis. 
Leveraging this theory and its open-source Python libraries, I am developing an interactive Dash-based tool featuring interactive tables and graphs to explore data insights.\r\n\r\nTo illustrate its potential, I will showcase an optimization of the entire tariffing system of an energy provider company, highlighting how FCA can bring structure and clarity to even such tangled datasets.", "description": "My goal is to introduce Formal Concept Analysis (FCA) as a fascinating mathematical framework. I aim to inspire Python enthusiasts to explore its potential and uncover insights in their data analysis tasks. The talk is divided into three sections:\r\n\r\n1 FCA Basics\r\n\r\n- What is a \"concept\"? *First, I am going to introduce the main terms used in FCA and define the central object of the theory - the formal concept.*\r\n\r\n- Illustrative example. *To show the power of FCA in action, I will provide a relatable example to explain the hierarchical structure of the graph visualization.*\r\n\r\n2 Python Implementation\r\n\r\n- ``fcapy`` Python library. *Core functionality overview of the library and the data formats it can use.*\r\n\r\n- Introducing interactivity with Python Dash: *Enhancing exploration and user experience with interactive tables (AG Grid) and dynamic graph visualizations (Cytoscape).*\r\n\r\n3 Applications and Practical Relevance \r\n\r\n- Use Case: Energy Tariffing System Optimization. *In this section, I am going to showcase the real data in its original complexity and the optimization process of identifying redundancies, overlaps, or inefficiencies.*\r\n- Examples of other  applications and key takeaways", "recording_license": "", "do_not_record": false, "persons": [{"code": "RTUMVM", "name": "Dr. Irina Smirnova-Pinchukova", "avatar": "https://pretalx.com/media/avatars/RTUMVM_znDxfQb.png", "biography": "After my PhD in Astronomy @ Max Planck Institute for Astronomy in Heidelberg, I have switched from academia to industry. 
Working as a Data Scientist @ DSC GmbH I am developing in python for various projects including those involving language models. I am attending PyData meetings in Heidelberg and even presented a lightning talk on my \"Croshapes\" hobby project.", "public_name": "Dr. Irina Smirnova-Pinchukova", "guid": "801466a3-013c-5887-99b1-9a09bc676817", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/RTUMVM/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/B8TUR9/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/B8TUR9/", "attachments": []}, {"guid": "c6ef2c4c-0cbc-5a28-9fc8-3aed960a4470", "code": "HKYQDB", "id": 61375, "logo": null, "date": "2025-04-25T14:40:00+02:00", "start": "14:40", "duration": "00:30", "room": "Palladium", "slug": "pyconde-pydata-2025-61375-langfuse-openlit-and-phoenix-observability-for-the-genai-era", "url": "https://pretalx.com/pyconde-pydata-2025/talk/HKYQDB/", "title": "Langfuse, OpenLIT, and Phoenix: Observability for the GenAI Era", "subtitle": "", "track": "PyCon: Python Language & Ecosystem", "type": "Talk", "language": "en", "abstract": "Large Language Models (LLMs) are transforming digital products, but their non-deterministic behaviour challenges predictability and testing, making observability essential for quality and scalability.\r\n\r\nThis talk presents **observability for LLM-based applications**, spotlighting three tools: Langfuse, OpenLIT, and Phoenix. We'll share best practices about what and how to monitor LLM features and explore each tool's strengths and limitations. \r\n\r\nLangfuse excels in tracing and quality monitoring but lacks OpenTelemetry support and customization. OpenLIT, while less mature, integrates well with existing observability stacks using **OpenTelemetry**. 
Phoenix stands out in debugging and experimentation but struggles with real-time tracing.\r\n\r\nThe comparison will be enhanced by **live coding examples**.\r\n\r\nAttendees will walk away with an improved understanding of observability for **GenAI applications** and will understand which tool to use for their use case.", "description": "Large Language Models (LLMs) are becoming core components of modern digital products. However, their **non-deterministic nature** means that their behaviour cannot be fully predicted or tested before deployment. This makes **observability** an essential practice for building and maintaining applications with generative AI features.\r\n\r\nThis session focuses on observability in LLM-based systems.\r\n\r\nWe start by motivating why monitoring and understanding your application is key to ensuring quality, reliability, and scalability. We\u2019ll analyze three leading tools for observability in this domain: **Langfuse**, **OpenLIT**, and **Phoenix**. Each has unique strengths and challenges that make them suitable for different use cases.\r\n\r\nThrough examples and real-world scenarios, we\u2019ll explore:\r\n\r\n- How **Langfuse** provides detailed tracing and quality monitoring through developer-friendly APIs. While it supports multi-step workflows effectively, it lacks support for the OpenTelemetry protocol and can be difficult to customize for non-standard use cases.\r\n- Why **OpenLIT**, built on OpenTelemetry, offers strong observability for distributed systems. Although it is the least mature of the three tools, it integrates well with established observability stacks and has promising potential for future growth.\r\n- Where **Phoenix** fits into the process by combining experimentation and debugging capabilities with evaluation pipelines. 
Its strength lies in development-focused observability, but it has limitations in handling real-time tracing once systems are in production.\r\n\r\nThis talk will provide a clear, straightforward comparison of these tools, helping you understand which option best fits your LLM applications.\r\n\r\nYou\u2019ll leave with practical insights into how observability can enhance the reliability and performance of your generative AI systems.", "recording_license": "", "do_not_record": false, "persons": [{"code": "9CGGBC", "name": "Emanuele Fabbiani", "avatar": "https://pretalx.com/media/avatars/9CGGBC_JGHNmeM.png", "biography": "Emanuele is an engineer, researcher, and entrepreneur with a passion for artificial intelligence.\r\n\r\nHe earned his PhD by exploring time series forecasting in the energy sector and spent time as a guest researcher at EPFL in Lausanne. Today, he is co-founder and Head of AI at xtream, a boutique company that applies cutting-edge technology to solve complex business challenges.\r\n\r\nEmanuele is also a contract professor in AI at the Catholic University of Milan. He has published eight papers in international journals and contributed to over 30 international conferences worldwide. 
His engagements include AMLD Lausanne, ODSC London, WeAreDevelopers Berlin, PyData Berlin, PyData Paris, PyCon Florence, the Swiss Python Summit in Zurich, and Codemotion Milan.\r\n\r\nEmanuele has been a guest lecturer at Italian, Swiss, and Polish universities.", "public_name": "Emanuele Fabbiani", "guid": "c296f2b6-fd5d-5c56-8de9-22854fda30ff", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/9CGGBC/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/HKYQDB/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/HKYQDB/", "attachments": []}], "Ferrum": [{"guid": "22fec43b-ed47-58f2-a21e-cdb4cdeb45d1", "code": "SVLRGG", "id": 61420, "logo": null, "date": "2025-04-25T10:15:00+02:00", "start": "10:15", "duration": "01:30", "room": "Ferrum", "slug": "pyconde-pydata-2025-61420-agentic-ai-build-a-multi-agent-application-with-crewai", "url": "https://pretalx.com/pyconde-pydata-2025/talk/SVLRGG/", "title": "Agentic AI: Build a Multi-Agent Application with CrewAI", "subtitle": "", "track": "PyData: Generative AI", "type": "Tutorial", "language": "en", "abstract": "This hands-on tutorial will dive into the fundamentals of building multi-agent systems using the CrewAI Python library. Starting from the basics, we\u2019ll cover key concepts, explore advanced features, and guide you step-by-step through building a complete application from scratch. We\u2019ll discuss implementing guardrails, securing interactions, and preventing query injection vulnerabilities along the way.", "description": "### **Short Abstract**  \r\n**Agentic AI: Build a Multi-Agent Application with CrewAI**  \r\nIn this hands-on tutorial, we\u2019ll dive into the fundamentals of building multi-agent systems using the CrewAI Python library. Starting from the basics, we\u2019ll cover key concepts, explore advanced features, and guide you step-by-step through building a complete application from scratch. 
Along the way, we\u2019ll discuss implementing guardrails, securing interactions, and preventing query injection vulnerabilities.\r\n\r\n---\r\n\r\n### **Detailed Description**  \r\nThis tutorial introduces **Agentic AI**\u2014a design approach where multiple agents collaborate to solve complex tasks efficiently. Using the **CrewAI Python library**, we\u2019ll start with the fundamentals and progressively move towards advanced concepts, focusing on practical implementation.\r\n\r\n#### **What We\u2019ll Cover:**  \r\n1. **Understanding Agentic AI:** Core principles and why multi-agent systems are valuable.  \r\n2. **Getting Started with CrewAI:** Setting up the library and creating simple agents.  \r\n3. **Advanced Agent Interactions:** Defining workflows, collaboration patterns, and communication protocols.  \r\n4. **Building from Scratch:** Step-by-step guide to developing a complete multi-agent application.  \r\n5. **Implementing Guardrails:** Techniques to ensure agents operate within defined constraints.  \r\n6. **Preventing Query Injection:** Strategies for securing agent queries against malicious inputs.  \r\n\r\n#### **Why Attend?**  \r\nBy the end of this session, you\u2019ll have hands-on experience building an agent-based application, understand how to implement security measures, and be equipped with best practices for maintaining control over agent behavior. Whether you're new to agentic systems or looking to refine your skills, this tutorial will provide both the theory and the practical insights needed to start building with CrewAI.  \r\n\r\n**Prerequisites:**   \r\n- **OpenAI Key** or another LLM or Cloud provider. This is needed to implement the solutions.\r\n- **SerperDev Tool Key** from https://serper.dev/. 
(Free Trial is more than enough)\r\n- Familiarity with Python and basic AI concepts will help you get the most out of this session.\r\n\r\nLINK TO THE WORKSHOP WEBSITE: https://pigna90.github.io/crewai-workshop-pyconde-2025", "recording_license": "", "do_not_record": false, "persons": [{"code": "9NYEDM", "name": "Alessandro Romano", "avatar": "https://pretalx.com/media/avatars/9NYEDM_hXkevkp.jpg", "biography": "Alessandro is a highly experienced data scientist with a Bachelor\u2019s degree in computer science and a Master\u2019s in data science. He has collaborated with various companies and organizations and currently holds the role of senior data scientist at logistics giant Kuehne+Nagel. Alessandro is particularly passionate about statistics and digital experimentation and has a strong track record of applying these skills to solve complex problems. He shares his knowledge regularly, speaking at events like the Data Innovation Summit and ODSC.", "public_name": "Alessandro Romano", "guid": "366abd23-847c-5dc0-b869-6756cbf9802a", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/9NYEDM/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/SVLRGG/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/SVLRGG/", "attachments": []}, {"guid": "e377d53c-4d47-5331-9ac7-d2b927d80dad", "code": "VBW3EK", "id": 61808, "logo": null, "date": "2025-04-25T13:05:00+02:00", "start": "13:05", "duration": "01:30", "room": "Ferrum", "slug": "pyconde-pydata-2025-61808-reinforcement-learning-for-finance", "url": "https://pretalx.com/pyconde-pydata-2025/talk/VBW3EK/", "title": "Reinforcement Learning for Finance", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Statistics", "type": "Tutorial", "language": "en", "abstract": "Reinforcement Learning and related algorithms, such as Deep Q-Learning (DQL), have led to major breakthroughs in different fields. 
DQL, for example, is at the core of the AIs developed by DeepMind that achieved superhuman levels in such complex games as Chess, Shogi, and Go (\"AlphaGo\", \"AlphaZero\"). Reinforcement Learning can also be beneficially applied to typical problems in finance, such as algorithmic trading, dynamic hedging of options, or dynamic asset allocation. The workshop addresses the problem of limited data availability in finance and solutions to it, such as synthetic data generation through GANs. It also shows how to apply the DQL algorithm to typical financial problems. The workshop is based on my new O'Reilly book \"Reinforcement Learning for Finance -- A Python-based Introduction\".", "description": "Reinforcement Learning and related algorithms, such as Deep Q-Learning (DQL), have led to major breakthroughs in different fields. DQL, for example, is at the core of the AIs developed by DeepMind that achieved superhuman levels in such complex games as Chess, Shogi, and Go (\"AlphaGo\", \"AlphaZero\"). Reinforcement Learning can also be beneficially applied to typical problems in finance, such as algorithmic trading, dynamic hedging of options, or dynamic asset allocation. The workshop addresses the problem of limited data availability in finance and solutions to it, such as synthetic data generation through GANs. It also shows how to apply the DQL algorithm to typical financial problems.\r\n\r\nThe workshop covers the following topics:\r\n\r\n* Learning through interaction\r\n* Deep Q-Learning applied to Finance\r\n* Synthetic Data Generation\r\n* Dynamic Asset Allocation with DQL\r\n\r\nThe workshop is based on my new O'Reilly book \"Reinforcement Learning for Finance -- A Python-based Introduction\".", "recording_license": "", "do_not_record": false, "persons": [{"code": "YL8CRG", "name": "Dr. Yves J. Hilpisch", "avatar": "https://pretalx.com/media/avatars/YL8CRG_u0EcvAG.jpg", "biography": "Dr. Yves J. 
Hilpisch is the founder and CEO of The Python Quants (https://tpq.io), a group focusing on the use of open source technologies for financial data science, artificial intelligence, algorithmic trading, computational finance, and asset management.\r\n\r\nYves has a Diploma in Business Administration, a Ph.D. in Mathematical Finance, and is Adjunct Professor for Computational Finance.\r\n\r\nYves is the author of seven books (https://home.tpq.io/books):\r\n\r\n* Reinforcement Learning for Finance (2024, O\u2019Reilly)\r\n* Financial Theory with Python (2021, O\u2019Reilly)\r\n* Artificial Intelligence in Finance (2020, O\u2019Reilly)\r\n* Python for Algorithmic Trading (2020, O\u2019Reilly)\r\n* Python for Finance (2018, 2nd ed., O\u2019Reilly)\r\n* Listed Volatility and Variance Derivatives (2017, Wiley Finance)\r\n* Derivatives Analytics with Python (2015, Wiley Finance)\r\n\r\nYves is the director of Certificate in Python for Finance (CPF) Program, a comprehensive, systematic online training program preparing students, academics, and professionals alike for the challenges faced by financial institutions in data science, computation, trading, and artificial intelligence. He also lectures on computational finance, machine learning, and algorithmic trading at the CQF Program (http://cqf.com).\r\n\r\nYves is the originator of the financial analytics library DX Analytics (http://dx-analytics.com) and organizes Meetup group events, conferences, and Bootcamps about Python, artificial intelligence, and algorithmic trading in London (http://pqf.tpq.io) and New York (http://aifat.tpq.io). He has given keynote speeches at technology conferences in the United States, Europe, and Asia.", "public_name": "Dr. Yves J. 
Hilpisch", "guid": "f761da25-b65b-591f-895b-cfff4cb7452f", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/YL8CRG/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/VBW3EK/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/VBW3EK/", "attachments": []}, {"guid": "a7a5e1ed-5f7f-5b9a-b914-928f16191065", "code": "RQ8JBM", "id": 61237, "logo": null, "date": "2025-04-25T14:40:00+02:00", "start": "14:40", "duration": "00:30", "room": "Ferrum", "slug": "pyconde-pydata-2025-61237-intuitive-a-b-test-evaluations-for-coders", "url": "https://pretalx.com/pyconde-pydata-2025/talk/RQ8JBM/", "title": "Intuitive A/B Test Evaluations for Coders", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Statistics", "type": "Talk", "language": "en", "abstract": "A/B testing is a critical tool for making data-driven decisions, yet its statistical underpinnings\u2014p-values, confidence intervals, and hypothesis testing\u2014are often challenging for those without a background in statistics. Coders frequently encounter these concepts but lack a straightforward way to compute and interpret them using their existing skill set.\r\nThis talk presents a practical approach to A/B test evaluations tailored for coders. By utilizing Python\u2019s random number generator and basic loops, it introduces bootstrapping as an accessible method for calculating p-values and confidence intervals directly from data. The goal is to simplify statistical concepts and provide coders with an intuitive understanding of how to evaluate test results without relying on complex formulas or statistical jargon.", "description": "Making A/B Test Evaluations Intuitive for Coders: A Python-Based Approach\r\n\r\nA/B testing is an essential method for data-driven decision-making, but interpreting the results can be daunting. Complex jargon around p-values and confidence intervals often creates barriers to understanding. 
This talk simplifies A/B testing by introducing a practical, Python-powered approach using bootstrapping\u2014a flexible and accessible method that aligns with how software engineers think and works without requiring statistical knowledge.\r\n\r\nSession Highlights:\r\n\r\n1. Statistical Significance and Hypothesis Testing:\r\n    * Why is statistical testing crucial for A/B tests? Simple comparisons overlook randomness.\r\n    * Using Python, we\u2019ll demonstrate how to simulate \"what-if\" scenarios by shuffling and resampling data, allowing participants to compute p-values and understand the likelihood of observed differences occurring by chance.\r\n2. Confidence Intervals with Bootstrapping:\r\n    * Confidence intervals clarify the range of plausible outcomes.\r\n    * We\u2019ll explore how to resample experiment data repeatedly to estimate variability and construct intuitive confidence intervals\u2014all using basic tools like random number generators and loops, without requiring advanced math.\r\n\r\nKey Takeaways:\r\n* Hands-on skills to compute p-values and confidence intervals using basic programming concepts.\r\n* Clear, step-by-step demonstrations of shuffling, resampling, and generating statistical insights.\r\n* Practical knowledge to move beyond black-box libraries and understand the \"why\" and \"how\" behind A/B test evaluations.\r\n\r\nBy the end of the session, attendees will be equipped to demystify A/B testing with a coder-friendly workflow, empowering them to make confident, data-driven decisions in their projects.\r\n\r\nTalk Outline:\r\n\r\n1. Setting the Stage (5 minutes)\r\n    * What is A/B testing?\r\n    * Why isn't it enough to just compare numbers? Why do we need statistics to interpret results?\r\n2. Statistical Significance and P-Values (5 minutes)\r\n    * Statistical tests (t-test, z-test, binomial test) are frequently used, but what is the intuition behind them?\r\n    * Introducing the basic idea of bootstrapping.\r\n3. 
Bootstrapping Explained (8 minutes)\r\n    * Step-by-step illustration of the bootstrapping approach.\r\n    * What is a p-value? An intuitive description using resampling.\r\n4. Confidence Intervals Explained (7 minutes)\r\n    * Importance of confidence intervals and how they help interpret results.\r\n    * Intuitive computation of confidence intervals using bootstrapping.\r\n    * Impact of sample size on confidence intervals and certainty.\r\n5. Why These Statistics Matter (5 minutes)\r\n    * Discussion on the practical necessity of statistical techniques.\r\n    * How these methods ensure data-driven decision-making in A/B testing.", "recording_license": "", "do_not_record": false, "persons": [{"code": "ED3NBU", "name": "Thomas Mayer", "avatar": "https://pretalx.com/media/avatars/ED3NBU_8FsU28V.jpg", "biography": "**Thomas Mayer** holds a PhD in Quantitative Language Comparison and brings a profound background in Machine Learning and Natural Language Processing (NLP) to his work. As Team Lead in the Data Intelligence team at HolidayCheck, Thomas combines his passion for data-driven insights with his expertise in linguistics and AI to drive innovation in the travel industry. 
With a deep understanding of both technical and business challenges, he plays a pivotal role in leveraging data to enhance customer experiences and inform strategic decisions.", "public_name": "Thomas Mayer", "guid": "3418a251-5719-512a-ab33-a70e9f90d549", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/ED3NBU/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/RQ8JBM/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/RQ8JBM/", "attachments": []}], "Dynamicum": [{"guid": "12551859-405e-51fd-b552-b2f7e9012e09", "code": "WJPEQH", "id": 60503, "logo": null, "date": "2025-04-25T10:15:00+02:00", "start": "10:15", "duration": "01:30", "room": "Dynamicum", "slug": "pyconde-pydata-2025-60503-the-mighty-dot-customize-attribute-access-with-descriptors", "url": "https://pretalx.com/pyconde-pydata-2025/talk/WJPEQH/", "title": "The Mighty Dot - Customize Attribute Access with Descriptors", "subtitle": "", "track": "PyCon: Python Language & Ecosystem", "type": "Tutorial", "language": "en", "abstract": "Whenever you use a dot after an object in Python you access an attribute. While this seems a very simple operation, behind the scenes many things can happen. This tutorial looks into this mechanism that is regulated by descriptors. You will learn how a descriptor works and what kind of problems it can help to solve. Python properties are based on descriptors and solve one type of problems. Descriptors are more general, allow more use cases, and are more re-usable. Descriptors are an advanced topic. But once mastered, they provide a powerful tool to hide potentially complex behavior behind a simple dot.", "description": "Whenever you use a dot in Python you access an attribute. While this seems a very simple operation,\r\nbehind the scenes many things can happen. This tutorial looks into this mechanism that is regulated by descriptors. 
You will learn how a descriptor works and what kind of problems it can help to solve.\r\nPython properties are based on descriptors and solve one type of problem. Descriptors are more general, allow more use cases, and are more re-usable. Descriptors are an advanced topic. But once mastered, they provide a powerful tool to hide potentially complex behavior behind a simple dot.\r\n\r\nIn this tutorial you will:\r\n\r\n* Learn how to use Python's descriptors to add new functionality to attribute access\r\n* Acquire solid background knowledge on how descriptors work\r\n* Work with practical examples for applying descriptors\r\n* Learn when to use a property or reach for a descriptor\r\n* Get to know how popular Python libraries apply descriptors for tasks such as\r\n  data structure access, REST-APIs, ORMs, and serialization", "recording_license": "", "do_not_record": false, "persons": [{"code": "9KSJ3K", "name": "Mike M\u00fcller", "avatar": "https://pretalx.com/media/avatars/9KSJ3K_7arDp8I.jpg", "biography": "I've been a Python user since 1999, teaching Python professionally since 2004.\r\nI am also active in the community, organizing Python conferences such as\r\nPyCon DE, EuroSciPy, and BarCamps.\r\nI am a PSF Fellow, PSF Community Service Award winner,\r\nand chair of the German Python Software Verband.", "public_name": "Mike M\u00fcller", "guid": "c84d882a-39c5-51be-b6d5-7b7408e7002b", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/9KSJ3K/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/WJPEQH/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/WJPEQH/", "attachments": []}, {"guid": "471c929d-26f9-565d-abf7-66be7aed0596", "code": "LBKU3T", "id": 60835, "logo": null, "date": "2025-04-25T13:05:00+02:00", "start": "13:05", "duration": "01:30", "room": "Dynamicum", "slug": "pyconde-pydata-2025-60835-what-s-inside-the-box-building-a-deep-learning-framework-from-scratch", "url": 
"https://pretalx.com/pyconde-pydata-2025/talk/LBKU3T/", "title": "What's inside the box? Building a deep learning framework from scratch.", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Statistics", "type": "Tutorial", "language": "en", "abstract": "Explore the inner workings of deep learning frameworks like TensorFlow and PyTorch by building your own in this workshop. We will start with the fundamental automatic differentiation mechanics and proceed to implementing more complex components like layers, modules and optimizers. This workshop is mainly designed for experienced data scientists, who want to expand their intuition about lower level framework internals.", "description": "Data scientists typically concentrate on the mathematical foundations when designing and training neural networks, often treating the process by which deep learning frameworks link high-level code with lower-level mathematical operations as a black box. As a result, the internal workings of these frameworks are frequently overlooked.\r\n\r\nThis workshop is aimed to open the black box by letting the participants construct a small deep learning framework from scratch. 
We will begin with creating a simple automatic differentiation engine, followed by more advanced elements such as modules and optimizers.\r\n\r\nAs a result, the participants will be able to construct and train a neural network architecture using the framework they have built in just 1.5 hours.\r\n\r\nThe detailed text guide and solutions for all of the exercises are going to be provided as a public GitHub repository.\r\n\r\nAfter constructing the framework from scratch, the participants will gain a comprehensive understanding of:\r\n- the inner workings of deep learning frameworks;\r\n- the mapping of high-level framework components to lower-level operations;\r\n- the operational principles of the autograd engine and dynamic computational graphs;\r\n- higher-level abstractions such as modules, and their mechanisms of automatic parameter tracking.\r\n\r\n**Target audience**\r\n\r\nThis workshop is primarily intended for those with some experience in building deep learning models using popular frameworks like PyTorch, TensorFlow, or JAX. However, prior experience is not absolutely mandatory, as essential fundamentals will be briefly covered.\r\n\r\n**Outline**\r\n\r\nIntroduction, motivation and essential theory [15 min]\r\nImplementation [60 min]\r\nTensors + autograd engine [25 min]\r\nModules and layers [25 min]\r\nOptimizers [10 min]\r\nUsing the framework to build and train the model [10 min]\r\nConcluding remarks + sharing bonus exercises [5 min]", "recording_license": "", "do_not_record": false, "persons": [{"code": "NWAQCX", "name": "Oleh Kostromin", "avatar": "https://pretalx.com/media/avatars/NWAQCX_aiMHOjX.png", "biography": "I am a Data Scientist primarily focused on Deep Learning and MLOps. 
In my spare time I contribute to several open-source python libraries.", "public_name": "Oleh Kostromin", "guid": "68c7801e-b9b7-5c17-bc44-d5e705e5c269", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/NWAQCX/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/LBKU3T/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/LBKU3T/", "attachments": []}, {"guid": "2c89e629-7195-5bdc-ae77-d8986c2c4529", "code": "YLKDJK", "id": 61781, "logo": null, "date": "2025-04-25T14:40:00+02:00", "start": "14:40", "duration": "00:30", "room": "Dynamicum", "slug": "pyconde-pydata-2025-61781-the-forecast-whisperer-secrets-of-model-tuning-revealed", "url": "https://pretalx.com/pyconde-pydata-2025/talk/YLKDJK/", "title": "The Forecast Whisperer: Secrets of Model Tuning Revealed", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Statistics", "type": "Talk", "language": "en", "abstract": "Forecasting can often feel like interpreting vague signals\u2014unclear yet full of potential. 
In this talk, we\u2019ll cover advanced techniques for tuning forecasting models in professional settings, moving beyond the basics to explore methods that enhance both accuracy and interpretability.\r\n\r\nYou\u2019ll learn:\r\n\r\nHow to set clear business goals for ML model tuning and align technical work with business needs, including balancing forecast granularity and accuracy and selecting a statistically sound metric.\r\n\r\nPractical data preparation methods, including business-driven data cleaning and detecting data problems with statistical and business-driven approaches.\r\n\r\nAdvanced feature selection techniques such as recursive feature elimination and SHAP values, alongside hyperparameter tuning strategies including Bayesian optimization and ensemble methods.\r\n\r\nHow generative AI can support model tuning by automating feature generation, hyperparameter search, and enhancing model explainability through SHAP and LIME techniques.\r\n\r\nReal-world case studies, including how Blue Yonder\u2019s data science team optimized demand forecasting models for retail and supply chain applications.\r\n\r\nWe'll also discuss common mistakes like overfitting and data leakage, best practices for reliable validation, and the importance of domain knowledge in successful forecasting. Whether you're a seasoned data scientist or exploring time series forecasting, you'll gain advanced insights and techniques you can apply immediately.", "description": "Forecasting can often feel like trying to make sense of unclear patterns\u2014difficult to interpret but rich with potential. This talk clarifies the process, focusing on actionable steps for tuning forecasting models in professional environments where accuracy and performance drive business outcomes.\r\n\r\n1. Defining Clear Business Objectives:\r\n\r\nImportance of aligning machine learning efforts with tangible business goals.\r\nScoping forecasting problems and selecting appropriate success metrics.\r\n\r\n2. 
Data Preparation Techniques:\r\n\r\nCleaning data with a focus on business relevance and systematically enriching it.\r\nIn addition, we show how to tune the model by tuning the data and the corresponding feature engineering.\r\n\r\n3. Feature Selection and Hyperparameter Tuning:\r\n\r\nAdvanced feature selection strategies and their impact on model performance.\r\nTechniques for identifying impactful features.\r\nBest practices for hyperparameter tuning and optimization strategies.\r\n\r\n4. The Role of Interpretability and Generative AI in Model Tuning:\r\n\r\nAutomating feature generation.\r\nHyperparameter optimization techniques using generative AI.\r\nModel tuning through model interpretation.\r\n\r\n5. Real-World Applications and Case Studies:\r\n\r\nHow Blue Yonder improved retail forecast accuracy.\r\nLessons learned from industry case studies.\r\n\r\n6. Common Pitfalls and Best Practices:\r\n\r\nTypical mistakes made during model tuning.\r\n\r\nBest practices for ensuring model reliability and relevance.\r\nThe importance of domain knowledge in successful forecasting.\r\n\r\nConclusion:\r\nWhether you are a seasoned data scientist or just starting your forecasting journey, this session will provide you with actionable insights to fine-tune your forecasting models effectively. Expect practical techniques, real-world examples, and expert tips that you can apply immediately. Join us and learn how better forecasts lead to better business decisions.", "recording_license": "", "do_not_record": false, "persons": [{"code": "YTUJ88", "name": "Illia Babounikau", "avatar": "https://pretalx.com/media/avatars/YTUJ88_5mSDi6i.jpg", "biography": "Dr. Illia Babounikau is an accomplished data scientist with extensive expertise in machine learning and forecasting. He holds a Ph.D. in Physics from Hamburg University and initially pursued an academic career, focusing on large-scale data analysis and machine learning applications. 
His contributions have been instrumental in international scientific collaborations, including the CMS experiment at CERN\u2019s Large Hadron Collider and the COMET project at J-PARC.\r\n\r\nFor the past five years, Dr. Babounikau has been a Data Scientist at Blue Yonder, specializing in developing and fine-tuning advanced forecasting models for retail planning and inventory management. He leads the design and implementation of tailored machine-learning solutions, addressing complex challenges within supply chains across diverse industries.\r\n\r\nDr. Babounikau is passionate about bridging the gap between data science and business strategy, ensuring machine learning models are aligned with business objectives to drive data-informed decision-making.", "public_name": "Illia Babounikau", "guid": "802f2df6-bbb8-5c01-917b-98e933e7513b", "url": "https://pretalx.com/pyconde-pydata-2025/speaker/YTUJ88/"}], "links": [], "feedback_url": "https://pretalx.com/pyconde-pydata-2025/talk/YLKDJK/feedback/", "origin_url": "https://pretalx.com/pyconde-pydata-2025/talk/YLKDJK/", "attachments": []}]}}]}}}