{"$schema": "https://c3voc.de/schedule/schema.json", "generator": {"name": "pretalx", "version": "2024.4.0.dev0"}, "schedule": {"url": "https://pretalx.com/pyconde-pydata-2024/schedule/", "version": "0.17", "base_url": "https://pretalx.com", "conference": {"acronym": "pyconde-pydata-2024", "title": "PyConDE & PyData Berlin 2024", "start": "2024-04-22", "end": "2024-04-24", "daysCount": 3, "timeslot_duration": "00:05", "time_zone_name": "Europe/Berlin", "colors": {"primary": "#E2552C"}, "rooms": [{"name": "Kuppelsaal", "guid": "903626c5-d978-588b-b2a9-b61bd7190aa6", "description": "Kuppelsaal", "capacity": 950}, {"name": "B09", "guid": "326022dd-45e0-5c04-8156-9115246de316", "description": "B09", "capacity": 240}, {"name": "B07-B08", "guid": "27445b1e-73d4-5390-be29-ff190c46d817", "description": "B07-B08", "capacity": 200}, {"name": "B05-B06", "guid": "d8bda6b5-29c4-5476-b372-33ecd48d425a", "description": "B05-B06", "capacity": 300}, {"name": "A1", "guid": "527b7a85-5042-59f0-89d4-49395f06066f", "description": "A1", "capacity": 80}, {"name": "A03-A04", "guid": "646e09f5-0ee2-5f1d-87b6-882de676d3b7", "description": "A03-A04", "capacity": 140}, {"name": "A05-A06", "guid": "5c605154-93f3-5c76-85e7-756915d00a2e", "description": "A05-A06", "capacity": 140}], "tracks": [{"name": "PyCon: MLOps & DevOps", "color": "#000000"}, {"name": "PyCon: Programming & Software Engineering", "color": "#000000"}, {"name": "PyCon: Python Language & Ecosystem", "color": "#000000"}, {"name": "PyCon: Security", "color": "#000000"}, {"name": "PyCon: Testing", "color": "#000000"}, {"name": "PyCon: Django & Web", "color": "#000000"}, {"name": "PyData: Data Handling & Engineering", "color": "#000000"}, {"name": "PyData: Machine Learning & Deep Learning & Stats", "color": "#000000"}, {"name": "PyData: Natural Language Processing & Computer Vision", "color": "#000000"}, {"name": "PyData: Generative AI", "color": "#000000"}, {"name": "PyData: PyData & Scientific Libraries Stack", "color": 
"#000000"}, {"name": "PyData: Visualisation & Jupyter", "color": "#000000"}, {"name": "General: Community, Diversity, Career, Life and everything else", "color": "#000000"}, {"name": "General: Ethics & Privacy", "color": "#000000"}, {"name": "General: Infrastructure - Hardware & Cloud", "color": "#000000"}, {"name": "General: Industry & Academia Use-Cases", "color": "#000000"}, {"name": "General: Others", "color": "#000000"}, {"name": "Sponsor", "color": "#000000"}, {"name": "Plenary", "color": "#000000"}, {"name": "Empty", "color": "#000000"}], "days": [{"index": 1, "date": "2024-04-22", "day_start": "2024-04-22T04:00:00+02:00", "day_end": "2024-04-23T03:59:00+02:00", "rooms": {"Kuppelsaal": [{"url": "https://pretalx.com/pyconde-pydata-2024/talk/TNHMGN/", "id": 44831, "guid": "42fcc49b-b883-531b-b4b7-03f523af7bd0", "date": "2024-04-22T10:15:00+02:00", "start": "10:15", "logo": null, "duration": "00:45", "room": "Kuppelsaal", "slug": "pyconde-pydata-2024-44831-keynote-a-view-from-my-window-an-outside-perspective-of-open-source-scientific-computing-from-the-inside", "title": "Keynote - A View From My Window - An Outside Perspective of Open Source Scientific Computing From the Inside", "subtitle": "", "track": "Plenary", "type": "Keynote", "language": "en", "abstract": "Twelve years as the Executive Director of NumFOCUS has given me a unique perspective of the open source scientific ecosystem. Building an organization to support project communities has taken me down many roads. Navigating these paths has been rewarding and challenging. We will look at lessons learned as I share my experiences through observations and insights on projects, community leadership, education, and fundraising.", "description": "Twelve years as the Executive Director of NumFOCUS has given me a unique perspective of the open source scientific ecosystem. Building an organization to support project communities has taken me down many roads. 
Navigating these paths has been rewarding and challenging. We will look at lessons learned as I share my experiences through observations and insights on projects, community leadership, education, and fundraising. \r\n\r\nNumFOCUS is a nonprofit organization that serves open source scientific computing projects and their communities. Our support programming includes fiscal sponsorship, affiliation services, development grants, educational and DEI initiatives, and collaborative opportunities in open source science. PyData is an educational program of NumFOCUS.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "dabb3d62-7324-518a-92de-f4c97bf53e98", "id": 40284, "code": "MXXZST", "public_name": "Leah Silen", "avatar": null, "biography": "As the Executive Director of NumFOCUS, Leah currently leads a network of support initiatives to sustain open source scientific computing. She has spent twelve years building a 501(c)(3) nonprofit organization positioned to serve projects critical to global research and technological innovation. Under her leadership, NumFOCUS has launched vital programming, including Fiscal Sponsorship, PyData Events and Meetups, Small Development Grants, Diversity in Scientific Computing, Project Contributor and Diversification Research, corporate sponsorship, governance advisement, and Project Sustainability Summits. She has overseen over $26 million in project-restricted donations and grants, which have gone towards maintenance, infrastructure, roadmaps, documentation, community management, legal services, DEI initiatives, developer and community events, and promotion of the open source projects in NumFOCUS\u2019s portfolio.\r\n\r\nBefore her tenure at NumFOCUS, Leah worked in the nonprofit and educational fields. Her experience spans roles as a public relations manager, educator, and program director with a focus on community development and fundraising. 
Leah has also served as a volunteer on committees and boards for several nonprofits.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/7EC3UY/", "id": 42949, "guid": "23dcc642-f37d-5df9-9923-6cd1351f1a30", "date": "2024-04-22T11:25:00+02:00", "start": "11:25", "logo": null, "duration": "00:45", "room": "Kuppelsaal", "slug": "pyconde-pydata-2024-42949-pycon-community-backstage-a-decade-of-camaraderie-growth-and-lessons-learned", "title": "PyCon Community Backstage: A Decade of Camaraderie, Growth, and Lessons Learned", "subtitle": "", "track": "General: Community, Diversity, Career, Life and everything else", "type": "Talk (long)", "language": "en", "abstract": "For the past decade, my journey as a dedicated community organizer has allowed me to immerse myself deeply in the Python community, experiencing its extraordinary growth firsthand. The transition of Python from being a top-10 contender to becoming the foremost programming language has been an exhilarating experience, propelled by a burgeoning community and its foray into fields such as data science and artificial intelligence. The inclusivity and camaraderie within the Python community have been pivotal, illustrating how collective effort and a nurturing culture are instrumental to its current standing.\r\n\r\nThis presentation is crafted to disseminate the pivotal lessons and best practices that have emerged from my decade-long engagement. During this period, I have played a key role in organizing over twenty Python/PyData conferences, including notable events like PyCon.DE, PyData Berlin, EuroPython, EuroSciPy, and PyData Global.\r\n\r\nIt is for anyone who wants to learn more about, contribute to and organize themselves in the Python and PyData community. 
\r\n\r\nThis talk will address:\r\n* How it works: community backstage\r\n* Why it works: community organizations\r\n* Lessons learned:\r\n  * community leadership & team dynamics\r\n  * balancing ideas and realities\r\n  * personal & professional growth\r\n* How to contribute as an individual, community or company\r\n* How organizations like the [PySV](https://pysv.org), [NumFOCUS](https://numfocus.org) or PioneersHub serve the community", "description": "Through organizing numerous community conferences, both small and large, I've gained invaluable insights into what makes a team and a community function effectively, and equally important, what doesn't.\r\n\r\nLeadership has been a key learning area for me. Through understanding my strengths and weaknesses, I have grown not just as a community leader but also in my professional career, enhancing how I work and lead.\r\n\r\nIn \"PyCon Backstage All Access,\" I will cover:\r\n1.  Organizational Experiences: The nuances of organizing conferences of various scales.\r\n2. Leadership Lessons: Insights into team dynamics - what works and what doesn't in building a great community team.\r\n3. Balancing Ideas and Realities: The driving factors behind enjoyable community conferences. How to listen to others. When to embrace complexity, and when to say no.\r\n4. Handling the Mundane: Strategies for dealing with administrative, tax, and legal aspects. How and where organisations can help.\r\n5. Future Outlook: Strategies for sustaining the European Python Community amidst growing challenges. 
This includes my reasons for rejoining the EuroPython board to help shape its future beyond being just a conference organizer.\r\n\r\nMy community service \"CV\":\r\n \r\n * 2013 local MongoDB meetup\r\n * 2014 joined EuroPython\r\n * 2015-2020 core EuroPython organizer, 2 years board member\r\n * 2017-2018 PyCon DE organizer\r\n * 2018-today EuroSciPy organizer\r\n * 2018-today PyData S\u00fcdwest meetup organizer\r\n * 2019-2022 PyCon DE & PyData Berlin chair\r\n * 2019-today PyData Frankfurt meetup organizer\r\n * 2019-today Python Software Verband chair (German Python association)\r\n * 2023 PyCon DE & PyData Berlin organizer\r\n * 2023 EuroPython board member", "recording_license": "", "do_not_record": false, "persons": [{"guid": "e61ae96e-6f0d-5312-867d-6bf04eefb64f", "id": 228, "code": "8F38DV", "public_name": "Alexander CS Hendorf", "avatar": "https://pretalx.com/media/avatars/8F38DV_0aO0cup.jpg", "biography": "Alexander Hendorf is responsible for data and artificial intelligence at a boutique consultancy in Germany. He has many years of experience in the practical application, introduction and communication of data and AI-driven strategies and decision-making processes.\r\nThrough his commitment as a speaker and chair of various international conferences such as PyConDE & PyData Berlin, he is a proven expert in the field of data intelligence. He's been appointed Python Software Foundation and EuroPython fellow for his various contributions. Currently he is a sitting board member of Python Software Verband (Germany) and the EuroPython Society (EPS). 
Currently he's building Pioneers Hub - a new non-profit organisation to support tech-communities.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/CBVTEG/", "id": 40238, "guid": "5a327780-35f8-5775-a2ea-71c662945837", "date": "2024-04-22T12:15:00+02:00", "start": "12:15", "logo": null, "duration": "00:30", "room": "Kuppelsaal", "slug": "pyconde-pydata-2024-40238-streamlining-python-development-a-guide-to-a-modern-project-setup", "title": "Streamlining Python Development: A Guide to a Modern Project Setup", "subtitle": "", "track": "PyCon: Programming & Software Engineering", "type": "Talk", "language": "en", "abstract": "Designed for beginners, this presentation demystifies Python project management using [Hatch](https://hatch.pypa.io/) and delves into `pyproject.toml` for efficient configuration. We'll guide you through organizing directories, implementing unit testing for code reliability, and using [mypy](https://mypy-lang.org/) for type checking to enhance code quality. The session concludes with insights into [ruff](https://github.com/astral-sh/ruff), a modern linter for maintaining Python standards, which is replacing black, isort, flake8. This talk is a comprehensive toolkit for anyone eager to learn and apply the latest practices in Python development.", "description": "In the dynamic world of Python programming, an efficient project setup is key to success. 'Streamlining Python Development: A Guide to a Modern Project Setup' is a presentation tailored specifically for Python beginners, aiming to demystify the process of setting up a Python project with clarity and efficiency. In this session, we'll introduce Hatch, a cutting-edge tool that simplifies project management. 
We'll delve into the functionalities and benefits of using `pyproject.toml`, a cornerstone in modern Python development for its streamlined approach to project configuration.\r\n\r\nThe talk will also cover effective strategies for organizing your project's directory structure, ensuring a clean and manageable workspace. Understanding the importance of testing, we'll discuss unit testing techniques for enhancing code reliability. Additionally, the presentation will feature mypy for type checking, an essential practice for catching errors early and improving code quality. Finally, we'll explore the use of ruff, a modern linter, to keep your code clean and in line with Python standards.\r\n\r\nBy the end of this presentation, Python beginners will have gained a comprehensive understanding of the tools and methodologies necessary for a modern Python project setup, empowering them to create well-structured, high-quality Python applications.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "d8a2dd67-d397-54f5-88e9-b2c680fb4e5c", "id": 102, "code": "8LQU9C", "public_name": "Florian Wilhelm", "avatar": "https://pretalx.com/media/avatars/8LQU9C_vv210Xj.jpg", "biography": "Florian is Head of Data Science & Mathematical Modeling at inovex GmbH, an IT project center driven by innovation and quality, focusing its services on \u2018Digital Transformation\u2019.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/7LQEJ3/", "id": 42952, "guid": "f0b66d8d-2b57-52fb-a3eb-709c44d42531", "date": "2024-04-22T13:45:00+02:00", "start": "13:45", "logo": null, "duration": "00:45", "room": "Kuppelsaal", "slug": "pyconde-pydata-2024-42952-you-shall-not-pass-strengthen-your-python-code-against-attacks-", "title": "You shall not pass! 
\ud83e\uddd9 Strengthen your python code against attacks.", "subtitle": "", "track": "PyCon: Security", "type": "Talk (long)", "language": "en", "abstract": "Have you ever thought about IT Security when coding your Python application? If not, you are not alone \u2013 but also not safe.\r\n\r\nJust recently, a research study counted almost 4000 secrets published on PyPI. Most of the secrets such as AWS Keys, Google API Keys or database credentials were most likely leaked accidentally. Leaked credentials top the list of entry points for attackers into protected areas. In this talk you\u2019ll gain insights into how malicious attacks on Python applications are performed \u2013 and most importantly, how to protect yourself against them.\r\n\r\nWe\u2019ll kick off with a basic review of how to crack a password not only with brute force and continue with the most important IT Security principles. After understanding the importance of adhering to common security precautions, we will dive into Python coding hygiene. Where do the most common vulnerabilities lie? How can we strengthen the security of our code?\r\nWe\u2019ll cover secure coding practices such as code analysis, input validation and dependency vulnerabilities in theory and practice. Lastly, we will look at some case studies of common attacks on Python code and how to protect yourself against them.\r\n\r\nIf you have never thought about security aspects in Python, this talk is for you!", "description": "This talk will highlight the theoretical concepts on security. We\u2019ll start with a general overview and dive into specifics for Python applications. We will address five main questions:\r\n\r\n1. How can we retrieve a password with a Python function?\r\n2. What are the most essential IT Security practices?\r\n3. Where can we find information on current security vulnerabilities? \r\n4. What should we keep in mind to write secure Python code?\r\n5. What are some historical attacks on Python code? 
What can we learn from them?\r\n\r\nListeners will walk away with a general overview of how to approach security issues when building their Python application and make their future code more secure.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "17dcad64-003e-5c4a-bcba-107e5c7cfa92", "id": 25742, "code": "KFCSXT", "public_name": "Antonia Scherz", "avatar": "https://pretalx.com/media/avatars/KFCSXT_ybtn1sa.jpg", "biography": "Antonia Scherz is senior specialist for machine learning applications at PD - Berater der \u00f6ffentlichen Hand in Berlin. At PD she builds proof of concept tools and assists in software development for machine learning applications in public administration. She is passionate about making machine learning and open software tools widely and securely used by public administration and is fascinated by how new tools can be integrated into old structures for the public good.", "answers": []}, {"guid": "fb8858ca-9e9d-50b6-a4c7-3153ceb289ca", "id": 40485, "code": "UVH7GW", "public_name": "Roman Krafft", "avatar": "https://pretalx.com/media/avatars/UVH7GW_6KTl5l3.jpg", "biography": "Roman Krafft has been employed at PD - Consultant of the Public Sector GmbH since June 2021 and has worked there as a senior specialist since October 2023. 
He oversees projects in the strategic administrative modernization division with a focus on software development and machine learning.\r\n\r\nRoman Krafft studied computer science (Bachelor of Science degree) at the Technical University of Kaiserslautern from 2014 to 2018 and then studied computer science (Master of Science degree) at the same university from 2018 to 2021.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/PRH3QU/", "id": 41572, "guid": "fb746d47-0dd2-59fe-910e-d49566271e40", "date": "2024-04-22T14:35:00+02:00", "start": "14:35", "logo": null, "duration": "00:30", "room": "Kuppelsaal", "slug": "pyconde-pydata-2024-41572-better-safe-than-sorry-threat-modeling-for-python-developers", "title": "Better safe than sorry: Threat Modeling for Python Developers", "subtitle": "", "track": "PyCon: Security", "type": "Talk", "language": "en", "abstract": "Every developer wants to write good code. Good code, that also means security against attackers and their threats. But how secure is your code really?\r\nThe talk explains how you can use Threat Modeling to assess your application in a systematic approach against the threats that are relevant to your use cases and their attack surface.", "description": "In the ever-evolving landscape of cybersecurity, Python applications play a pivotal role in handling critical data and supporting essential business functions, making them prime targets for malicious actors. As the stakes continue to rise, developers want to prioritize the implementation of security measures to safeguard against potential threats. However, the definition of \"secure\" remains elusive and often subjective. This does not only cause insecurity of the application, but especially among the people that develop it.\r\nThis talk explains how to move from \"best effort security\" to a comprehensive and systematic approach to application security. 
It introduces the tried and tested method \u201cThreat Modeling\u201d and explains its value in a Python development project.\r\nPython developers will gain practical insights to identify, assess, and prioritize security risks systematically. Real-world examples illustrate the impact of effective threat modeling, empowering developers to proactively secure their applications against the threats that are really relevant for them.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "2211c8c4-18d2-5bc3-8719-afbfb0213fe8", "id": 38288, "code": "DS3TQU", "public_name": "Clemens H\u00fcbner", "avatar": "https://pretalx.com/media/avatars/DS3TQU_q2Okxq8.jpg", "biography": "For more than ten years, Clemens has been working at the interface between software and security. After roles as a software developer and in penetration testing, he joined inovex in 2018 as a software security engineer. Today, he supports development projects at the conception and implementation level, advises on DevSecOps and loves giving trainings and talks.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/TU9EUQ/", "id": 43027, "guid": "125f0d97-e9d7-5135-958b-177227b7d3e2", "date": "2024-04-22T15:35:00+02:00", "start": "15:35", "logo": null, "duration": "00:30", "room": "Kuppelsaal", "slug": "pyconde-pydata-2024-43027-how-to-embrace-your-leadership-role-as-a-data-nerd-or-other-creative-types-", "title": "How to embrace your Leadership role as a Data Nerd (or other creative types)", "subtitle": "", "track": "General: Community, Diversity, Career, Life and everything else", "type": "Talk", "language": "en", "abstract": "The transition from a hands-on creative job to a leadership role isn't always smooth. The tasks you excelled at are now handled by your team, and your new title brings added responsibilities, numerous meetings, leaving little room for deep work. 
So, how do we\u2014 the data people, the coaches, the coders\u2014thrive in management roles? In this talk, I'll share my journey into management and how I learned to embrace and find reward in my leadership role.", "description": "You've been working as a Data person/coder/designer/coach for a while and enjoy the creative task at hand. Investing your time in something meaningful that you're very good at brings you a deep sense of satisfaction, making your job truly enjoyable. As your career advances, you climb the ranks to become a senior professional and at some point, you find yourself taking on a management role.\r\n\r\nSuddenly, creative time is scarce, pressure is high, your schedule is full of meetings, and you are responsible for projects and a team. A great team, that too often you envy for getting to do the actual hands-on job. Sounds familiar? Or is this step something to better avoid?\r\n\r\nIn this talk, I'll discuss my not-so-smooth transition from a senior position to a leadership role. I'll share lessons learned in my last years as a Head and ultimately, I\u2019ll share my tips on how to not only survive but actually like and thrive in a management role.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "50a454c2-0f8e-5d15-9a7a-b91319a30558", "id": 15984, "code": "WVNMPG", "public_name": "Paula Gonzalez Avalos", "avatar": "https://pretalx.com/media/avatars/WVNMPG_opng10Z.png", "biography": "Scientist, Data Scientist, Coach. Currently Head of AI Academy at the appliedAI institute gGmbH. Paula loves learning, teaching & playing with data. 
She is also an active member of the PyData community,  a PyLadies organizer, and a diversity advocate.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/UBNVYW/", "id": 39507, "guid": "58e868aa-0950-5230-a327-68775640705a", "date": "2024-04-22T16:10:00+02:00", "start": "16:10", "logo": null, "duration": "00:45", "room": "Kuppelsaal", "slug": "pyconde-pydata-2024-39507-when-and-how-to-start-coding-with-kids", "title": "When and how to start coding with kids", "subtitle": "", "track": "General: Community, Diversity, Career, Life and everything else", "type": "Talk (long)", "language": "en", "abstract": "Our world is driven by technology and there are many reasons to teach our kids how to code. For example, coding allows them to develop logical reasoning skills and teaches attention to detail. Allowing children to discover how much fun coding can be supports them in their development and opens many doors for their future.\r\n\r\nBut when and how should we start coding with kids? This talk will approach the question from a scientific perspective, looking into how children's brains develop, how children learn and how to best teach them coding abilities. It will answer important questions like \"At what age can a child start coding?\" or \"What are the benefits of learning to code?\". It will also present possible starting points, like learning platforms or tutorials.", "description": "Being able to code is becoming a more valuable skill every day. Besides the obvious advantages of being able to code (e.g. better career opportunities), coding teaches important skills like logical reasoning, attention to detail and creativity. But what is the best time to start coding? Are kids even able to learn how to code? 
And at what age?\r\n\r\nIn this talk I would like to approach these questions from a scientific perspective, discussing the biological backgrounds and giving concrete advice on when and how to start coding with kids.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "9b697e10-d673-5739-8580-75f8612f8ff2", "id": 24562, "code": "9CX9CB", "public_name": "Anna-Lena Popkes", "avatar": "https://pretalx.com/media/avatars/9CX9CB_cJ2hJM2.jpg", "biography": "I'm Anna-Lena, a machine learning engineer living in Bonn, Germany. I'm very passionate about learning and love to share my knowledge with other people. Besides machine learning I love teaching Python and have been a regular guest at PyCon events and on podcasts.", "answers": []}], "links": [], "attachments": [], "answers": []}], "B09": [{"url": "https://pretalx.com/pyconde-pydata-2024/talk/LMMM7D/", "id": 44953, "guid": "92d9b785-34ef-54a8-9a1e-e9cb2078eece", "date": "2024-04-22T11:25:00+02:00", "start": "11:25", "logo": null, "duration": "00:45", "room": "B09", "slug": "pyconde-pydata-2024-44953-better-search-relevance-using-learning-to-rank-at-mobile-de", "title": "Better search relevance using Learning to Rank at mobile.de", "subtitle": "", "track": "Sponsor", "type": "Sponsored Talk (long)", "language": "en", "abstract": "At mobile.de, we aim to provide a satisfactory search experience so users can quickly find the vehicles they are looking for. We make this happen with machine learning systems running 24/7 in the backend, which continuously learn changing user interests and optimize the search experience. Based on techniques like learning to rank using XGBoost, this talk will discuss our current search relevance ranking framework and how it ranks millions of searches daily.", "description": "At mobile.de, we continuously strive to provide our users with a better, faster and unique search experience. 
Machine learning and Python play a key role in providing this experience.\r\n\r\nEvery day, millions of people visit mobile.de to find their dream car. The user journey typically starts by entering a search query and later refining it based on their requirements. If the user finds a relevant listing, they contact the seller to purchase the vehicle. Our search engine is responsible for matching users with the right sellers.\r\n\r\nIn this talk, I will cover:\r\n- Introduction\r\n- Why search is important\r\n- How learning to rank helps\r\n- Current challenges with our ranking models\r\n- Proposed solution\r\n- How we deploy our ranking models (under a strict latency SLA of <30 ms)\r\n- A/B test results\r\n- Key learnings\r\n- How we can improve further", "recording_license": "", "do_not_record": false, "persons": [{"guid": "89d28502-3e6f-5264-9c31-f48c079a8560", "id": 37484, "code": "JYLX3E", "public_name": "Manish Saraswat", "avatar": "https://pretalx.com/media/avatars/JYLX3E_FqgEWsV.jpeg", "biography": "Manish is currently working as a Senior Data Scientist with a strong focus on building, deploying and serving models. With over nine years working on machine learning problems, he really enjoys building data products around improving search, ranking and recommendations. 
Outside work, he likes to do outdoor activities like running, swimming etc.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/GLXJPC/", "id": 45527, "guid": "6dccfe33-d747-57cf-a660-2e1a9876c163", "date": "2024-04-22T12:15:00+02:00", "start": "12:15", "logo": null, "duration": "00:30", "room": "B09", "slug": "pyconde-pydata-2024-45527-haystack-2-0-the-story-of-a-rewrite", "title": "Haystack 2.0: the story of a rewrite", "subtitle": "", "track": "Sponsor", "type": "Sponsored Talk", "language": "en", "abstract": "To rewrite or not to rewrite: it's a major question.\r\n\r\nReleasing new\u00a0software versions with breaking changes can be disruptive to a community, but sometimes they are necessary in the long run to move forward.\u00a0\r\n\r\nHaystack is a free open source Python LLM framework. It was launched in 2020, before LLMs were cool. In 2023 we decided to undergo a major re-architecture, culminating in the GA release of Haystack 2.0. It wasn't an easy decision. By involving the open source community and some big companies in our design process early on, we are confident we built a more usable, flexible foundation for years to come.\r\n\r\nIn this talk I'll tell you the story of this rewrite. The decisions we made to bring the project forward with the right level of flexibility / composability in the rapidly changing LLM landscape. I won't only show you the new features 2.0 provides, but give you a peek into our future roadmap. You'll walk away with a better understanding of how modern LLM frameworks can help you solve problems for yourself and your users, as well as an enriched understanding of how to think for the long-term when building for an open source community.\r\n\r\nYou\u2019ll see how the strength of Haystack modularity and ease of use makes it stand out from other libraries. 
Demos will make it much clearer and give you some great ideas on how to integrate Haystack in your projects.", "description": "To rewrite or not to rewrite: it's a major question.\r\n\r\nReleasing new\u00a0software versions with breaking changes can be disruptive to a community, but sometimes they are necessary in the long run to move forward.\u00a0\r\n\r\nHaystack is a free open source Python LLM framework. It was launched in 2020, before LLMs were cool. In 2023 we decided to undergo a major re-architecture, culminating in the GA release of Haystack 2.0. It wasn't an easy decision. By involving the open source community and some big companies in our design process early on, we are confident we built a more usable, flexible foundation for years to come.\r\n\r\nIn this talk I'll tell you the story of this rewrite. The decisions we made to bring the project forward with the right level of flexibility / composability in the rapidly changing LLM landscape. I won't only show you the new features 2.0 provides, but give you a peek into our future roadmap. You'll walk away with a better understanding of how modern LLM frameworks can help you solve problems for yourself and your users, as well as an enriched understanding of how to think for the long-term when building for an open source community.\r\n\r\nYou\u2019ll see how the strength of Haystack modularity and ease of use makes it stand out from other libraries. Demos will make it much clearer and give you some great ideas on how to integrate Haystack in your projects.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "13e7836a-4ad5-5a08-b772-1b79b3e02ab0", "id": 41156, "code": "YRBLMH", "public_name": "Silvano Cerza", "avatar": "https://pretalx.com/media/avatars/YRBLMH_SgfAjRV.jpg", "biography": "Generalist software engineer that worked in tons of different languages. Expert in Python and C++. Worked for companies in different fields like Arduino and Pitch. 
Currently at deepset playing with AI.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/GVTJW8/", "id": 41326, "guid": "b0c184d9-9b05-597c-8318-ab28355a6854", "date": "2024-04-22T13:45:00+02:00", "start": "13:45", "logo": null, "duration": "00:30", "room": "B09", "slug": "pyconde-pydata-2024-41326-from-idea-to-production-in-a-day-leveraging-azure-ml-and-streamlit-to-build-and-user-test-machine-learning-ideas-quickly", "title": "From idea to production in a day: Leveraging Azure ML and Streamlit to build and user test machine learning ideas quickly", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Stats", "type": "Talk", "language": "en", "abstract": "Getting a machine learning solution in front of users usually takes some time. The data science tech stack is full of time traps and infrastructure issues might slow down deployment. The Azure Machine Learning platform, automated machine learning, and Streamlit are predestined tools for circumventing common development and deployment issues \u2013 if you know how to use them. Based on our learnings in corporate hackathons, we will use the stack to rapidly prototype a computer vision application users can interact with. You will walk away with Python code snippets and inspiration to build and user test your own machine learning ideas quickly.", "description": "Experimentation, bringing machine learning ideas in front of users, is essential to innovation. Yet, in our corporate hackathons, our data science team has struggled many times with how to build and deploy user-facing machine learning ideas in just a single day. \r\n\r\nOver the past 2+ years, we have developed a routine around using Azure Machine Learning, automated machine learning, and Streamlit to build and user test machine learning ideas quickly. 
The aim of this talk is to pass on practical, technical knowledge to fellow data scientists about how to leverage this stack to achieve high build and user test speeds. \r\n\r\nDuring the talk, we will walk through the process of building a computer vision system for identifying trash in images via an app using the open-source TACO dataset (http://tacodataset.org/). Working through a Jupyter notebook, we will load the data into Azure Machine Learning and trigger an automated machine learning run on the data. In this context, we will quickly get to know the training and testing metrics available in Azure ML to evaluate the model. We will then download the machine learning model as a file packaged in the open-source ONNX format (https://onnx.ai/). Using the open-source Python web application framework Streamlit (https://github.com/streamlit/streamlit), we will program an application in which users can upload images and embed the machine learning model in it to identify trash in these images. Using a to-be-published infrastructure-as-code pipeline on Azure DevOps, we will deploy the application to the public internet on the Azure platform. From here, users can test it.\r\n\r\nThe stack and code presented in this talk will enable fellow data scientists to accelerate their data science development, leading to quicker experimentation and, therefore, to faster innovation of products with machine learning at their core.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "3c2fd9ab-976c-582b-bd9c-e57e2142fd2a", "id": 38167, "code": "MCVV8H", "public_name": "Florian Roscheck", "avatar": "https://pretalx.com/media/avatars/MCVV8H_JK9n88X.jpg", "biography": "Florian is a Sr. Data Scientist at Henkel where he develops machine learning solutions for R&D and production use cases across the company's adhesive and consumer good portfolios. He is also known as online instructor for the open-source data engineering framework Apache Spark. 
Florian volunteers his time as the current Vice President of the Affiliated Project Selection Committee at NumFOCUS, helping scientific open-source projects grow.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/SQUNWS/", "id": 43007, "guid": "2733ce1c-e93d-549d-95d6-fe34aa237fe1", "date": "2024-04-22T14:35:00+02:00", "start": "14:35", "logo": null, "duration": "00:30", "room": "B09", "slug": "pyconde-pydata-2024-43007-going-beyond-parquet-s-default-settings-be-surprised-what-you-can-get", "title": "Going beyond Parquet's default settings \u2013 be surprised what you can get", "subtitle": "", "track": "PyData: Data Handling & Engineering", "type": "Talk", "language": "en", "abstract": "Apache Parquet has become the de facto format for storing tabular (DataFrame) data on disk. This is done through universal compression and efficient knowledge of the stored data structure. As part of this talk, we would like to show the core structure of Parquet and the knobs that allow you to get even more of the capabilities of the file format.", "description": "In the last decade, Apache Parquet has become the standard format to store tabular data on disk regardless of the technology stack used. This is due to its read/write performance, efficient compression technology, interoperability and especially outstanding performance with the default settings. \r\n\r\nWhile these default settings and access patterns already provide decent performance, by understanding the format in more detail and using recent developments, one can get much better performance,  smaller files, and utilise Parquet's newer partial reading features to read even smaller subsets of a file for a given query.\r\n\r\nThis talk aims to provide insight into the Parquet format and its recent development that are useful for end users' daily workflows. 
One only needs prior knowledge to know what a DataFrame/tabular data is.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "afec7d82-fdf6-51db-814d-87663157151a", "id": 15975, "code": "TH7EPJ", "public_name": "Uwe L. Korn", "avatar": "https://pretalx.com/media/avatars/TH7EPJ_Gmjlo4J.jpg", "biography": "Uwe Korn is a CTO at the data science company QuantCo. His expertise is in building scalable architectures for machine learning services and the teams & culture around them. Nowadays, he focuses on the data engineering infrastructure that is needed to provide the building blocks to bring machine learning models into production. As part of his work to provide an efficient data interchange, he became a core committer to the Apache Parquet, Apache Arrow and conda-forge projects.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/XNY3HX/", "id": 44950, "guid": "e093800b-a2e5-5d98-96d4-264118c542c8", "date": "2024-04-22T15:35:00+02:00", "start": "15:35", "logo": null, "duration": "00:30", "room": "B09", "slug": "pyconde-pydata-2024-44950-bridging-the-gap-from-analytical-models-to-operational-success", "title": "Bridging the Gap: From Analytical Models to Operational Success", "subtitle": "", "track": "Sponsor", "type": "Sponsored Talk", "language": "en", "abstract": "Deploying machine learning models in production carries its own unique set of challenges. Some challenges stem from different, and sometimes conflicting, objectives between analytics and production. Others arise from technological limitations, business requirements, and even regulatory needs.\r\n\r\nIn this talk, we will focus on the part of the problem surrounding the handover of models from analytics to production.  
We expect data scientists, operation specialists, and product owners to benefit from our stories.", "description": "Deploying machine learning models in production carries its own unique set of challenges. Some challenges stem from different, and sometimes conflicting, objectives between analytics and production. Others arise from technological limitations, business requirements, and even regulatory needs.\r\n\r\nIn this talk, we will focus on the part of the problem surrounding the handover of models from analytics to production. This process has multiple facets, with tasks executed at different points in time and with different degrees of automation possible. To name a few: model packaging, inference reproducibility, establishing what needs to be deployed, and deployment-related actions. \r\n\r\nWe'll share some of our experiences and strategies to tackle these challenges. For example, how we tackle the topic of contracts, interfaces, and responsibilities between modeling and production. Or how the role of automation in the pre-deployment process ensures a smooth and efficient model transition from an analytics model store to something ready for production once a model is approved.\r\n\r\nWhether you are a data scientist developing models, an operations specialist tasked with deploying them, or a product/project owner supervising the process, we aim to ignite engaging and fruitful discussions. For data scientists, to have a window into what happens after they are done with training a model. For operations specialists, to gain some strategies to improve their experience and success rate. 
And for a product owner, to get a framework on how to drive alignment.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "f679d352-7d85-5873-abb5-ae49e3615e11", "id": 41615, "code": "UZCZJC", "public_name": "Ignacio Vergara", "avatar": null, "biography": null, "answers": []}, {"guid": "4dc0e52d-53ee-502f-97f7-683926dea611", "id": 41618, "code": "BPSJZK", "public_name": "Nick Harmening", "avatar": "https://pretalx.com/media/avatars/BPSJZK_Tka0IcU.JPG", "biography": "Software Engineer at QuantCo, former Cloud Architect and DevOps team lead at BMW Group. I love building and running systems as well as fostering a collaborative working culture.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/YYKJMP/", "id": 48144, "guid": "990b29a9-cb7f-5047-b3d1-adc2f4314fa3", "date": "2024-04-22T16:10:00+02:00", "start": "16:10", "logo": null, "duration": "00:30", "room": "B09", "slug": "pyconde-pydata-2024-48144-documenting-r-d-progress-using-jupyter-book-and-feel-safe-for-the-next-performance-audit", "title": "Documenting R&D Progress using jupyter-book - and feel safe for the next performance audit", "subtitle": "", "track": "Sponsor", "type": "Sponsored Talk", "language": "en", "abstract": "Rosenxt has only just been founded, and yet we are already very busy researching great things and making them usable. The ideas are bubbling, the motivation is high. The urge to try out the next idea quickly is high. But progress needs to be well documented, as the next performance audit is sure to come.", "description": "Rosenxt has been founded to offer experience and excellence gathered in the last decades for the most challenging environments in the future, such as subsea, industrial, renewables, or the integrity of water and energy supply.\r\n\r\nHighly motivated, we can hardly wait to try out the next idea to make rapid progress. But we are also aware of the rules of business. 
At the end there is always the performance audit. This is where you have to prove that you can really deliver what you have promised. And to do this, you better have everything well documented.\r\n\r\nAt our venture we have chosen a jupyter-book based workflow. Here come the Jupyter Notebook based steps for data analysis we're using anyways along with some simple markdown based documents embracing everything. Using a clever file system structure and a few tools, we create appealing documents that document the development progress very well.\r\n\r\nIn this talk, I would like to present this workflow in more detail using the tests with a specific water pressure sensor that we are currently evaluating.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "df23dc13-20d6-55c5-a620-5d4717e311aa", "id": 25799, "code": "DUSVE9", "public_name": "Jens Nie", "avatar": "https://pretalx.com/media/avatars/DUSVE9_x6pgG1p.jpg", "biography": "A physicist who has filled a variety of roles in a leading service company in the oil and gas industry, currently tackling the development of embedded devices at Rosenxt based on the Raspberry Pi, LinuX and Python with a Python history going back to version 1.4.", "answers": []}], "links": [], "attachments": [], "answers": []}], "B07-B08": [{"url": "https://pretalx.com/pyconde-pydata-2024/talk/RBNJRK/", "id": 41601, "guid": "afecadc0-190e-5667-9874-d75bd336cda2", "date": "2024-04-22T11:25:00+02:00", "start": "11:25", "logo": null, "duration": "00:45", "room": "B07-B08", "slug": "pyconde-pydata-2024-41601-select-ml-from-databases", "title": "Select ML from Databases", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Stats", "type": "Talk (long)", "language": "en", "abstract": "This talk introduces a new workflow for building your machine learning models using the capabilities of modern databases that support machine learning use cases natively. 
It gives an overview of how machine learning models are created today and how they could look in the near future by utilising the features provided by current databases.", "description": "Developing machine learning models involves the use of data to identify patterns that would help solve business problems. Over the years as the scale of data increased, data started to get stored in databases. The model-building workflows would typically fetch the data from the databases, perform some transformations to create features, and use them to train the models. In some cases, these features would get stored in databases known as feature stores for reuse. To infer the model output in real-time, typically, there would be a small service or an API endpoint that would be deployed to get the results to the consumers.\r\n\r\nAs these use cases became more common, modern databases started incorporating features that aid in building machine learning models. This talk covers some of the features these databases provide, such as built-in common models like linear regression, image classification and text processing, support for functions with custom models, etc. Apart from these features, many of them also make it easy to deploy the model without needing an external service for the inference. Instead, they provide native interfaces for inference, like querying in SQL-like languages.\r\n\r\nThis talk includes an example of how to build your custom model in Python and then include it inside your Couchbase database, making inference a matter of using database queries. 
The example will help you understand some of the capabilities of modern databases in building machine learning models.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "f574cf10-ee7b-566b-9aa9-625a75722b3a", "id": 38299, "code": "XFYQFE", "public_name": "Gregor Bauer", "avatar": "https://pretalx.com/media/avatars/XFYQFE_0SOrHSY.png", "biography": "Gregor Bauer from Couchbase is driven by understanding customer needs and delivering suitable solutions. With a telecommunications background, he has led technical teams in delivering customized device management and IoT solutions globally. As Manager Solutions Engineering CEUR at Couchbase, he specializes in application modernization, multi-cloud strategies, and sustainable high user experience, with a particular focus on edge computing.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/WNHAG8/", "id": 41793, "guid": "e9634f11-4769-5010-89e8-7ecf29aeac23", "date": "2024-04-22T12:15:00+02:00", "start": "12:15", "logo": null, "duration": "00:30", "room": "B07-B08", "slug": "pyconde-pydata-2024-41793-data-valuation-for-machine-learning", "title": "Data valuation for machine learning", "subtitle": "", "track": "PyData: Data Handling & Engineering", "type": "Talk", "language": "en", "abstract": "Data valuation techniques compute the contribution of training points to the final performance of machine learning models. They are part of so-called data-centric ML, with immediate applications in data engineering like data pruning or improved collection processes, and in model debugging and development. In this talk we demonstrate how the open source library [pyDVL](https://pydvl.org) can be used to detect mislabeled and out-of-distribution samples with little effort. 
We cover the core ideas behind the most successful algorithms and illustrate how they can be used to inspect your data to extract the most out of it.", "description": "The core idea of so-called data-centric machine learning is that any effort spent on improving the quality of the data used to train a model is probably better spent than on improving the model itself. This tested rule of thumb is particularly relevant for applications where data is scarce, expensive to acquire or difficult to annotate.\r\n\r\nConcepts of the usefulness of a datum or its influence on the outcome of a prediction have a long history in statistics and ML, in particular through the notion of the influence function. However, it has only been recently that rigorous and practical notions of value for data, and in particular data-sets, have appeared in the ML literature. The core idea is to look at data points known to be \u201cuseful\u201d in some sense \u2014 for instance in that they substantially contribute to the final performance of a model \u2014 and focus acquisition or labelling efforts around similar ones, while eliminating or \u201ccleaning\u201d the less useful ones.\r\n\r\nIn a nutshell, data valuation for machine learning is the task of assigning a scalar to each element of a training set which reflects its contribution to the final performance of some model trained on it. This can be used to repair or prune corrupt or superfluous data, or for data collection, like active learning strategies when labelling is expensive.\r\n\r\nWhile many exact methods have exponential time complexity in the size of the training set, recent advances provide either good approximation strategies or introduce alternative approaches which are starting to make this field relevant in practice. In this context, [pyDVL](https://pydvl.org) is an LGPL library aiming to provide robust, parallel implementations of every relevant method for simple usage in applications and research. 
In this talk we showcase how it can be used to detect issues in data pipelines and to improve final performance. pyDVL is still in early stages of development but already provides over a dozen algorithms, runs in parallel using ray and supports sklearn-compatible interfaces and large pytorch models with out-of-core computation thanks to dask.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "eb89c0a5-5ce6-5a26-a47d-efd5249f7723", "id": 38367, "code": "SNTWWE", "public_name": "Miguel de Benito Delgado", "avatar": "https://pretalx.com/media/avatars/SNTWWE_OmYbkDO.jpg", "biography": "After several years working as a software developer, Miguel pursued studies in pure mathematics in Madrid and Munich. After finishing his PhD in mathematics, and a short research stay in machine learning, he finally transitioned into the field and ended up working as an applied researcher at the appliedAI Initiative, where he went on to found and head the TransferLab.", "answers": []}, {"guid": "daf89359-1bbb-5cd5-bf09-ae28e07b0c65", "id": 43929, "code": "8ZHNDP", "public_name": "Kristof Schr\u00f6der", "avatar": "https://pretalx.com/media/avatars/8ZHNDP_nlZRzNe.jpg", "biography": "After completing his PhD in applied mathematics, specializing in applied \r\nharmonic and numerical analysis, Kristof developed a keen interest in the\r\nrapidly evolving field of artificial intelligence. This interest inspired \r\nhim to transition his career towards AI engineering, \r\nwhere he spent the next five years working on various machine learning \r\nprojects. 
In May 2023, he joined the TransferLab team at appliedAI Institute.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/YWUZW9/", "id": 40780, "guid": "6af08b81-d0e5-563d-abb8-37770023a4f6", "date": "2024-04-22T13:45:00+02:00", "start": "13:45", "logo": null, "duration": "00:45", "room": "B07-B08", "slug": "pyconde-pydata-2024-40780-a-conceptual-and-practical-introduction-to-hilbert-space-gaussian-process-hsgp-approximation-methods", "title": "A conceptual and practical introduction to Hilbert Space Gaussian Process (HSGP) approximation methods", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Stats", "type": "Talk (long)", "language": "en", "abstract": "In this talk, we explore a new method to approximate Gaussian processes using spectral analysis methods, known as the Hilbert Space Gaussian process (HSGP) approximation. This technique allows us to use and fit Gaussian processes at scale for concrete applications. We provide a basic introduction to the ideas behind the method and make them tangible by implementing them ourselves using Numpyro. We then present two concrete examples in practice using both Numpyro and PyMC. Namely time-varying coefficient regression and time series forecasting.", "description": "In this talk, we explore a new method to approximate Gaussian processes using spectral analysis methods, known as the Hilbert Space Gaussian process (HSGP) approximation. This technique allows us to use and fit Gaussian processes at scale for concrete applications. We provide a basic introduction to the ideas behind the method and make them tangible by implementing them ourselves using Numpyro. We then present two concrete examples in practice using both Numpyro and PyMC. 
Namely time-varying coefficient regression and time series forecasting.\r\n\r\n**Idea about the approximation idea:** The core of this method relies on the Laplacian's spectral decomposition to approximate kernels' spectral measures as a function of basis functions. The key observation is that the basis functions in the reduced-rank approximation do not depend on the hyperparameters of the covariance function for the Gaussian process. This allows us to speed up the computations tremendously.\r\n\r\n**References**\r\n- Hilbert space methods for reduced-rank Gaussian process regression (https://link.springer.com/article/10.1007/s11222-019-09886-w)\r\n- Practical Hilbert space approximate Bayesian Gaussian processes for probabilistic programming (https://link.springer.com/article/10.1007/s11222-022-10167-2 )\r\n- Example: Hilbert space approximation for Gaussian processes (https://num.pyro.ai/en/stable/examples/hsgp.html)\r\n- PyMCon Web Series - Introduction to Hilbert Space GPs in PyMC - Bill Engels (https://www.youtube.com/watch?v=ri5sJAdcYHk )", "recording_license": "", "do_not_record": false, "persons": [{"guid": "bce64d59-705c-50b3-a6de-415e9b8a5ad1", "id": 1504, "code": "ADJDMC", "public_name": "Dr. Juan Orduz", "avatar": "https://pretalx.com/media/avatars/ADJDMC_5JOJaBp.jpg", "biography": "Juan is a Mathematician (Ph.D. Humboldt Universit\u00e4t zu Berlin) and data scientist. He is interested in interdisciplinary applications of mathematical methods. In particular, time series analysis, bayesian methods, and causal inference.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/83ZGV3/", "id": 41728, "guid": "8f28097b-51e0-5d0b-a0b5-18a7f2546963", "date": "2024-04-22T14:35:00+02:00", "start": "14:35", "logo": null, "duration": "00:30", "room": "B07-B08", "slug": "pyconde-pydata-2024-41728-next-stop-insights-how-streamlit-and-snowflake-power-up-data-stories", "title": "Next Stop: Insights! 
How Streamlit and Snowflake Power Up Data Stories", "subtitle": "", "track": "PyData: Data Handling & Engineering", "type": "Talk", "language": "en", "abstract": "Data stories transform complex data insights into clear, actionable and context rich narratives to drive business value. The presentation of data stories to different audiences in a visually compelling manner while keeping track of data changes is a challenging task. A possible solution is to implement appealing and interactive data applications, for which Streamlit is an established open-source solution. In combination with Snowflake, it enables an efficient and straightforward approach to build engaging data applications that utilize data directly from a data platform.\r\n\r\nIn this talk, we will explore a proof-of-concept, tracing the conception of a data story to the implementation of a Streamlit app in Snowflake by using open source datasets from Deutsche Bahn. So, hold onto your seats \u2013 it is time to explore the world of data apps with Snowflake and Streamlit.", "description": "Streamlit is an open-source Python package designed to simplify the creation of data applications featuring interactive data dashboards. Since September 2023, Streamlit has been integrated into Snowflake offering several benefits, including the ability for developers to securely build, deploy, and share Streamlit apps within Snowflake's data cloud making use of the scale, performance and security of the Snowflake platform.\r\n\r\nThis talk provides an introduction to Streamlit and showcases its integration into Snowflake. 
After this talk you will gain: \r\n\r\n- an introduction to how Streamlit can be used within Snowflake\r\n- practical insights into the creation of a data story based on a Deutsche Bahn open-source dataset on Wi-Fi connectivity in trains\r\n- a comprehensive understanding of implementing a Streamlit app in Snowflake, illustrated through the developed data story \r\n- main takeaways and key insights from working with Streamlit in Snowflake\r\n\r\nThis talk is addressed to data enthusiasts who are \r\n- often faced with the challenge of presenting profound data insights to diverse audiences \r\n- interested in a tool that effortlessly constructs appealing data applications \r\n- curious about a direct link between Streamlit and Snowflake", "recording_license": "", "do_not_record": false, "persons": [{"guid": "44e4bbe3-878d-5732-8734-aa3fc5c278f5", "id": 38342, "code": "QX8PFP", "public_name": "Marie-Kristin Wirsching", "avatar": "https://pretalx.com/media/avatars/QX8PFP_AsflUD9.jpg", "biography": "I am a data scientist working at inovex GmbH, supporting our clients in their data-driven projects across the entire machine learning life cycle. 
My passion lies in everything related to AI, NLP, and Computer Vision, and I am always eager to dive into real-world data to uncover valuable insights.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/NYHFSB/", "id": 41644, "guid": "111a6b38-28f9-5e22-acbf-4b6e51922b49", "date": "2024-04-22T15:35:00+02:00", "start": "15:35", "logo": null, "duration": "00:30", "room": "B07-B08", "slug": "pyconde-pydata-2024-41644-machine-learning-on-microcontrollers-using-micropython-and-emlearn", "title": "Machine Learning on microcontrollers using MicroPython and emlearn", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Stats", "type": "Talk", "language": "en", "abstract": "This presentation will show you how to deploy machine learning models\r\nto affordable microcontroller-based systems - using the Python that you already know.\r\nCombined with sensors, such as microphone, accelerometer or camera,\r\nthis makes it possible to create devices that can automatically analyze and react to physical phenomena.\r\nThis enables a wide range of useful and fun applications, and is often referred to as \"TinyML\".\r\n\r\nThe presentation will cover key concepts and explain the different steps of the process.\r\nWe will train the machine learning models using standard scikit-learn and Keras,\r\nand then execute them on device using the emlearn library.\r\nTo run Python code on the microcontroller, MicroPython will be used.\r\nWe will demonstrate some practical use-cases using different sensors, such as\r\nSound Event Detection (microphone), Image Classification (camera), and Human Activity Recognition (accelerometer).", "description": "Modern Machine Learning makes it possible to automatically extract valuable information from sensor data.\r\nWhile Machine Learning is often associated with costly, compute-intensive systems,\r\nit is becoming feasible to deploy ML systems to very small embedded 
devices and sensors.\r\nThese devices typically use low-power microcontrollers that cost as little as 1 USD.\r\nThis niche is often referred to as \"TinyML\", and is enabling a range of new applications\r\nin science, industry and consumer electronics.\r\n\r\nWhile microcontrollers are getting more powerful year by year,\r\nit is still important to fit within the limited RAM, program size and CPU time available.\r\nemlearn is an open-source Python library that allows converting scikit-learn and Keras models to efficient C code.\r\nThis makes it easy to deploy models to any microcontroller with a C99 compiler,\r\nwhile keeping the Python-based workflow that is familiar to Machine Learning Engineers.\r\nVia emlearn-micropython it also supports MicroPython, a Python implementation designed for microcontrollers.\r\nMicroPython runs on practically all microcontrollers with 16kB+ RAM,\r\nand this makes it possible to write an entire application for microcontrollers using Python.\r\nThe emlearn-micropython package is provided as a set of MicroPython modules\r\nthat can be installed onto a device, without having to recompile any C code.\r\nThis preserves the ease-of-use that Python developers are used to on a desktop system.\r\nCompared to pure-Python approaches, the emlearn-micropython models are typically 10-100x faster and smaller.\r\n\r\nThe models in emlearn support the core Machine Learning task types: classification, regression and anomaly detection.\r\nAdditionally, there are tools for data preprocessing, feature engineering and estimation of compute requirements. 
\r\nSince the start in 2019, emlearn has been used in a wide range of applications,\r\nfrom detection of vehicles in acoustic sensor nodes,\r\nto hand gesture recognition based on sEMG data,\r\nto real-time malware detection in Android devices.\r\n\r\nWhile emlearn and MicroPython can target a very wide range of hardware,\r\nwe will focus on the Espressif ESP32 family of devices.\r\nThese are very powerful and affordable, with good WiFi+BLE connectivity support,\r\ngood open-source toolchains, very popular both among hobbyists and companies,\r\nand have many good ready-to-use hardware development kits.\r\n\r\nThe audience is expected to have a basic literacy in Python and proficiency in programming,\r\nand familiarity with core Machine Learning concepts such as\r\nsupervised/unsupervised learning, classification/regression, etc.\r\nFamiliarity with microcontrollers and embedded systems is of course an advantage,\r\nbut the talk should be approachable to those who are new to this area.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "714a64bf-3ab3-5c35-808f-c554be1d2d97", "id": 16, "code": "CVFFNV", "public_name": "Jon Nordby", "avatar": "https://pretalx.com/media/avatars/CVFFNV_tmJFXNG.jpg", "biography": "Jon is a Machine Learning Engineer specialized in IoT systems.\r\nHe has a Master in Data Science and a Bachelor in Electronics Engineering,\r\nand has published several papers on applied Machine Learning,\r\nincluding topics like TinyML, Wireless Sensor Systems and Audio Classification.\r\n\r\nThese days, Jon is co-founder and Head of Data Science at Soundsensing,\r\na leading provider of condition monitoring solutions for commercial buildings and HVAC systems.\r\nHe is also the creator and maintainer of emlearn,\r\nan open-source inference engine for microcontrollers and embedded systems.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/BFF9VA/", "id": 41993, "guid": 
"0236475d-0e2e-5b64-b731-ae0ef01c3598", "date": "2024-04-22T16:10:00+02:00", "start": "16:10", "logo": null, "duration": "00:45", "room": "B07-B08", "slug": "pyconde-pydata-2024-41993-your-model-probably-memorized-the-training-data", "title": "Your Model _Probably_ Memorized the Training Data", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Stats", "type": "Talk (long)", "language": "en", "abstract": "I know you probably don't want to hear about it, but your deep learning model probably memorized some of its training data. In this talk, we'll review active research on deep learning and memorization, particularly for large models such as large language and multi-modal models.\r\n\r\nWe'll also explore potential ways to think through when this memorization is actually desired (and why) as well as threat vectors and legal risk of using models who have memorized training data. We'll also look at potential privacy protections which could address some of the issues and how to embrace memorization by thinking through different types of models and their use.", "description": "In this talk, I will cover:\r\n\r\n- Proven mathematical research as to why deep learning models memorize information\r\n- A series of successful attacks against deep learning models and GPT-models to extract memorized information\r\n- The legal and social impact of memorization and using memorized data\r\n- Differential privacy as one potential solution (but also its pitfalls when used to train large models)\r\n- Federated and/or local- or community-trained models as an alternative\r\n- The need for distillation that also attempts to reduce memorization", "recording_license": "", "do_not_record": false, "persons": [{"guid": "ffd0574e-11b0-52d1-a847-6a92c5e1ec5e", "id": 233, "code": "K9B9W9", "public_name": "Katharine Jarmul", "avatar": "https://pretalx.com/media/avatars/K9B9W9_gUIiN9l.jpg", "biography": "Katharine Jarmul is a privacy activist and data scientist whose work and 
research focuses on privacy and security in data science workflows. She works as a Principal Data Scientist at Thoughtworks and author of Practical Data Privacy. She is a passionate and internationally recognized data scientist, programmer, and lecturer.", "answers": []}], "links": [], "attachments": [], "answers": []}], "B05-B06": [{"url": "https://pretalx.com/pyconde-pydata-2024/talk/XMKREA/", "id": 41787, "guid": "de447070-cb26-5f74-a570-e55d11e5c235", "date": "2024-04-22T11:25:00+02:00", "start": "11:25", "logo": null, "duration": "00:45", "room": "B05-B06", "slug": "pyconde-pydata-2024-41787-rag-for-a-medical-company-the-technical-and-product-challenges", "title": "RAG for a medical company: the technical and product challenges", "subtitle": "", "track": "PyData: Generative AI", "type": "Talk (long)", "language": "en", "abstract": "[RAG (Retrieval Augmented Generation)](https://www.pinecone.io/learn/retrieval-augmented-generation/) is the process of querying a (large) set of documents with natural language, leveraging vector search and llms. While it has recently become widely accessible to develop a Proof-Of-Concept RAG using OpenAI and one of the various open-source contributions (e.g. 
langchain), building a **performant** RAG that **brings value to users** is challenging.\r\nThis talk will focus on learnings from building a RAG for a **medical company**, to allow doctors to query drug documentation with natural language, using tools like **[Chainlit](https://docs.chainlit.io/get-started/overview), [Qdrant](https://qdrant.tech/) and [Langsmith](https://www.langchain.com/langsmith)**.\r\nNaturally, a product question emerged: how to effectively leverage LLMs that **can never guarantee 100% accuracy** in the health sector?\r\nWe will explain how we addressed this challenge, as well as the various **technical improvements** implemented to enhance both the retrieval (vector search) and generation (llm) metrics of our RAG.", "description": "RAG works as follows:\r\n\r\n- An **embedding model** is used to create representations of all documents. These representations are then stored in a **vector database**.\r\n- A user poses a question. The same **embedding model** is used to create a representation of this question, enabling the **retrieval** of the most similar documents through a **similarity search**.\r\n- These documents are incorporated into a **prompt** along with the question to **generate an answer based on the documents' content**.\r\n\r\nMany open-source tools, such as Langchain, enable the creation of such pipelines in just [few lines of code](https://python.langchain.com/docs/expression_language/cookbook/retrieval). However, without specific adjustments, such systems often do **not** perform well enough to gain **user adoption**.\r\n\r\nIn this talk, we will cover the challenges and learnings encountered while building a **RAG for the drug documentation of a medical company**. 
More specifically, we will:\r\n\r\n- Cover the **basics** of RAGs.\r\n- Present the use case we faced and showcase the **resulting product**.\r\n- Show how we significantly improved our **retrieval and generation metrics** with techniques such as leveraging **LLMs** to add extra context to the user's question to enhance retrieval accuracy.\r\n- Discuss how we designed the product to effectively utilize LLMs while ensuring that doctors are not **misled** by potentially erroneous information, such as **hallucinations**. We achieved this mostly by displaying the sources: while many RAG pipelines cite their sources, we went a step further by **inserting HTMLs** of the sources directly **within** the generated answers, along with **highlighted citations**.\r\n- Highlight the tooling aspect of the project, e.g. **[Langsmith](https://www.langchain.com/langsmith) (a logging tool for LLMs)**, allowed us to easily augment our initial dataset and ensure that users were interacting correctly with the product. 
Furthermore, the ability to replay/alter a prompt on the interface allowed the **product owner** to iterate on prompt engineering and assist with technical iterations using their **field knowledge**.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "05e86b1c-67f7-5aac-87c2-2fb1804327ad", "id": 25892, "code": "B893GM", "public_name": "No\u00e9 Achache", "avatar": "https://pretalx.com/media/avatars/B893GM_01Wa0QN.JPG", "biography": "I am a Lead Data Scientist at Sicara, where I worked on a wide range of projects mostly related to vector databases, computer vision, prediction with structured data and more recently LLMs.\r\nI am currently leading the GenAI development in the company.\r\n\r\nHere the list of the talks I did:\r\n\r\n[Great Practices for RAG in Production](https://www.aleios.com/talks/great-practices-for-rag-retrieval-augmented-generation-in-production-noe-achache) @GenAI London Meetup\r\n\r\n[How to Choose a Vector Database in 2023](https://www.youtube.com/watch?v=aX_hdQEintc) @DVC Meetup\r\n\r\n[Advanced Visual Search Engine with Self-Supervised Learning (SSL)](https://www.youtube.com/watch?v=n5ccoC9di7U) @PyconDE et Pydata Berlin 2023\r\n\r\nGreat Practices for RAG in Production @GenAI Paris meetup\r\n\r\nGenerating Millions of text boxes with a GAN @Meetup Computer Vision Paris", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/BYH8Y8/", "id": 41690, "guid": "f89913c8-8fb0-56b6-b01d-c65eca93165a", "date": "2024-04-22T12:15:00+02:00", "start": "12:15", "logo": null, "duration": "00:30", "room": "B05-B06", "slug": "pyconde-pydata-2024-41690-acknowledging-women-s-contributions-in-the-python-community-through-podcast", "title": "Acknowledging Women\u2019s Contributions in the Python Community Through Podcast", "subtitle": "", "track": "General: Community, Diversity, Career, Life and everything else", "type": "Talk", "language": "en", "abstract": "The Python 
community has been making efforts in improving the diversity and representation among its members. There are examples of success stories such as PyCon US Charlas, PyLadies, Djangonaut, and Django Girls. Yet in the Python podcast community, women are still underrepresented, making up only 17% of invited guests among the popular podcast series. Being a guest in a podcast is a privilege, and an opportunity to influence the Python community. There are many women and underrepresented group members who have made impactful contributions to the Python community globally, and they deserve the recognition and to be heard by the rest of us. Disheartened by the lack of representation by women on Python podcasts, and inspired by others who have shown us how diversity in the community can be improved through intentionality, we decided to start a podcast with a goal to highlight their voices so that they could receive the recognition they deserve. In this talk, learn about them, and about our podcast series. We\u2019ll also share how you can further help our cause in improving representation and diversity in the Python community.", "description": "The Python community has been making efforts in improving the diversity and representation among its members. There are examples of success stories such as PyCon US Charlas, PyLadies, Djangonaut, and Django Girls. Yet in the Python podcast community, women are still underrepresented, making up only 17% of invited guests among the popular podcast series. Being a guest in a podcast is a privilege, and an opportunity to influence the Python community. There are many women and underrepresented group members who have made impactful contributions to the Python community globally, and they deserve the recognition and to be heard by the rest of us. 
Disheartened by the lack of representation by women on Python podcasts, and inspired by others who have shown us how diversity in the community can be improved through intentionality, we decided to start a podcast with a goal to highlight their voices so that they could receive the recognition they deserve. In this talk, learn about them, and about our podcast series. We\u2019ll also share how you can further help our cause in improving representation and diversity in the Python community. \r\n\r\n## Goal\r\n\r\nTo raise awareness of the underrepresentation of certain groups, especially women. To acknowledge the progress made by the Python community and what can be done further to continue the improvement.\r\n\r\n## Target Audience \r\n\r\nAnyone who cares about the diversity and inclusion progression in the Python community. Community leaders who want to be allies.\r\n\r\n## Outline\r\n\r\n### Diversity in Python community, examples (5 minutes)\r\n\r\n- PyCon US speakers: from 1% in 2011 to 40% in 2016\r\n- Efforts in improving diversity in the Python community: Charlas, PyLadies, DjangoGirls, Djangonaut\r\n\r\n### How are those efforts successful? 
(5 minutes)\r\n\r\n- Intentionality: starts with recognizing the issue and clear intention and goal in improving the situation\r\n- Outreach: targeted and direct outreach to underrepresented, explicit invitation asking underrepresented group members to participate\r\n- Opportunity: providing opportunities and tools for women to succeed\r\n\r\n### In Podcast (3 minutes)\r\n\r\n- Since there were no stats, we collected our own data by scraping the three most popular Python Podcasts\r\nCollected using Python, Beautiful Soup, and Datasette\r\n- Our result shows that among the three podcasts that have been running for years, women made up only 17% of invited guests, whereas there were the same men who appeared more frequently on the same shows\r\n\r\n### Why is this important (5 minutes)\r\n\r\n- Podcast guest is influential\r\n- Women and underrepresented group members deserve to be seen and heard\r\n- Representation creates inspiration. Lack of representation = lost opportunity to inspire women to further participate in the community\r\n\r\n### 6 months of our podcasts (4 minutes)\r\n\r\n- Share public reactions and support from our launch\r\n- Karolina Ladino: in Colombia, women have to be accompanied by husbands or brothers to come to meetups, otherwise it's not safe for them to come alone. 
\r\n- Joanna Jablonski: making impact in Python community through documentation and developer education\r\n\r\n### How you can help (3 minutes)\r\n\r\n- Listen to their stories\r\n- Actively promote and boost voices from women and underrepresented group members\r\n- Suggest people to interview", "recording_license": "", "do_not_record": false, "persons": [{"guid": "716d26c2-170b-5a5e-86e5-9d4cecf3bbdd", "id": 54, "code": "8EGVC9", "public_name": "Cheuk Ting Ho", "avatar": "https://pretalx.com/media/avatars/8EGVC9_EpBXtRy.jpg", "biography": "After having a career as a Data Scientist and Developer Advocate, Cheuk dedicated her work to the open-source community and now works as a community manager at OpenSSF. She has co-founded Humble Data, a beginner Python workshop that has been happening around the world. She has served the EuroPython Society board for two years and is now a fellow and director of the Python Software Foundation.", "answers": []}, {"guid": "9f1c4db3-3e40-5e40-a06d-ad540d3a75fc", "id": 1899, "code": "NMACLQ", "public_name": "Tereza Iofciu", "avatar": "https://pretalx.com/media/avatars/NMACLQ_WXPaTiS.jpg", "biography": "Tereza Iofciu is an experienced data practitioner and career coach, leading a coaching team and teaching data science at neuefische. She is a co-organizer of the PyLadies Hamburg group and is part of the Python Software Foundation Code of Conduct and Diversity & Inclusion working groups. She has been awarded the Python Software Foundation 2021 Q1 community service award. 
She is also part of the DISC Steering Committee team and an Ambassador for Observable.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/NYFVLM/", "id": 41056, "guid": "caec3724-cf44-5707-af06-709bcf72c6aa", "date": "2024-04-22T13:45:00+02:00", "start": "13:45", "logo": null, "duration": "00:45", "room": "B05-B06", "slug": "pyconde-pydata-2024-41056-the-pragmatic-pythonic-data-engineer", "title": "The pragmatic Pythonic data engineer", "subtitle": "", "track": "PyData: Data Handling & Engineering", "type": "Talk (long)", "language": "en", "abstract": "Learn to make practical decisions in data engineering with Python's vast ecosystem. Avoid blindly following market guidelines and consider the reality of your situation for better performance and architecture.", "description": "Often, we tend to look at the success of others and try to repeat their **decisions**, expecting the same result. We must deal with things sensibly and realistically based on practical rather than just theoretical considerations. **Python** offers a vast **ecosystem** to handle all phases of data engineering. Implementing a **data architecture** can be complex, and many adopt the strategy of using\u00a0market **guidelines**\u00a0without **pragmatism** of understanding your **reality**; in most cases, this strategy is a big problem of\u00a0**architecture**\u00a0and\u00a0**performance**.\r\n\r\nAs a part of this talk, we will walk through the process of identifying **Pythonic** components of **data analysis**, **data cleaning**, **data ingestion**, **databases**, **file systems**, **serialization formats**, **workflows**, and **pipelines**. As we move through those steps, my main focus is teaching the audience **pragmatic thinking** on incorporating best practices into the **data architecture** process. 
I will also walk through **strategies** and explain high-level data engineering concepts we can use.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "c3f2e1c5-6161-55e9-9ac5-1340b4c4aeb9", "id": 1528, "code": "GPMPE7", "public_name": "Robson Junior", "avatar": "https://pretalx.com/media/avatars/GPMPE7_mq82Byz.jpg", "biography": "Robson has been a developer since 2003 with a multifaceted life. Since 2014, I transitioned my career to be a Data Engineer and used Python to handle complex pipelines and glue other technologies. Living in Berlin, in their free time, he is an apprentice paramedical tattooer and glider pilot.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/7898PU/", "id": 43040, "guid": "9bd0fccd-7c62-5beb-95e4-ed1a7e78cf2a", "date": "2024-04-22T14:35:00+02:00", "start": "14:35", "logo": null, "duration": "00:30", "room": "B05-B06", "slug": "pyconde-pydata-2024-43040-whispered-secrets-building-an-open-source-tool-to-live-transcribe-summarize-conversations", "title": "Whispered Secrets: Building An Open-Source Tool To Live Transcribe & Summarize Conversations", "subtitle": "", "track": "PyData: Generative AI", "type": "Talk", "language": "en", "abstract": "Are you secretly a spy and/or passionate about open-source? Maybe you don't trust a cloud-hosted service with your highly classified information, or perhaps you like to build things for yourself. In this light-hearted talk, you will learn how to make a real-time on-device GenAI-powered application that can live transcribe and summarize conversations without internet access, using open-source components.\r\n\r\nOur journey begins with an introduction to open-source LLMs and the latest trends in running GenAI tools on your own hardware. 
We will build up our application step-by-step, first creating a live streaming voice-to-text transcription pipeline, then an LLM-based conversation summarization layer, presented within a Streamlit frontend, with conversation summaries sent to a lightweight Django API backend for storage.\r\n\r\nThis talk is tailored for Python enthusiasts and requires no ML expertise. By seeing a practical demo come together piece by piece, attendees will gain a deeper understanding of how to build their own complex Generative AI applications and be pushed to imagine what they could make for themselves using on-device computation in real-world scenarios.", "description": "This light-hearted talk will aim to introduce the audience to the latest trends and possibilities for building GenAI applications using open-source components. Here's why this matters:\r\n\r\n* Cloud-hosted SaaS tools cannot store highly **sensitive information**.\r\n* **Good open-source alternatives exist** for most GenAI tasks; the more people who use them, the more they will thrive.\r\n* Commercial tools will solve for common use cases, but developers can build personalized tools that are **highly specialized for their own bespoke needs**.\r\n\r\nDuring the course of this talk, we will build a real-time conversation pipeline including transcription, summarization and topic analysis layers. We will use open-source Python libraries, including a Streamlit frontend and a Django API backend. The primary focus is to demonstrate the simplicity of building complex LLM-based applications, specifically tailored for attendees with a basic understanding of Python but who may not have prior experience using LLMs.\r\n\r\nWe'll explore a variety of tools*, the use of Whisper for accurate live transcription, delving into its capabilities and integration with Streamlit. 
Additionally, we'll discuss LangChain + llama.cpp + Llama-2 for efficient summarization and topic analysis, highlighting their performance on standard hardware like a MacBook Pro. For the web API, Django will be our framework of choice, providing a robust and scalable solution for storing and displaying our conversation transcripts and summaries. We will also demonstrate how additional tools can be easily integrated into our workflow, for example using the Chroma vector database to build a simple semantic search function.\r\n\r\nExpect plenty of Python code and some fun live demos, with GitHub code provided for attendees to try it at home. This demo only covers a small fraction of the immensely versatile capabilities available from the modern open-source AI landscape, but will leave attendees with a sense that building complex LLM-powered applications that solve real-world problems has never been this easy.\r\n\r\n_* The exact tools presented may be different from those mentioned here, due to the rapidly evolving nature of this landscape. The goal is to ensure that attendees are provided with state-of-the-art content that is fully up-to-date come April 2024._", "recording_license": "", "do_not_record": false, "persons": [{"guid": "b6fb7ed2-d6e3-5e15-9f2b-7b3250526cb7", "id": 25966, "code": "MM3RDV", "public_name": "John Sandall", "avatar": "https://pretalx.com/media/avatars/MM3RDV_P1rkf0N.jpg", "biography": "John Sandall is the CEO and Principal Data Scientist at Coefficient.\r\n\r\nHis experience in data science and software engineering spans multiple industries and applications, and his passion for the power of data extends far beyond his work for Coefficient\u2019s clients. In April 2017 he created SixFifty in order to predict the UK General Election using open data and advanced modelling techniques. 
Previous experience includes Lead Data Scientist at YPlan, business analytics at Apple, genomics research at Imperial College London, building an ed-tech startup at Knodium, developing strategy & technological infrastructure for international non-profit startup STIR Education, and losing sleep to many hackathons along the way.\r\n\r\nJohn is also a co-organiser of PyData London, co-founded Humble Data in 2019 to promote diversity in data science through a programme of free bootcamps, and in 2020 was a Committee Chair for the PyData Global Conference. He is currently a Fellow of Newspeak House with interests in open data, AI ethics and promoting diversity in tech.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/ZKYA9W/", "id": 42012, "guid": "b919bfbf-6b51-5ee6-8c81-262682ace185", "date": "2024-04-22T15:35:00+02:00", "start": "15:35", "logo": null, "duration": "00:30", "room": "B05-B06", "slug": "pyconde-pydata-2024-42012-everything-you-need-to-know-about-change-point-detection", "title": "Everything you need to know about change-point detection", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Stats", "type": "Talk", "language": "en", "abstract": "Change-point detection is a crucial processing step when dealing with long and non-stationary time series. It has been applied in many contexts, such as human activity recognition, speech/sound processing and industrial monitoring. This talk guides data scientists, engineers and researchers through the mathematical foundations of this subject, introduces the [ruptures](https://github.com/deepcharles/ruptures) Python package for change-point detection, and illustrates algorithms in a biomedical context. By the end, the audience will be able to integrate them into complex data pipelines.", "description": "How do you detect an activity change (e.g. walking to running to biking) from smartwatch data? 
Or abrupt transitions in paleoclimate records? Or when a server failure occurs, using hardware telemetry sensor data (fan speed, acoustic noise, etc.) and software metrics (CPU, memory, I/O, etc.)? If you work with long time series, you will inevitably have to detect changes in the data-generating model.\r\n\r\nChange-point detection is a crucial task for such signals. It consists in estimating the timestamps when the underlying signal model changes. First introduced in the 50s to monitor quality changes in industrial processes, this subject has since been extended to numerous contexts, such as sound/speech processing, human activity recognition, DNA analysis, analysis of COVID-19 policies' effects, software and hardware monitoring, etc. Over several decades, this subject has generated an important but heterogeneous body of work.\r\n\r\nThis talk will help data scientists, engineers and researchers navigate this vast literature. We will start by describing the mathematical and algorithmic background behind change-point detection in a high-level and easy-to-understand fashion. Then, we will introduce [ruptures](https://github.com/deepcharles/ruptures), a Python package containing many change-point detection methods, as well as calibration and visualisation routines. Algorithms will be illustrated in a real-world biomedical application. 
\r\nAt the end of the talk, the audience will be able to understand when to use change-point detection algorithms and how to calibrate and integrate them in a complex data pipeline.\r\n\r\n**Time breakdown:**\r\n- Introduction and motivations: 5 min\r\n- Background on change-point detection: 10 min\r\n- Python framework: 5 min\r\n- Illustration on a real-world biomedical data pipeline: 10 min\r\n- Q&A: 5 min", "recording_license": "", "do_not_record": false, "persons": [{"guid": "d544539b-e174-5664-8a49-c2ff4542105a", "id": 16097, "code": "BFRLAK", "public_name": "Charles Truong", "avatar": "https://pretalx.com/media/avatars/BFRLAK_OqR5Moj.jpeg", "biography": "Charles Truong is a researcher at Centre Borelli, ENS Paris-Saclay, France. His research interests lie between signal processing, statistics and machine learning. Most of his work is applied in biomedical and industrial contexts. He is the core developer of [ruptures](https://github.com/deepcharles/ruptures), a Python package dedicated to change-point detection algorithms.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/LWWQ9U/", "id": 41739, "guid": "4fb85dfb-9751-5852-814a-628bf5552c5c", "date": "2024-04-22T16:10:00+02:00", "start": "16:10", "logo": null, "duration": "00:45", "room": "B05-B06", "slug": "pyconde-pydata-2024-41739-using-llms-to-create-knowledge-graphs-from-a-large-corpus-of-parliamentary-debates", "title": "Using LLMs to Create Knowledge Graphs From a Large Corpus of Parliamentary Debates", "subtitle": "", "track": "PyData: Natural Language Processing & Computer Vision", "type": "Talk (long)", "language": "en", "abstract": "Large Language Models (LLMs) have proven to be incredibly powerful on a range of tasks. They do, however, have certain limitations when the input context becomes significantly large. 
Solutions such as Retrieval Augmented Generation (RAG) do a great job in providing context from custom data without retraining any models but they too have limitations, especially when the context is spread out over many documents. Consider the question \u201cWhich projects has person X worked on?\u201d. Information required to answer this question may be spread out over hundreds of documents, making it difficult for an LLM alone to answer. One way to overcome this issue is to use an LLM as an entity extraction tool, which can extract entities and relationships from documents and load that data into a structured format such as a knowledge graph. In this talk, I will demonstrate this process on a dataset of parliamentary debates, showing how downstream analytics becomes more intuitive and feasible.", "description": "In this talk, I will demonstrate the process through which I implemented a solution to create knowledge graphs using LLMs and why this can be powerful.\r\n\r\nAgenda:\r\n- Limitations of LLMs and RAG for specific tasks\r\n- Knowledge graph (KG) basics\r\n- Creating KGs using LLMs\r\n- Dataset and use-case: official parliamentary debates\r\n- Practical experience in creating an LLM-based pipeline\r\n- Retrieving data using natural language i.e. 
Text2SQL\r\n- Future works", "recording_license": "", "do_not_record": false, "persons": [{"guid": "d49c9d7c-b289-595f-95a7-83bdef30bb47", "id": 38347, "code": "FXR939", "public_name": "Usman", "avatar": "https://pretalx.com/media/avatars/FXR939_sTgrAfU.png", "biography": "Usman is a Machine Learning Engineer working for Xebia Data, with an interest for graph theory, low-level machine learning frameworks and the bridge between research and real-world implementation.", "answers": []}], "links": [], "attachments": [], "answers": []}], "A1": [{"url": "https://pretalx.com/pyconde-pydata-2024/talk/Y3FLEH/", "id": 41668, "guid": "af3d2951-0daf-52bf-99ef-f3360a0a9121", "date": "2024-04-22T11:25:00+02:00", "start": "11:25", "logo": null, "duration": "00:45", "room": "A1", "slug": "pyconde-pydata-2024-41668-best-of-both-worlds-how-we-built-an-ai-aided-content-creation-tool-for-language-learning", "title": "Best of both worlds - How we built an AI-aided content creation tool for language learning", "subtitle": "", "track": "General: Industry & Academia Use-Cases", "type": "Talk (long)", "language": "en", "abstract": "Discover how Babbel bridged the gap between tailored language learning and scalability through an AI-aided content creation tool. Our approach amalgamates human expertise with Generative Artificial Intelligence, enabling personalized content creation on a large scale. Join us on our development journey and the different iterations we went through. We will demo the tool's current version and its AI features. Learn about the tech stack and what lies ahead in our development pipeline.", "description": "Babbel learners value the high quality content that follows an educational methodology and covers everything a learner needs to become conversational in a foreign language. However, language learning cannot be approached with a one-fits all strategy. 
Learners have different motivation, interests, goals & learning needs that they want to see addressed throughout their learning path. Relying on human learning experts only for creating thousands of tailored learning items to personalize our contents is not a scalable solution. Luckily, recent developments in Generative Artificial Intelligence (GenAI) and its high-performing Large Language Models (LLMs) offer great opportunities to leverage artificial intelligence (AI) in the content creation process to enable large-scale personalization of contents. \r\n\r\nLet us take you on our journey of developing an AI-aided content creation tool for language learning which combines best of both worlds, namely using AI to automate and scale various steps within the content generation process and putting human intelligence (HI) in the loop to make sure that our contents meet the expectations of our learners and fit the Babbel way of learning. We will give you an overview of our development process with the help of our cross-functional team and walk you through the different iterations - from initial workflow analysis to leveraging the power of connecting our tool to Babbel\u2019s proprietary data. Additionally, we will demo the current version of the tool and give a quick tour of the different AI features that we already included. We will give an overview of the used tech stack and a quick outlook on what is next in the development pipeline.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "430c055c-690a-583f-8ea8-ed072094665d", "id": 38350, "code": "RTZXGA", "public_name": "Hector Hernandez", "avatar": "https://pretalx.com/media/avatars/RTZXGA_pGlit50.png", "biography": "Hector Hernandez is a language educator, currently serving as a Computational Linguist at Babbel. With 12 years in Second Language Acquisition, he brings a wealth of experience in curriculum development, podcast hosting, scripting, and teaching English and Spanish as a second language. 
More recently, Hector has delved into developing applications with Large Language Models, applying this technology to enhance language learning experiences.", "answers": []}, {"guid": "a8621ee8-e7b4-5021-beab-b49d81933fee", "id": 25758, "code": "GBF7R3", "public_name": "Lea Petters", "avatar": "https://pretalx.com/media/avatars/GBF7R3_RrXc1CL.png", "biography": "Lea Petters holds a PhD in Behavioral Economics and works as a Data & AI expert at inovex. Her industry expertise spans retail, e-commerce, media, and education tech. She has experience realizing different use cases spanning sales forecasting, causal effect estimation, recommendation systems, and generative AI.\r\n\r\nShe is passionate about topics around causality, mathematical modeling, data & AI strategy, data storytelling, ethics, and large language models.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/8LNYPD/", "id": 39662, "guid": "ccdc3351-8f31-5fe7-be4d-d91f9564aa24", "date": "2024-04-22T12:15:00+02:00", "start": "12:15", "logo": null, "duration": "00:30", "room": "A1", "slug": "pyconde-pydata-2024-39662-power-structures-the-fair-advantage", "title": "Power structures. The fair advantage", "subtitle": "", "track": "General: Ethics & Privacy", "type": "Talk", "language": "en", "abstract": "Humans are complex. As developers, we wanna ignore that ... but to do our job right, we cannot. Let's talk about power, motivation, techno-sociology, politics and why all of this is important for our job.", "description": "Have you ever been in the following situation? You know for certain that you are technically right. Your project has to be done for the benefit of the company. But you cannot convince your boss for whatever reason. You are stuck. - This might be the glorious moment of informal structures and networking. You will need to know whom else to talk to. Whom you can trust and who has the power to convince your boss? 
The best answer will rarely be found in the formal organizational structure.\r\n\r\nAs developers, we often think in models and charts. We are used to formalizing worded requests into code and structures to solve problems. And we are good at it. But what you cannot fully put into models are humans and human behavior. This is also true for the human interactions inside companies and networks. Organigrams never tell the truth about an organization. Power and influence are more complex than formal structures can describe.\r\n\r\nIn this talk, I wanna dive into how human interactions inside companies are at the same time complex, powerful and worth exploring.\r\n\r\nDisclaimer: This is no talk about unfair techniques. I will not provide you with dark magic. My goal is to give you the knowledge to play fairly in a complex world.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "8ebface0-8cf3-5f87-ba41-7c0d56096883", "id": 37123, "code": "HL7EKF", "public_name": "Anja Kunkel", "avatar": null, "biography": "I'm Anja, Principal Fullstack Engineer working at Mister Spex @ Berlin, Germany. We are right now breaking down a monolith and replacing parts of it with future-ready technology. Previously, I have worked for eBay.\r\nAlso, I'm a sports enthusiast and the current local champion in wakeskating. 
I love to find and cross the borders of what seems achievable.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/XDQNCR/", "id": 40429, "guid": "dcb1e147-ebf2-5a76-9811-e939ad8edd27", "date": "2024-04-22T13:45:00+02:00", "start": "13:45", "logo": null, "duration": "00:45", "room": "A1", "slug": "pyconde-pydata-2024-40429-tailored-and-trending-key-learnings-from-3-years-of-news-recommendations", "title": "Tailored and Trending: Key learnings from 3 years of news recommendations", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Stats", "type": "Talk (long)", "language": "en", "abstract": "Every day, we engage with news, and more often, these are curated by recommendation engines. Building such an algorithm poses some unique challenges, different from movie or product recommendations: articles have a short lifetime because nothing is older than yesterday's news. The data is heavily biased by the different positioning of articles on the page, and journalistic principles and brand identity should be represented in the article selection. At Axel Springer National Media and Tech, we overcome these challenges by leveraging our domain knowledge combined with simple statistics instead of black-box machine learning models. This talk will share some of our learnings that can be applied to recommendation systems and data science projects in general.", "description": "#### What is special about news recommendations?\r\n\r\n- We are used to recommendations from Netflix, Amazon, or TikTok. All of these apps have logged-in users that can be easily tracked. News websites, on the other hand, have a large share of unknown users that can only be tracked via first-party cookies. Therefore, there is much more cold start in the user dimension. In addition to that, movies, products, and funny videos have relatively long lifetimes, whereas news articles are often only relevant for a few hours. 
This means that recommendation systems have much less time to collect information about what is relevant for whom, and there is a lot of cold start in the item dimension.\r\n\r\n- Users are more critical with the selection of news articles that are presented to them compared to selections of products or movies. News recommendation is not only about finding the most relevant items; it is also about putting items in the right relationship to each other to reflect journalistic considerations and brand values. For example, often articles should be sorted according to the seriousness of the topic, or the topic's relevance for society. Similar articles should be placed next to each other, etc.\r\n\r\n- The front page plays an outsized role for news websites. Users come here to get an overview of what is happening in the world. Consequently, the data generated by these websites is heavily dominated by effects that originate in the structure and mechanics of the front page. Articles shown on top of this page with a large image will be clicked much more likely, compared to an article at the bottom of the page with just a small headline. \r\n\r\n#### How do news recommendations typically work?\r\n\r\n- Recommendation engines are often closely associated with collaborative filtering. However, collaborative filtering systems struggle with cold start, which is especially prevalent for news articles and users of media sites. At the same time, there are many simple ways to rank articles. Articles can be sorted according to their age, their popularity, or according to how often a user has read articles from the same category before. Based on our experience, most systems deployed in practice use a combination of these principles along with collaborative filtering. 
Especially for smaller widgets, multi-armed bandit approaches are also popular, where the algorithm just tries different articles and keeps showing those that tend to have the highest CTR.\r\n\r\n#### What is special about our approach to news recommendations?\r\n\r\n- One can think of recommendation as a simple click prediction problem. We have one user and many items and want to use features of the user and the items to predict how likely the user will click. The articles can then be ranked and selected based on these probabilities. Therefore, we are not tied to use collaborative filtering algorithms but can use any machine learning algorithm of our choice.\r\n\r\n- A major feature for our system is to identify articles that are trending. Most popular feeds and rankings are widely used, but as an absolute measure, they are heavily influenced by the position bias. The articles on top of the page are most likely to get the most clicks, therefore they will be put on top of the page again. This cycle continues until the story becomes so uninteresting that it starts to perform worse than other stories in worse positions.\r\n  In contrast to that, we refer to relative performance as trendingness. If a story performs better than usual for its position, then it is trending. The beauty of this approach is that it makes the performance of articles at the top and at the bottom of the page comparable to each other. You can be 10 percent better or worse than expected in all positions of the page. The ugly part is that numbers at the bottom of the page start to become very small and therefore trendingness becomes very unstable. If an article is expected to get 1/100 of a click in a certain time interval, and there is an accidental click on this article, you suddenly have an incredible trending article. Unfortunately, most news pages contain many articles that are clicked with very low probabilities, therefore you have good chances to produce these outliers quite frequently. 
The art of constructing a good measure of trendingness is in finding a good way to regularize the trendingness to avoid these effects.\r\n\r\n- Position bias on news media sites is so strong that a classification model that predicts clicks solely using the position of an article as a feature will have an AUC of about 0.8. Consequently, a model trained on clicks will mostly just learn patterns that are correlated with the position. For example, if politics articles tend to be placed higher on the page than sports articles, the model will learn that politics articles generally click better than sports articles. We can avoid this by giving the model information about the position, but then the algorithm mostly picks up position-related patterns that cannot be exploited when choosing which article to put in one specific position.\r\n\r\n- When training our recommendation algorithm, we overcome the position bias problem by weighting clicks so that they are compared on neutral grounds. First, we determine the click probability of an article based on its position alone. Then we weight clicks and non-clicks according to their relative probability. \r\n\t- A click that was supposed to happen with a probability of 0.1 becomes 1/0.1 - 1 = 9, and a click with a probability of 0.01 becomes 1/0.01 - 1 = 99. A likely click gets a lower weight than an unlikely one. \r\n\t- We also derive information from non-clicks. A non-click with a probability of 0.9 becomes -1/0.9 + 1 = -0.1. If an article is presented in a prominent position, but it is not clicked by the user, this is an expression of disinterest and it can help to feed our algorithm. \r\n\t- By turning clicks into weighted clicks, we essentially turn the problem from a classification problem into a regression problem. 
On average, the weighted clicks are equal across all positions, so that the position bias is eliminated.\r\n\r\n- One of the features that surprised us the most with its good performance is our \"article already seen\" feature. For each user and every recommendable article, we keep a counter that measures how often the article was already shown in a prominent position but not clicked by the user. These scores are based on the position-based click probabilities that we also use for the weighted clicks. If an article gets shown in a position with an average CTR of 0.1, the score is 0.1 the next time the article could potentially be recommended to the user. If the article now gets shown in a lower position with a click probability of 0.01, the score increases to 0.11 next time. The model then learns that articles that were shown multiple times in prominent positions before but were not clicked are likely not going to be clicked next time they are shown, either. As a consequence, the page becomes fresher and A/B test results indicate a meaningful uplift compared to a model without this feature.\r\n\r\n#### What have we learned?\r\n\r\n- Websites usually track what users do, but not what they do themselves. Our algorithms rely heavily on the fact that we track who saw what and in which position. This gives us the ability to overcome the position bias and significantly improve our algorithms.\r\n    \r\n- We do simple things for complicated reasons. The key advantage of simple statistical models over black-box algorithms is that they are easier to debug. Every time we replace a boosted tree or something similar with a linear model, we realize that it is not acting the way we expected. We can then make the necessary adjustments - for example, by adding well-crafted features that leverage our domain expertise. 
At the end of the process, the linear model becomes better than the black-box model was in the beginning.", "recording_license": "", "do_not_record": true, "persons": [{"guid": "60099318-9197-513b-82c2-32701fc71cb0", "id": 37710, "code": "SZ9NBS", "public_name": "Dr. Christian Leschinski", "avatar": "https://pretalx.com/media/avatars/SZ9NBS_XPex6qE.png", "biography": "Dr. Christian Leschinski leads the data science team and the Customer Intelligence team at Axel Springer National Media and Tech. His work is dedicated to build data and AI products that improve the user experience and the monetisation of digital media products and to help organisations to make data-informed decisions. This encompasses use cases ranging from programmatic advertising and subscription pricing to customer analytics and news recommendation.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/ZCKQVG/", "id": 42921, "guid": "31a9b36b-645d-55e8-9553-00415848bfda", "date": "2024-04-22T14:35:00+02:00", "start": "14:35", "logo": null, "duration": "00:30", "room": "A1", "slug": "pyconde-pydata-2024-42921-a-retrieval-augmented-generation-system-to-query-the-scikit-learn-documentation", "title": "A Retrieval Augmented Generation system to query the scikit-learn documentation", "subtitle": "", "track": "PyData: Generative AI", "type": "Talk", "language": "en", "abstract": "The scikit-learn website currently employs an \"exact\" search engine based on the Sphinx Python package, but it has limitations: it cannot handle spelling mistakes and queries based on natural language. To address these constraints, we experimented with using large language models (LLMs) and opted for a retrieval augmented generation (RAG) system due to resource constraints.\r\n\r\nThis talk introduces our experimental RAG system for querying scikit-learn documentation. We focus on an open-source software stack and open-weight models. 
The talk presents the different stages of the RAG pipeline. We provide documentation scraping strategies that we designed based on numpydoc and sphinx-gallery, which are used to build vector indices for the lexical and semantic searches. We compare our RAG approach with an LLM-only approach to demonstrate the advantage of providing context. The source code for this experiment is available on GitHub: https://github.com/glemaitre/sklearn-ragger-duck.\r\n\r\nFinally, we discuss the gains and challenges of integrating such a system into an open-source project, including hosting and cost considerations, comparing it with alternative approaches.", "description": "Currently, the scikit-learn website provides an \"exact\" search engine based on the tools provided by the Sphinx Python package (i.e., https://www.sphinx-doc.org/). The current search engine is implemented in JavaScript and runs locally using an index built when generating the documentation. This solution has the advantage of being lightweight and does not require any server to handle the query. However, the complexity of the query treated is weak: since the search is \"exact,\" it is not robust to spelling mistakes, and the search is intended for searches based on keywords.\r\n\r\nAs large language models (LLMs) are becoming more popular, we have been interested in experimenting with this technology, knowing that they could address some of the previously stated limitations. As an open-source project, we have limited resources in terms of compute and limited available datasets; therefore, we discarded the option of fine-tuning an LLM and leaned towards retrieval augmented generation (RAG) systems.\r\n\r\nThis talk presents an experimental RAG system developed to query the scikit-learn documentation. As constraints, we impose ourselves to use an open-source software stack and open-weight models to build our system. 
The talk is decomposed as follows:\r\n\r\nFirst, we provide some background on the RAG system and the pipeline to follow to implement such a system.\r\n\r\nThen, we go into details in the different stages of the RAG pipeline. We provide some insights regarding documentation scraping strategies that we developed by leveraging the `numpydoc` and `sphinx-gallery` parser. Then, we discuss the solution that we tested to perform lexical and semantic searches. Finally, we explain how the context found can be fed to the LLM to help generate an answer to the user query. We provide a small demo to compare queries performed on an LLM-only system and on the developed RAG system. All the code for the experiment is hosted at the following GitHub repository: https://github.com/glemaitre/sklearn-ragger-duck.\r\n\r\nFinally, we put into perspective the gains and pains of such an RAG system when it comes to integrating it into an open-source project. Notably, we question the hosting and cost of such systems and compare it with other approaches that could tackle some of the original issues.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "d6297904-fa91-50b7-9510-cc23c0cf9edb", "id": 55, "code": "KMDJAL", "public_name": "Guillaume Lemaitre", "avatar": "https://pretalx.com/media/avatars/KMDJAL_Y5TrEgm.jpg", "biography": "I have a PhD in computer science and have been a scikit-learn and imbalanced-learn core developer since 2017. 
I am currently an open-source engineer helping at the maintenance of these tools.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/G9S3MR/", "id": 41848, "guid": "b0db931d-d41d-5d59-a1d8-13b901d502c8", "date": "2024-04-22T15:35:00+02:00", "start": "15:35", "logo": null, "duration": "00:30", "room": "A1", "slug": "pyconde-pydata-2024-41848-moving-from-offline-to-online-machine-learning-with-river", "title": "Moving from Offline to Online Machine Learning with River", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Stats", "type": "Talk", "language": "en", "abstract": "The foundations of machine learning were built on offline batch processing techniques for model training and inference. As organisations become more dependent on real-time data, the technological trend for machine learning in production is moving towards adding an online stream processing approach. This has benefits such as lower computational requirements due to being able to incrementally learn from a stream of data points, which enables the continual upgrading of models by adapting to real-time changes in data. Learn how to get started on your online ML journey with River", "description": "The foundations of machine learning were built on offline batch processing techniques for model training and inference. As organisations become more dependent on real-time data, the technological trend for machine learning in production is moving towards adding an online stream processing approach. 
This has benefits such as lower computational requirements due to being able to incrementally learn from a stream of data points, which enables the continual upgrading of models by adapting to real-time changes in data.\r\n\r\nThis has wide applications in industries such as cyber security, banking, healthcare, IIoT and any industry that involves processing large volumes of high-throughput data and adapting predictive capability with real-time data feeds.\r\n\r\nYou\u2019ll leave this talk with an understanding of the differences between offline and online machine learning, how to complement one with the other and enough streaming concepts and best practices to get started on your online ML journey with River, an open source Python ML library.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "bee60e09-231f-5574-9760-ebe042c24a2f", "id": 38384, "code": "KBN889", "public_name": "Tun Shwe", "avatar": "https://pretalx.com/media/avatars/KBN889_7ze86dw.png", "biography": "Tun Shwe is the VP of Data at Quix, where he leads data strategy and developer relations. He is focused on helping companies imagine and implement their strategic data vision with stream processing at the forefront. 
He was previously a Head of Data and Data Engineer at high growth startups and has spent his career leading T-shaped teams in developing analytics platforms and data-intensive AI applications.\r\n\r\nIn his spare time, Tun goes surfing, plays guitar and tends to his analogue cameras.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/WEVXJS/", "id": 43022, "guid": "26f537ab-cc76-565e-a793-b51ef6972434", "date": "2024-04-22T16:10:00+02:00", "start": "16:10", "logo": null, "duration": "00:45", "room": "A1", "slug": "pyconde-pydata-2024-43022-put-your-rag-to-the-test-component-per-component-evaluation-of-our-llm-powered-airplane-manufacturing-assistant", "title": "Put your RAG to the test: Component-per-component evaluation of our LLM-powered airplane manufacturing assistant", "subtitle": "", "track": "PyData: Generative AI", "type": "Talk (long)", "language": "en", "abstract": "Your RAG-powered LLM application might look pretty convincing at first glance, but how do you really know if it\u2019s any good? And how do you justify the design choices you make? In this talk, you will learn about the RAG evaluation concept we produced at Airbus for evaluating the components of our digital engineering assistant, its implementation with open source tools paired with Google Vertex AI, and what we learnt in the process.", "description": "Nowadays, Retrieval Augmented Generation (RAG) architecture has become quite the standard approach for building high-quality document search products or personal assistant applications. Prototyping a RAG application might yield quite convincing results from the very first stages of development, but how do you know if it\u2019s really any good when you move your application from prototype into production? And how do you justify the design choices you make? 
For example, do you know if long-context models would perform better than short-context models with chunking for long-form documents you have at hand? Or, what difference does it make if you keep your different types of documents in one index or in separate ones? Or, is usage of few-shot learning really worth it for your use case, given that adding examples can increase the cost dramatically compared to zero-shot learning? And of course, how do you know there isn\u2019t a better prompt out there for making the LLM do exactly what you expect it to?\r\n\r\nAt Airbus, we went through this thought process during the development of a RAG-based assistant for creation of assembly manuals - documents which help our colleagues in Manufacturing navigate through the airplane parts construction procedures. For answering these and other questions, we produced an evaluation concept for our Generative AI applications, which relies on different methods and metrics for RAG evaluation end-to-end and testing each of its components separately. In this talk, we will present our evaluation concept, how we implemented it with tools like LangChain and Ragas, what metrics we use and how we conduct our experiments with the help of Google Vertex AI Pipelines.", "recording_license": "", "do_not_record": true, "persons": [{"guid": "b53f695a-6afe-570e-8c6e-8892585e1447", "id": 38939, "code": "3VWR8F", "public_name": "Nataliia Kees", "avatar": "https://pretalx.com/media/avatars/3VWR8F_2I1C4ml.jpeg", "biography": "I am a Data Scientist at Airbus, where I am a part of the team Digital, building AI products which empower engineering, manufacturing, sales and other business activities of the company. I enjoy diving deep into natural language processing and am passionate about MLOps, good coding practices and deploying AI applications in the cloud. 
Apart from that, I teach Python, and in my free time, I enjoy hiking and learning new languages.", "answers": []}], "links": [], "attachments": [], "answers": []}], "A03-A04": [{"url": "https://pretalx.com/pyconde-pydata-2024/talk/BKBNRF/", "id": 41676, "guid": "a1ddbac6-0731-5228-94ab-307a2c9af708", "date": "2024-04-22T11:25:00+02:00", "start": "11:25", "logo": null, "duration": "01:30", "room": "A03-A04", "slug": "pyconde-pydata-2024-41676-the-secret-life-of-metaclasses", "title": "The Secret Life of Metaclasses", "subtitle": "", "track": "PyCon: Python Language & Ecosystem", "type": "Tutorial", "language": "en", "abstract": "Metaclasses. What are they? Where do they live? How do they reproduce?\r\n\r\nDid you know that you can make your classes receive keyword arguments, just like functions? And that they can be decorated as well?\r\n\r\nDo you want to understand how classes, metaclasses and decorators work and what are they good for?\r\n\r\nIn this hands-on coding session we will inspect the inner workings of how Python creates classes, and how decorators, meta-classes and methods from superclasses can influence this process.\r\n\r\nWe'll explore:\r\n\r\n* normal and special methods\r\n* how attribute lookup works between instances and classes\r\n* what are descriptors, and how they fit into attribute lookup process\r\n* what is the relationship between instances, classes and metaclasses\r\n* what are metaclasses for\r\n* and some other metaprogramming odds and ends\r\n\r\nAll that is required for you to enjoy this session is that you have written a class in Python. 
If you've done the original [Python Tutorial](https://docs.python.org/3/tutorial/index.html), that should be more than enough.", "description": "Class outline:\r\n\r\n* 10 min.: Intro and Setup\r\n* 15 min.: Every time is \"runtime\":\r\n  * Function, Classes and Methods are created at runtime\r\n  * The dual responsibility of `class`\r\n  * Attribute lookup and method resolution order\r\n  * The role of `.__dict__` and `.__slots__`\r\n  * Special methods, giving instances superpowers\r\n* 10 min.: Everything is an object:\r\n  * Functions, methods and classes are also objects\r\n  * Descriptors, properties and method binding\r\n  * The two functionalities of `type`\r\n    * And how to create a class without the `class` keyword\r\n* 10 min.: Metaclass is the class of the class:\r\n  * Calling a class creates an instance, calling a metaclass creates a class\r\n  * `type` & `object`: class relations\r\n  * Creating and using metaclasses\r\n* 15 min.: What are metaclasses for?\r\n  * Giving classes special methods\r\n  * Intercepting class creation\r\n    * Keyword arguments in class declarations\r\n  * Preparing the class namespace\r\n  * The role of the methods: `__call__`, `__new__` & `__init__`\r\n  * What are metaclasses **not** for\r\n* 5 min.: complete debugging walkthrough\r\n  * class creation\r\n  * instance creation\r\n  * instance use\r\n* 5 min.: You're unlikely to ever need to create a metaclass\r\n  * `__init_subclass__`\r\n  * Class decorators\r\n  * `__class_getitem__`\r\n  * Capturing descriptor names and ordering\r\n* 5 min.: Examples\r\n* 5 min.: conclusion and questions", "recording_license": "", "do_not_record": false, "persons": [{"guid": "29a56ed7-cc7b-5de2-801d-f11dc786e5fb", "id": 21257, "code": "CH8JSD", "public_name": "Leonardo Rochael Almeida", "avatar": "https://pretalx.com/media/avatars/CH8JSD_9I4gDSH.jpg", "biography": "Python developer with over 22 years of experience, Leonardo is a technical reviewer for Luciano Ramalho's \"Fluent 
Python\" book for both editions.", "answers": []}, {"guid": "f8ea50ae-5ab1-54cf-854f-2ca388520f7c", "id": 2043, "code": "AX8V78", "public_name": "Luciano Ramalho", "avatar": "https://pretalx.com/media/avatars/AX8V78_HqzCl1I.jpg", "biography": "Luciano Ramalho is the author of Fluent Python, published in 9 languages and 2 editions since 2015. He was a pioneering organizer of the Python Brasil association, which supports the Brazilian national PyCon. He is now a writer, teacher, and model railroader.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/DPGRGW/", "id": 41302, "guid": "0c389fad-7fa1-52d7-b3d7-aebc27e66923", "date": "2024-04-22T13:45:00+02:00", "start": "13:45", "logo": null, "duration": "01:30", "room": "A03-A04", "slug": "pyconde-pydata-2024-41302-build-tiktok-s-personalized-real-time-recommendation-system-in-python-with-hopsworks", "title": "Build TikTok's Personalized Real-Time Recommendation System in Python with Hopsworks", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Stats", "type": "Tutorial", "language": "en", "abstract": "The real-time recommendations engine in Tiktok, Monolith, is so good it has been described as \"digital crack\" (by Andrej Karpathy, former head of AI at Tesla). In this tutorial, we will build the core components of Tiktok Monolith (a retrieval and ranking architecture): a stream processing feature pipeline, a two-tower embedding model to support personalized queries based on each user's history/context, and a simple user interface in Python (Streamlit). 
Our real-time machine learning system will consist of 3 Python programs - the feature pipeline, the training pipeline, and the online inference pipeline - and the ML infrastructure they require will be provided by the open-source Hopsworks platform, including a feature store, vector database, model serving, and model registry.", "description": "The real-time recommendations engine in TikTok is so good it has been described as \"digital crack\" (by Andrej Karpathy, former head of AI at Tesla). It is a retrieval and ranking architecture that uses significant ML infrastructure, including a real-time feature store, a vector database, a model registry, and model serving infrastructure.\r\n\r\nIn this tutorial, we will build the core components of TikTok Monolith as 3 ML pipelines: a stream processing feature pipeline that takes user actions (clicks, swipes, searches) written to Kafka and computes features that are stored in the Hopsworks online store in less than 1 second.\r\nWe will train a two-tower embedding model to support personalized queries using training data grounded in each user's history/context and the videos they clicked or didn't click on.\r\nWe will develop an online inference pipeline that takes a user query, encodes it as an embedding to retrieve candidate videos, then uses an online feature store to enrich the candidates before a ranking model personalizes the order of candidates for the client. We will even develop a simple user interface in Python (Streamlit) to show the whole system working visually. 
\r\n\r\nOur real-time machine learning system will consist of 3 Python programs - the feature pipeline, the training pipeline, and the online inference pipeline - and the ML infrastructure they require will be provided by the open-source Hopsworks platform, including a feature store, vector database, model serving, and model registry.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "4ffcaec0-f4da-54c6-b8a3-045b1feb09cc", "id": 25748, "code": "3K8LUH", "public_name": "Jim Dowling", "avatar": "https://pretalx.com/media/avatars/3K8LUH_S0qJzJf.jpg", "biography": "Jim Dowling is CEO of Hopsworks and an Associate Professor at KTH Royal Institute of Technology. He is lead architect of the open-source Hopsworks platform, a horizontally scalable data platform for machine learning that includes the industry's first Feature Store. He is writing a book for O'Reilly on ML Systems with a feature store.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/CMM8S3/", "id": 41842, "guid": "ee35a28b-647e-5e0c-94d0-52a43a51cc8b", "date": "2024-04-22T15:35:00+02:00", "start": "15:35", "logo": null, "duration": "01:30", "room": "A03-A04", "slug": "pyconde-pydata-2024-41842-refactoring-large-programs", "title": "Refactoring Large Programs", "subtitle": "", "track": "PyCon: Programming & Software Engineering", "type": "Tutorial", "language": "en", "abstract": "One of the most challenging tasks in software engineering is cleaning up complex software with 10,000-100,000 lines of code. The problem gets worse if you are taking over legacy code. The fact that the Python language enforces neither strict typing nor encapsulation does not help either. What should you do if throwing away everything and rewriting the program from scratch is not an option?\r\n\r\nIn this tutorial, we will exercise refactoring a larger program that is undocumented, unstructured and untested. 
We will take a messy example program and work through a list of procedures that may help you in your next big refactoring.", "description": "Refactoring Large Programs\r\n\r\nYou can find the code and installation instructions for the tutorial at https://github.com/krother/space\r\n\r\nOne of the most challenging tasks in software engineering is cleaning up complex software with 10,000-100,000 lines of code. The problem gets worse if you are taking over legacy code. The fact that the Python language enforces neither strict typing nor encapsulation does not help either. What should you do if throwing away everything and rewriting the program from scratch is not an option?\r\n\r\nIn this tutorial, we will exercise refactoring a larger program that is undocumented, unstructured and untested. We will take a messy example program and work through a list of procedures that may help you in your next big refactoring. These include:\r\n\r\n* review the code\r\n* write a minimal test\r\n* add type annotations\r\n* extract core data structures\r\n* separate easily cleanable parts from very bad parts\r\n* remove excess dependencies\r\n* be very transparent about which features of the code you trust\r\n\r\nThe main takeaway of the tutorial is that large-scale refactoring is possible. Although a large refactoring is difficult and costly, you should learn that it can be approached systematically. You will walk away with ideas where to start refactoring. You will also develop your awareness of how difficult a complex refactoring is. Looking at a messy codebase realistically is not only important to manage the expectations of clients and stakeholders, it is also important to manage the stress that comes with it.\r\n\r\nThis tutorial addresses people with fluency in basic Python. You should know how a class in Python works and what a Unit Test is. It helps if you have done simple refactoring (extract variable, extract function) before. 
I encourage junior developers to attend the tutorial to learn and discuss how a potentially overwhelming situation looks like.\r\n\r\nThe tutorial session is structured in the following way:\r\n\r\n* 0:00 Interactive Warm-up with the audience: Who is here?\r\n* 0:05 Download and inspect code\r\n* 0:10 Quick code review\r\n* 0:20 Refactoring I: create a minimal test\r\n* 0:40 Refactoring II: extract data structures\r\n* 1:00 Refactoring III: isolate code\r\n* 1:20 buffer time and Q & A\r\n\r\nThe messy code and refactoring recipes will be provided to participants through GitHub.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "f8f9018b-f781-52d7-a64e-72e8ab52690d", "id": 15994, "code": "9EPNQG", "public_name": "Dr. Kristian Rother", "avatar": "https://pretalx.com/media/avatars/9EPNQG_TTM8mnl.jpg", "biography": "Kristian is a freelance Python trainer who wrote his first lines of Python in the year 0x11111001111. In his early career he wrote software for life science research. Since 2011, he has been teaching Python and Data Science in Europe. He has translated and written Python books and published teaching material. Kristian has collected 308 stars on Advent of Code. His knowledge about async is, unfortunately, miserable. His favorite Python module is 're'. 
Kristian believes everybody can learn programming.", "answers": []}], "links": [], "attachments": [], "answers": []}], "A05-A06": [{"url": "https://pretalx.com/pyconde-pydata-2024/talk/EHJRVF/", "id": 40843, "guid": "925b6888-1fad-5540-8e9f-1c1c29d6ea05", "date": "2024-04-22T11:25:00+02:00", "start": "11:25", "logo": null, "duration": "01:30", "room": "A05-A06", "slug": "pyconde-pydata-2024-40843-no-more-raw-sql-sqlalchemy-orms-asyncio", "title": "No More Raw SQL: SQLAlchemy, ORMs & asyncio", "subtitle": "", "track": "PyCon: Programming & Software Engineering", "type": "Tutorial", "language": "en", "abstract": "Managing a database and synchronizing service data representation with the database can be tricky. In this workshop, you\u2019ll learn how to use SQLAlchemy, a powerful SQL toolkit, to simplify this task. We\u2019ll cover how to leverage SQLAlchemy\u2019s Object Relational Mapper (ORM) system, and how to use SQLAlchemy's asyncio extension in your async services.\r\n\r\nParticipants will walk out of this tutorial having learned how to:\r\n- Use SQLAlchemy for database operations in Python, enhancing the readability and maintainability of the code\r\n- Build Python classes (ORMs) that represent the database tables\r\n- Experiment with different relationship-loading techniques to improve querying performance\r\n- Utilize SQLAlchemy\u2019s asyncio extension to interact with databases asynchronously", "description": "OUTLINE\r\n- Introduction [15 min]\r\n    - What is SQLAlchemy?\r\n    - Why use SQLAlchemy and advantages?\r\n    - Components Overview such as engine, dialect, connection pool, etc.\r\n- Initial setup for the hands-on workshop with GitHub Codespaces [5 min]\r\n    - Run and explore example service that has database queries with raw SQL\r\n- Adding SQLAlchemy to the example service\r\n    - Set up SQLAlchemy [10 min]\r\n        - Set up engine & dialect to connect with the DB\r\n        - Use SQLAlchemy Core to query the DB\r\n    - Add ORMs [20 
min]\r\n        - What are ORMs?\r\n        - How to represent a basic table?\r\n        - Modeling different relationships (e.g., 1-1 and 1-many) between the classes\r\n        - Using ORMs to query the DB\r\n    - Convert other queries using SQLAlchemy [5 min]\r\n- Improve performance by changing relationship loading techniques [10 min]\r\n    - Consequences of certain models: Talk about N+1 problem and bidirectional relationships\r\n    - Work with different loading techniques, such as lazy loading and eager loading\r\n- The SQLAlchemy.asyncio extension\r\n    - Brief description of asyncio [10 min]\r\n        - Understanding coroutines\r\n        - Scheduling tasks on the asyncio event loop \r\n    - A hands-on walkthrough of SQLAlchemy\u2019s asyncio extension [15 min]\r\n        - Setting up SQLAlchemy in async mode\r\n        - Performing a query and inserting it into the database\r\n        - Using ORMs in queries using asyncio\r\n\r\nFORMAT\r\nThis is an interactive tutorial where we will guide participants through the use of SQLAlchemy and ORMs to interact with a database. Participants will gain an understanding of SQLAlchemy and be well-versed enough to use it in their next project.\r\nParticipants will be working on a repository via GitHub Codespaces, and they will be building on that throughout the tutorial. The Codespaces dev environment will include all required modules and a Dockerized PostgreSQL database, enabling a seamless setup. The repository will have a branch corresponding to each section of the workshop, so participants who have trouble with a step or aren\u2019t able to finish on time can check out the corresponding branch and follow the rest of the workshop from there.\r\nWe\u2019ll start with an introduction to SQLAlchemy and its advantages. The rest of the tutorial will be hands-on. 
For each section, we will start by explaining the concept, which we expect to take around 10 minutes. We will then give participants time to complete the relevant steps on the example service on their own laptops and ask questions.\r\n\r\nAUDIENCE\r\nThis tutorial is for Python developers of any level who write applications that interact with databases and want to learn how to leverage a tool like SQLAlchemy to seamlessly interact with their database and manage their data in a Pythonic way.\r\nHaving a basic understanding of databases and SQL (such as inserting or reading data from a table) is sufficient. Participants should also be familiar with git and have a GitHub account, as we will use GitHub Codespaces to enable easy set-up for Python and the database. However, they do not need any prior knowledge of SQLAlchemy or ORMs, since we will explain that first. For the last part of the tutorial, it would help if attendees have some familiarity with coroutines or asynchronous programming, but it is not required, since we will be explaining these fundamental concepts first.\r\nParticipants will walk out of this tutorial having learned how to:\r\n- Use SQLAlchemy for database operations in Python, enhancing the readability and maintainability of the code\r\n- Build Python classes (ORMs) that represent the database tables\r\n- Experiment with different relationship-loading techniques to improve querying performance\r\n- Utilize SQLAlchemy\u2019s asyncio extension to interact with databases asynchronously", "recording_license": "", "do_not_record": false, "persons": [{"guid": "b4a9f981-74ee-5684-95c6-d80f10aa9840", "id": 37689, "code": "HDAQ8J", "public_name": "Rhythm Patel", "avatar": "https://pretalx.com/media/avatars/HDAQ8J_K0pflsN.jpg", "biography": "Rhythm Patel is a software engineer at Bloomberg. 
He is a part of Bloomberg's Python Guild, which is dedicated to aiding Python engineers, fostering innovation, creating and maintaining Python packages, as well as acting as a bridge to the wider Python community. Rhythm has spoken at PyCon UK 2023 and at internal conferences. When he\u2019s not working, you can find him playing football or tennis, traveling and hiking, or volunteering at London\u2019s Royal Parks and London Zoo.", "answers": []}, {"guid": "95f75ffb-34c2-566f-ad83-9329c805d51f", "id": 21318, "code": "SXKSL7", "public_name": "Aya Elsayed", "avatar": "https://pretalx.com/media/avatars/SXKSL7_qh71ZvR.jpg", "biography": "Aya Elsayed is a software engineer at Bloomberg. She\u2019s a leader in the company's Python Guild, which aims to support Python engineers at Bloomberg to innovate, develop Python packages, and stay connected to the broader Python community. Aya previously spoke at a few conferences, including PyCon US 2023, PyCon Italia 2023, and PyCon UK 2022, and has delivered workshops at internal and local meetups like PyLadies London. She enjoys Pilates, hiking, and trying out restaurants around London.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/WPKRCT/", "id": 41740, "guid": "4f3d291a-1a45-579f-9f44-9015287a02c6", "date": "2024-04-22T13:45:00+02:00", "start": "13:45", "logo": null, "duration": "01:30", "room": "A05-A06", "slug": "pyconde-pydata-2024-41740-build-an-ai-document-inquiry-chat-with-offline-llms", "title": "Build an AI Document Inquiry Chat with Offline LLMs", "subtitle": "", "track": "PyData: Natural Language Processing & Computer Vision", "type": "Tutorial", "language": "en", "abstract": "As we descend from the peak of the hype cycle around Large Language Models (LLMs), chat-based document inquiry systems have emerged as a high-value practical use case. 
Retrieval-Augmented Generation (RAG) is a technique to share relevant context and external information (retrieved from vector storage) with LLMs, thus making them more powerful and accurate.\r\n\r\nIn this hands-on tutorial, we\u2019ll dive into RAG by creating a personal chat app that accurately answers questions about your selected documents. We\u2019ll use a new [OSS project called Ragna](https://ragna.chat/en/latest/) that provides a friendly Python and REST API, designed for this particular case. We\u2019ll test the effectiveness of different LLMs and vector databases, including an offline LLM (i.e., local LLM) running on GPUs on the cloud machines provided to you. And we\u2019ll conclude by demonstrating how to quickly build personal or company-level chat-based document interrogation systems.", "description": "The ability to ask natural language questions and get relevant and accurate answers from a large corpus of documents can fundamentally transform organizations and make institutional knowledge accessible. Foundational LLMs like OpenAI\u2019s GPT-4 provide powerful capabilities, but using them directly to answer questions about a collection of documents presents accuracy-related limitations. Retrieval-augmented generation (RAG) is the leading approach to enhancing the capabilities and usability of Large Language Models.\r\n\r\nIn this tutorial, we will learn to use RAG to build document-inquiry chat systems using different commercial and locally running LLMs. 
The topics we\u2019ll cover include:\r\n\r\n* **Introduction to RAG**, how it works and interacts with LLMs, and Ragna - a framework for RAG orchestration\r\n* Creating a **basic chat function** that uses popular LLMs (like GPT) to answer questions about your documents, using a Python API in Jupyter Notebooks\r\n* Optimizing the chat through **experiments with different LLMs**, vector databases, context windows, and more\r\n* Running a **local LLM on GPUs** on the provided platform, and comparing its performance to commercial LLMs\r\n* Walkthrough of the **REST API for building web-apps** and user interfaces and exploration of the built-in (Panel-based) web application\r\n\r\nBy the end of this tutorial, you will have an understanding of the fundamental components that form a RAG model, and practical knowledge of open source tools that can help you or your organization explore and build your own applications. This tutorial is designed to enable enthusiasts in our community to explore an interesting topic using some beginner-friendly Python libraries.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "1e289f7b-bd99-5631-92fb-f28eb817cdc1", "id": 24595, "code": "LLJSBE", "public_name": "Pavithra Eswaramoorthy", "avatar": "https://pretalx.com/media/avatars/LLJSBE_ORGbB9T.jpg", "biography": "Pavithra Eswaramoorthy is a Developer Advocate at [Quansight](https://quansight.com/), where she works to improve the developer experience and community engagement for several open source projects in the PyData community. Currently, she maintains the [Bokeh visualization library](https://bokeh.org/), and contributes to the [Nebari](https://www.nebari.dev/) (adjacent to the Jupyter community), [conda-store](https://github.com/conda-incubator/conda-store) (part of the conda ecosystem), and [Ragna](https://ragna.chat/) (RAG orchestration framework) projects. 
Pavithra has been involved in the open source community for over 5 years, notably as a maintainer of the [Dask](https://dask.org/) library and an administrator for [Wikimedia](https://www.wikimedia.org/)\u2019s OSS programs. In her spare time, she enjoys a good book and hot coffee. :)", "answers": []}, {"guid": "a84f57ac-cba3-56c8-8d1c-8a201a5ffa9c", "id": 38348, "code": "DHLJJM", "public_name": "Philip Meier", "avatar": "https://pretalx.com/media/avatars/DHLJJM_broHGLZ.jpg", "biography": "Philip is a Senior Software Engineer at Quansight. His recent work has focused on Ragna (https://ragna.chat), an OSS RAG orchestration framework.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/DSFWRC/", "id": 41706, "guid": "61e6d61d-cd2d-5339-bc46-6fb7b938c163", "date": "2024-04-22T15:35:00+02:00", "start": "15:35", "logo": null, "duration": "01:30", "room": "A05-A06", "slug": "pyconde-pydata-2024-41706-pytest-tips-and-tricks-for-a-better-testsuite", "title": "pytest tips and tricks for a better testsuite", "subtitle": "", "track": "PyCon: Testing", "type": "Tutorial", "language": "en", "abstract": "pytest lets you write simple tests fast - but also scales to very complex scenarios: Beyond the basics of no-boilerplate test functions, this training will show various intermediate/advanced features, as well as gems and tricks.\r\n\r\nTo attend this training, you should already be familiar with the pytest basics (e.g. writing test functions, parametrize, or what a fixture is) and want to learn how to take the next step to improve your test suites.\r\n\r\nIf you're already familiar with things like fixture caching scopes, autouse, or using the built-in `tmp_path`/`monkeypatch`/... 
fixtures: There will probably be some slides about concepts you already know, but there are also various little hidden tricks and gems I'll be showing.", "description": "We'll cover things like:\r\n\r\n- Recommended pytest settings for more strictness\r\n- What's xfail and why is it useful?\r\n- How to mark an entire test file or single parameters\r\n- Ways to deal with parametrize IDs and syntax\r\n- Useful built-in pytest fixtures\r\n- Caching for fixtures\r\n- Using fixtures implicitly\r\n- Advanced fixture and parametrization topics\r\n- How to customize fixtures behavior based on markers or custom CLI arguments\r\n- Patching, mocking, and alternatives\r\n- Various useful plugins, and how to write your own\r\n- Short intro to property-based testing with Hypothesis", "recording_license": "", "do_not_record": false, "persons": [{"guid": "97323304-0f6e-5496-a41f-38e84991e7ca", "id": 1676, "code": "BPA78X", "public_name": "Florian Bruhin", "avatar": "https://pretalx.com/media/avatars/BPA78X_7lU3SPR.jpg", "biography": "Florian Bruhin (\"The Compiler\") is a long-time contributor and maintainer of\r\nboth the pytest framework and various plugins. He discovered pytest in 2015 -\r\nsince then, he has given talks and conducted workshops about pytest at various\r\nconferences and companies. 
His primary project, qutebrowser (a keyboard-focused\r\nweb browser), has grown from a hobby to a donation-funded part-time job.", "answers": []}], "links": [], "attachments": [], "answers": []}]}}, {"index": 2, "date": "2024-04-23", "day_start": "2024-04-23T04:00:00+02:00", "day_end": "2024-04-24T03:59:00+02:00", "rooms": {"Kuppelsaal": [{"url": "https://pretalx.com/pyconde-pydata-2024/talk/HKFN8J/", "id": 44830, "guid": "3056dc75-4acf-56f7-ac4e-d3123f6d500e", "date": "2024-04-23T09:15:00+02:00", "start": "09:15", "logo": null, "duration": "00:45", "room": "Kuppelsaal", "slug": "pyconde-pydata-2024-44830-keynote-safe-space-or-trap-creating-software-like-duckdb-in-academic-institutions", "title": "Keynote - Safe Space or Trap? Creating Software like DuckDB in Academic Institutions", "subtitle": "", "track": "Plenary", "type": "Keynote", "language": "en", "abstract": "DuckDB is an in-process analytical data management system. DuckDB is free and open source and rather popular. It is one of the fastest growing data systems to date, especially in the Python ecosystem.\u00a0DuckDB\u00a0was created at Centrum Wiskunde & Informatica (CWI) in Amsterdam, not entirely coincidentally the same place Python was created in. Later on, we founded a commercial company, DuckDB Labs, which now drives development. In my talk, I will discuss DuckDB, its origins,\u00a0and the unique benefits and challenges of maintaining popular software in an academic setting.", "description": "DuckDB is an in-process analytical data management system. DuckDB is free and open source and rather popular. It is one of the fastest growing data systems to date, especially in the Python ecosystem.\u00a0DuckDB\u00a0was created at Centrum Wiskunde & Informatica (CWI) in Amsterdam, not entirely coincidentally the same place Python was created in. Later on, we founded a commercial company, DuckDB Labs, which now drives development. 
In my talk, I will discuss DuckDB, its origins,\u00a0and the unique benefits and challenges of maintaining popular software in an academic setting.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "d18f9f65-a3a5-570e-abfb-6fd778dabd4c", "id": 40283, "code": "NYMSNS", "public_name": "Hannes M\u00fchleisen", "avatar": "https://pretalx.com/media/avatars/NYMSNS_w2IDQJb.jpg", "biography": "Prof. Dr. Hannes M\u00fchleisen is a creator of the DuckDB database management system and Co-founder and CEO of DuckDB Labs, a consulting company providing services around DuckDB. Hannes is also Professor of Data Engineering at Radboud Universiteit Nijmegen. His main interest is analytical data management systems.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/ZFXZHG/", "id": 41750, "guid": "575e48ed-568f-5d29-8931-cbd5e775f8af", "date": "2024-04-23T10:30:00+02:00", "start": "10:30", "logo": null, "duration": "00:30", "room": "Kuppelsaal", "slug": "pyconde-pydata-2024-41750--the-taller-the-tree-the-harder-the-fall-determining-tree-height-from-space-using-deep-learning-and-very-high-resolution-satellite-imagery-", "title": "\ud83c\udf33 The taller the tree, the harder the fall. Determining tree height from space using Deep Learning and very high resolution satellite imagery \ud83d\udef0\ufe0f", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Stats", "type": "Talk", "language": "en", "abstract": "A case study of how we use Deep Learning based photogrammetry to calculate the height of trees from very high resolution satellite imagery. 
We show the substantial improvement achieved by switching from classical photogrammetric techniques to a deep learning based model (implemented in PyTorch), and the challenges we had to overcome to make this solution work.", "description": "The risk that a tree poses to line infrastructure (such as power lines) is determined by several factors, chief among them the height of the particular tree. The increasing availability of very high resolution satellite imagery makes it possible to use photogrammetric techniques to extract height information from a set of stereo satellite images. By using satellite imagery we can achieve a scale not possible by manual measurement. \r\nWe found that classical techniques perform poorly on vegetation, and were handily outperformed by deep learning based techniques implemented in PyTorch. This improvement was not trivial to achieve however, as creating labelled data in sufficient quantity was quite challenging. By increasing the quality of our height predictions we were able to more accurately calculate risk for our customers.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "98b03cf1-6f41-5387-964b-07a76d4bb9cc", "id": 37678, "code": "J9ZPD3", "public_name": "Ferdinand Schenck", "avatar": "https://pretalx.com/media/avatars/J9ZPD3_qxTDw2J.jpg", "biography": "I am a Machine Learning Engineer at LiveEO currently focused on applying Machine Learning techniques to remote sensing data.  
\r\n\r\nBefore that, I did a PhD in particle physics at the Humboldt-Universit\u00e4t zu Berlin on the ATLAS experiment at CERN.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/YHMUCL/", "id": 42610, "guid": "6ec0b8f5-4593-58e2-9bc7-e46fc4514ff8", "date": "2024-04-23T11:05:00+02:00", "start": "11:05", "logo": null, "duration": "00:30", "room": "Kuppelsaal", "slug": "pyconde-pydata-2024-42610-streamlining-python-development-a-practical-approach-to-ci-cd-with-github-actions", "title": "Streamlining Python Development: A Practical Approach to CI/CD with GitHub Actions", "subtitle": "", "track": "PyCon: MLOps & DevOps", "type": "Talk", "language": "en", "abstract": "Crafting code for minimal dependencies and maximum portability is an art. This talk focuses on how continuous integration and delivery ensure project resilience to Python updates and changes in the packaging ecosystem. Setting up automation around your project enhances peace of mind, improves code maintainability, and facilitates collaboration.", "description": "The thing I dislike most when dealing with code is encountering an error message indicating that well-crafted code, written a while ago in a language other than Bash, fails to run on the new system, new laptop, or some other operating system. It's an art to write code with minimal dependencies and maximum portability.\r\n\r\nThe complexity increases in larger projects. This is where Continuous Integration and Continuous Delivery (CI/CD) pipelines prove useful. CI/CD can help you keep the project alive even without you being around. Dependencies could be automatically updated, the code could be automatically tested, and delivered to the end-user, be it you or someone else.\r\n\r\nThis talk is about \"YAML programming\", which will help you write better Python code. 
The goal of the talk is to equip you with a set of building blocks to construct a CI/CD pipeline with GitHub Actions for your projects. Automating tasks as much as possible is highly beneficial.\r\n\r\nWe'll cover best practices and helpful tools for writing and debugging CI/CD pipelines. Writing YAMLs is time-consuming and error-prone; my goal is to help you spend less time on it and benefit faster from automation.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "6b68583c-3b7f-5d6c-b7d3-4c1f218f9f2b", "id": 25704, "code": "3W79QG", "public_name": "Artem Kislovskiy", "avatar": "https://pretalx.com/media/avatars/3W79QG_q4kwlnl.jpg", "biography": "I am a software engineer who codes for fun and profit. Proudly affiliated with the EuroPython Society, I am committed to sharing my knowledge at conferences and actively contribute to Python community events. As a Pythonista I love crafting elegant and maintainable software. Beyond coding, I find joy in long-distance running and the thrill of speeding down ski slopes.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/JKWBBR/", "id": 41468, "guid": "f945ad91-69b1-5731-9a0d-20c1a464eaa9", "date": "2024-04-23T11:40:00+02:00", "start": "11:40", "logo": null, "duration": "00:30", "room": "Kuppelsaal", "slug": "pyconde-pydata-2024-41468-that-s-it-dealing-with-unexpected-data-problems", "title": "That\u2019s it?! Dealing with unexpected data problems", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Stats", "type": "Talk", "language": "en", "abstract": "Drawing on experience with multiple consulting projects, this talk shares experiences on how to deal with unexpected data problems. 
We discuss how far purely technical solutions as well as domain knowledge can be deployed to compensate for lacking data quality or quantity and when it might be better to scale down the original project scope.", "description": "And it was such a nice idea! Nearly everybody working with data has felt this sentiment at least once in their career. The promising idea for a cool new data tool meets the reality of lacking data quality or quantity. This talk wants to provide you with some options on what else you can do in this kind of situation instead of giving up and filing the project away for the non-foreseeable future.  \r\n\r\nDrawing on experience from multiple consulting projects, we discuss what is realistically possible and how to make the most out of the limited data you might find yourself confronted with. The talk covers a brief recap of the limitations arising from unexpectedly little and/or unclean data, before moving on to share lessons learned. We are going to discuss how far purely technical solutions might be able to provide fixes to some of the issues, before moving on to consider how domain knowledge can be deployed to compensate for lacking data quality or quantity. Next, this talk addresses under which circumstances it makes sense to keep pursuing your original goal and when it might be better to down-size expectations. The talk concludes by arguing that despite all the problems arising from unexpected data scarcity, potential answers to important business problems can be found in small data settings if the right questions are asked.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "dd375178-4392-5444-8085-6aa1015cef91", "id": 25894, "code": "QLJTCV", "public_name": "Simon Pressler", "avatar": null, "biography": "I hold a Master's degree in Comparative and International Studies from ETH-Z\u00fcrich, as well as a Data Science Master from the University of Mannheim. 
Since March of this year, I have worked as a full-time Data Scientist for the K\u00f6nigsweg AI GmbH after being with the team for 2.5 years part-time. Also, I enjoy long-distance hiking.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/7TEYDQ/", "id": 44832, "guid": "73954641-036f-5dc3-ab26-0ed6b42f6e11", "date": "2024-04-23T13:15:00+02:00", "start": "13:15", "logo": null, "duration": "00:45", "room": "Kuppelsaal", "slug": "pyconde-pydata-2024-44832-keynote-the-art-and-science-of-tending-open-source-orchards", "title": "Keynote - The art and science of tending open source orchards", "subtitle": "", "track": "Plenary", "type": "Keynote", "language": "en", "abstract": "Over the history of free and open source software, we have gone through quite a few metaphors for open source projects: from homesteads in noosphere to puppies, roads & bridges, gardens, forests, and orchards. Regardless of the preferred comparison, we all can agree that behind every large open source project is a resilient contributor community. Is there a blueprint for it? How about a script for scaling a contributor community or a formula\u00a0for contributor retention? In this talk, I will examine all these questions and share my insight on the art and science of fostering resilient open source communities.", "description": "Inessa is building bridges between people, open science, and open source software, advocating for diversification of contribution pathways to open source and supporting its human infrastructure. She is an active contributor to the Python ecosystem (NumPy, Scientific Python, PyOpenSci, SciPy conference, PyCon US Maintainers Summit, PySWFL, PyLadies SoFlo) and broader open source (Contributor Experience Project, CHAOSS). In her role as Open Source Program Manager at OpenTeams, she leads initiatives focused on widening the contributor pipeline and bringing funding to more open source projects. 
Inessa is perpetually fascinated by incentive design, collaborative intelligence, and jazz.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "3cf11cc7-eff6-5e43-b3fc-616c2bfd13ad", "id": 40285, "code": "XNBC37", "public_name": "Inessa Pawson", "avatar": "https://pretalx.com/media/avatars/XNBC37_8v1OGEE.jpg", "biography": "Inessa is building bridges between people, open science, and open source software, advocating for diversification of contribution pathways to open source and supporting its human infrastructure. She is an active contributor to the Python ecosystem (NumPy, Scientific Python, PyOpenSci, SciPy conference, PyCon US Maintainers Summit, PySWFL, PyLadies SoFlo) and broader open source (Contributor Experience Project, CHAOSS). In her role as Open Source Program Manager at OpenTeams, she leads initiatives focused on widening the contributor pipeline and bringing funding to more open source projects. Inessa is perpetually fascinated by incentive design, collaborative intelligence, and jazz.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/RGWDCN/", "id": 41743, "guid": "9a0ac046-94b6-5673-bbe7-e5a0c063d75a", "date": "2024-04-23T14:10:00+02:00", "start": "14:10", "logo": null, "duration": "00:30", "room": "Kuppelsaal", "slug": "pyconde-pydata-2024-41743-robust-configuration-management-with-pydantic-s-data-validation", "title": "Robust Configuration Management with Pydantic's Data Validation", "subtitle": "", "track": "PyCon: Programming & Software Engineering", "type": "Talk", "language": "en", "abstract": "As applications grow, so do the amount of configurable features. Managing consistent defaults, maintaining user and developer documentation, and ensuring uniform parsing among a growing number of client applications can become a challenge. 
Adding constraints like complex fallback hierarchies and backwards compatibility increases the probability of runtime errors. We show how [`Pydantic`'s](https://pydantic.dev/) strong data validation and integration into Python's type annotations can help build a strict specification for your configuration format, catch misconfiguration early, and mitigate the aforementioned problems with a non-formalized configuration management system.", "description": "We describe how we moved our configuration management system from a simple unstructured YAML format loaded into dictionaries into a fully formalized, typed, class-based system using [`Pydantic`'s][pydantic] data validation.\r\n\r\nWhile simple enough to begin with, we discuss the problems that emerged from the lack of tight specification of our early configuration system: Missing ahead-of-time validation and resulting runtime errors; out-of-sync code and browsable user documentation; incompatible defaults and subtle differences in various separate parsers scattered throughout many microservices; duplicated and brittle fallback logic. Using a strict specification can mitigate these issues by enabling static validation of configuration files, automatic documentation generation, centralized defaults, and flexible data transformation.\r\n\r\nAfter discussing various available configuration management systems, we explain\r\nthe motivation to hand-roll a simple system based on the data validation\r\nlibrary [`Pydantic`][pydantic]. Popularized by its usage in [`FastAPI`][fastapi], it has become the de-facto standard for data validation in Python. 
Its deep integration into Python's type annotation system makes it a powerful tool for configuration management.\r\n\r\nAfter an introduction to [`Pydantic`][pydantic]'s capabilities and usage, specifically its features tailored to configuration management ([`pydantic.BaseSettings`][basesettings]), we share some tips and tricks encountered while speccing out our configuration file format. Additionally, we share some inspiration from our internal tooling to load and validate configuration, render up-to-date browsable user documentation, and integrate with CI systems, as well as lessons learned for an incremental transition from the loose `dict`-based system to the strictly typed class-based system powered by [`Pydantic`][pydantic].\r\n\r\n[pydantic]: https://pydantic.dev/\r\n[fastapi]: https://fastapi.tiangolo.com/\r\n[basesettings]: https://docs.pydantic.dev/latest/api/pydantic_settings/", "recording_license": "", "do_not_record": false, "persons": [{"guid": "7cf9208c-5be4-50e1-bdc2-505a32703027", "id": 16171, "code": "CMUPDQ", "public_name": "Philipp Stephan", "avatar": "https://pretalx.com/media/avatars/CMUPDQ_RRDlZIa.jpg", "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/UG8THG/", "id": 42830, "guid": "abef97ab-3c4a-51da-b956-fc2ca8820db2", "date": "2024-04-23T14:45:00+02:00", "start": "14:45", "logo": null, "duration": "00:45", "room": "Kuppelsaal", "slug": "pyconde-pydata-2024-42830-unlock-the-power-of-dev-containers-build-a-consistent-python-development-environment-in-seconds-", "title": "Unlock the Power of Dev Containers: Build a Consistent Python Development Environment in Seconds!", "subtitle": "", "track": "PyCon: Programming & Software Engineering", "type": "Talk (long)", "language": "en", "abstract": "In this talk, we will explore the basic concepts of Dev Containers and demonstrate how they can support your everyday development as a Python programmer, data 
scientist, or machine learning engineer. With Dev Containers, you can build a consistent development environment in seconds, no matter where you are or what tools you use. And you know what? The Development Container Specification is even open source. Say goodbye to the hassle of setting up your development environment from scratch every time you start a new project!\r\n\r\nWe will start with a basic example and discuss how to set up a consistent Python development environment, including best practices for package management and GPU support. After this talk, you will be able to leverage the advantages of Dev Containers, allowing you to work from anywhere and be ready in seconds.\r\n\r\nIf you're tired of wasting time setting up your development environment and want to unlock the power of Dev Containers, then this talk is a must-attend for you!", "description": "In this talk, we will explore the basic concepts of Dev Containers and demonstrate how they can support your everyday development as a Python programmer, data scientist, or machine learning engineer. With Dev Containers, you can build a consistent development environment in seconds, no matter where you are or what tools you use. And you know what? The Development Container Specification is even open source. Say goodbye to the hassle of setting up your development environment from scratch every time you start a new project!\r\n\r\nWe will start with a basic example and discuss how to set up a consistent Python development environment, including best practices for package management and GPU support. 
After this talk, you will be able to leverage the advantages of Dev Containers, allowing you to work from anywhere and be ready in seconds.\r\n\r\nIf you're tired of wasting time setting up your development environment and want to unlock the power of Dev Containers, then this talk is a must-attend for you!", "recording_license": "", "do_not_record": false, "persons": [{"guid": "861c2c2a-a76c-5d0c-9687-2c1bc18ed4ed", "id": 25800, "code": "F8ANLQ", "public_name": "Thomas Fraunholz", "avatar": "https://pretalx.com/media/avatars/F8ANLQ_O8VbXoB.png", "biography": "Meet Thomas, a passionate advocate for science, particularly in the realm of applied mathematics. Following his doctoral studies, he embarked on a journey into the world of embedded programming, where his affinity for DevOps took root. His enduring passion for crunching numbers ultimately led him to the fascinating field of artificial intelligence, where he's now an acknowledged MLOps expert, seamlessly integrating machine learning into operations.\r\n\r\nThomas has an impressive track record as a leader, having overseen two publicly funded open-source research programs in the field of AI, in collaboration with the German Aerospace Center. Today, he is at the forefront of AI-driven cybersecurity research at Smart Cyber Security GmbH and working on his low-budget bark beetle detection drone project \u2013 a testament to his enduring fascination with embedded systems.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/PLJKUH/", "id": 47678, "guid": "fbbbb1ad-ff7d-5651-a060-13416af5d2eb", "date": "2024-04-23T16:00:00+02:00", "start": "16:00", "logo": null, "duration": "01:00", "room": "Kuppelsaal", "slug": "pyconde-pydata-2024-47678-community-conferences-under-the-hood-perspectives-and-best-practices-in-volunteer-organization", "title": "Community Conferences under the Hood. 
Perspectives and Best Practices in Volunteer Organization", "subtitle": "", "track": "General: Community, Diversity, Career, Life and everything else", "type": "Panel", "language": "en", "abstract": "PyCon DE & PyData Berlin is volunteer-run. This session aims to underscore the significant role that volunteer organization plays in cultivating environments of authenticity, inclusion, and diversity within tech communities.", "description": "Through a combination of individual presentations and interactive discussions, the panel will explore the challenges and triumphs of community organization. This session is designed not just for current and aspiring community leaders but for anyone passionate about fostering an inclusive, collaborative tech ecosystem.\r\n\r\nThis panel brings together seasoned community organizers from diverse backgrounds to share their insights, experiences, and best practices in building and nurturing inclusive communities. \r\n\r\nJoin us in this empowering session to discover how you can contribute to a more inclusive, diverse, and vibrant Python community through effective volunteer organization. Together, we can drive positive change and ensure that our communities remain strong, supportive, and forward-moving.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "e61ae96e-6f0d-5312-867d-6bf04eefb64f", "id": 228, "code": "8F38DV", "public_name": "Alexander CS Hendorf", "avatar": "https://pretalx.com/media/avatars/8F38DV_0aO0cup.jpg", "biography": "Alexander Hendorf is responsible for data and artificial intelligence at a boutique consultancy in Germany. He has many years of experience in the practical application, introduction and communication of data and AI-driven strategies and decision-making processes.\r\nThrough his commitment as a speaker and chair of various international conferences such as PyConDE & PyData Berlin, he is a proven expert in the field of data intelligence. 
He has been appointed a Python Software Foundation and EuroPython fellow for his various contributions. He is currently a sitting board member of the Python Software Verband (Germany) and the EuroPython Society (EPS), and he is building Pioneers Hub - a new non-profit organisation to support tech communities.", "answers": []}, {"guid": "9644be56-2861-536d-a28b-6c0fa0803f32", "id": 32158, "code": "VWGMRU", "public_name": "Lais Carvalho", "avatar": "https://pretalx.com/media/avatars/VWGMRU_gJZTyT1.jpeg", "biography": "La\u00eds Carvalho is an active member of the Python community. She was the first black female board member of Python Ireland, and a core-organiser of HumbleData, a non-profit organisation focused on mentoring underrepresented minorities on Python and Data Science. Currently, La\u00eds is the only female board member of the EuroPython Society.\r\nA seasoned speaker and ex-developer advocate, La\u00eds is passionate about leadership and volunteering. She works as a Site Reliability Engineer at Workday Inc. building monitoring tools.\r\n\r\nLais is excited about food, documentation, and communication. Her main core values are courage and kindness.", "answers": []}, {"guid": "7f795ab1-a24e-5f6b-8262-5e89f1034d32", "id": 2333, "code": "WLCBPE", "public_name": "Valentina Scipione", "avatar": "https://pretalx.com/media/avatars/WLCBPE_wEbbOP8.png", "biography": "Valentina Scipione is an active member of the PyData community, serving as an on-site committee member and volunteer coordinator of the PyConDE & PyData Berlin conference, as well as a committee member of the PyData Berlin chapter. \r\nShe is a Software Engineering Manager at Planet, a pioneering company founded by ex-NASA scientists dedicated to using space technology to help life on Earth. 
Valentina is responsible for ensuring the quality and reliability of the platform which empowers customers to access high-resolution satellite data on demand by tasking satellites directly.\r\nShe is passionate about leadership, coaching and volunteering, and in her spare time she enjoys pole dancing and spending time with her three feline companions.", "answers": []}, {"guid": "d8a2dd67-d397-54f5-88e9-b2c680fb4e5c", "id": 102, "code": "8LQU9C", "public_name": "Florian Wilhelm", "avatar": "https://pretalx.com/media/avatars/8LQU9C_vv210Xj.jpg", "biography": "Florian is Head of Data Science & Mathematical Modeling at inovex GmbH, an IT project center driven by innovation and quality, focusing its services on \u2018Digital Transformation\u2019.", "answers": []}], "links": [], "attachments": [], "answers": []}], "B09": [{"url": "https://pretalx.com/pyconde-pydata-2024/talk/JRRET3/", "id": 44946, "guid": "d3d38e03-f82d-58aa-bed8-46334ce2a5c1", "date": "2024-04-23T10:30:00+02:00", "start": "10:30", "logo": null, "duration": "00:30", "room": "B09", "slug": "pyconde-pydata-2024-44946-build-a-personalized-bitcoin-btc-virtual-assistant-in-python-with-hopsworks-and-llm-function-calling", "title": "Build a personalized Bitcoin (BTC) virtual assistant in Python with Hopsworks and LLM function calling", "subtitle": "", "track": "Sponsor", "type": "Sponsored Talk", "language": "en", "abstract": "The human ambitious desire to get rich without effort has been a major driving force\r\nbehind the popularity of cryptocurrencies like Bitcoin and Ethereum. However, their high\r\nvolatility makes them too unpredictable, and keeping track of our investment gains and\r\nlosses over time can be tedious, if not boring.\r\n\r\nIn this talk, we will define the different components necessary to build a personalized\r\nBitcoin (BTC) virtual assistant in Python. 
The assistant will help you analyze your\r\ntransaction history, estimate future BTC prices, and calculate the future value of your\r\nholdings based on these predictions. It will be powered by LLMs and will make use of a\r\nrecent technique called Function Calling to recognize the user intent from the\r\nconversation history.", "description": "The human ambitious desire to get rich without effort has been a major driving force\r\nbehind the popularity of cryptocurrencies like Bitcoin and Ethereum. However, their high\r\nvolatility makes them too unpredictable, and keeping track of our investment gains and\r\nlosses over time can be tedious, if not boring.\r\n\r\nIn this talk, we will define the different components necessary to build a personalized\r\nBitcoin (BTC) virtual assistant in Python. The assistant will help you analyze your\r\ntransaction history, estimate future BTC prices, and calculate the future value of your\r\nholdings based on these predictions. It will be powered by LLMs and will make use of a\r\nrecent technique called Function Calling to recognize the user intent from the\r\nconversation history.\r\n\r\nThe ML system will be built in Python, following the best practices of the FTI\r\n(feature/training/inference) pipeline architecture, on top of the open-source Hopsworks\r\nplatform which will provide the necessary ML infrastructure such as a feature store,\r\nmodel serving, and a model registry.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "65c45afd-14d2-509d-b5e2-901653149804", "id": 41549, "code": "RUSBZM", "public_name": "Javier de la R\u00faa Mart\u00ednez", "avatar": "https://pretalx.com/media/avatars/RUSBZM_dfKQ355.png", "biography": "Javier is a Research Engineer at Hopsworks where he actively contributes to advancing the\r\nHopsworks Feature Store platform. He is currently pursuing his Ph.D. 
at KTH Royal Institute of\r\nTechnology in Sweden with a primary focus on large-scale machine learning systems.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/KXU7Q8/", "id": 41053, "guid": "f3bbaa41-c7ad-5b97-a0d2-0e1267b2d219", "date": "2024-04-23T11:05:00+02:00", "start": "11:05", "logo": null, "duration": "00:30", "room": "B09", "slug": "pyconde-pydata-2024-41053-missing-data-bayesian-imputation-and-people-analytics-with-pymc", "title": "Missing Data, Bayesian Imputation and People Analytics with PyMC", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Stats", "type": "Talk", "language": "en", "abstract": "We demonstrate a range of different approaches to missing data imputation in employee engagement survey data. Contrasting frequentist-style full-information maximum likelihood approaches with more direct Bayesian imputation and chained equation methods, we highlight how the different assumptions regarding the missing data license different inferences about the imputed values and ultimately the plausible causal narratives which can be expressed in PyMC. In particular, we use the hierarchical nature of employee engagement data to motivate a hierarchical approach to justifying the missing-at-random (MAR) assumption for imputation schemes in People Analytics.", "description": "There is no \"agnostic statistics\" when approaching the question of missing data. Theory quickly breaks against reality in the context of people analytics. All imputation schemes need to justify their assumptions of \"strong-ignorability\" or \"missing-at-random\" reasons for missing data. This is easier and cleaner in a Bayesian setting than in frequentist alternatives. This transparency is important when dealing with HR data. 
We will demonstrate both full information maximum likelihood (FIML) and Bayesian imputation by chained equation approaches to the imputation of missing data in the context of employee engagement survey data.\r\n\r\nWe will use the probabilistic programming library PyMC to articulate the structures and conditional probabilities around missing data in hierarchical organisations. Non-response bias in engagement survey data often corrupts the overall picture of organisational health, and modelling the non-response bias helps uncover patterns or trends in missingness. These insights can be used diagnostically to locate the source of problems within the organisation, but we need to be willing to commit to the assumptions that license genuine causal inference. In this way we present the problem of missing data as a gateway to an organisational focus on causal inference problems. Somewhat ironically, the lack of data can actually make the problems of causal inference more concrete for business stakeholders.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "a1e45954-a39c-523b-93be-ac53df0ad728", "id": 37148, "code": "GB9KHE", "public_name": "Nathaniel Forde", "avatar": "https://pretalx.com/media/avatars/GB9KHE_duKBduV.jpeg", "biography": "I'm a data scientist from Dublin, working at Personio on a range of revenue or customer focused areas. Previously I worked with CarTrawler on pricing and insurance risk modelling, and with Marsh and McLennan in areas of re-insurance and catastrophic risk. Before this I worked at Paddy Power Betfair on models of risk indicators for gambling as part of a responsible gambling initiative. 
I'm broadly interested in problems of risk and confounding.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/H3X3AX/", "id": 42907, "guid": "8e2d06d3-847d-503b-b8b7-bc417d2e244d", "date": "2024-04-23T11:40:00+02:00", "start": "11:40", "logo": null, "duration": "00:30", "room": "B09", "slug": "pyconde-pydata-2024-42907-tackling-the-cold-start-challenge-in-demand-forecasting", "title": "Tackling the Cold Start Challenge in Demand Forecasting", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Stats", "type": "Talk", "language": "en", "abstract": "In this talk, we address the Cold Start problem in Demand Forecasting, focusing on scenarios where historical data is scarce or nonexistent. This constitutes a common situation in practice, such as with the launch of new products in Retail. However, many Time Series and Machine Learning models encounter difficulties in handling this challenge, primarily due to their dependence on a substantial amount of historical data for effective training and prediction.\r\n\r\nWe begin by providing an overview of established techniques used to address the Cold Start problem, including methods like padding, feature engineering, and leveraging item similarities. Additionally, we explore more recent advancements and emerging research, such as Transfer Learning for Time Series.\r\n\r\nWhile each technique presents its unique set of trade-offs, the challenge lies in determining the most suitable approach for a given dataset or use case. This aspect is often not widely understood, and our goal is to unravel this complexity by offering practical insights. 
Furthermore, we introduce a practical framework for systematically evaluating different forecasting strategies within the Cold Start setting, guiding you in selecting the most suitable approach for your datasets and use cases.", "description": "In this talk, we address the Cold Start problem in Demand Forecasting, focusing on scenarios where historical data is scarce or nonexistent. This constitutes a common situation in practice, such as with the launch of new products in Retail. However, many Time Series and Machine Learning models encounter difficulties in handling this challenge, primarily due to their dependence on a substantial amount of historical data for effective training and prediction.\r\n\r\nWe begin by providing an overview of established techniques used to address the Cold Start problem, including methods like padding, feature engineering, and leveraging item similarities. Additionally, we explore more recent advancements and emerging research, such as Transfer Learning for Time Series.\r\n\r\nWhile each technique presents its unique set of trade-offs, the challenge lies in determining the most suitable approach for a given dataset or use case. This aspect is often not widely understood, and our goal is to unravel this complexity by offering practical insights. Furthermore, we introduce a practical framework for systematically evaluating different forecasting strategies within the Cold Start setting, guiding you in selecting the most suitable approach for your datasets and use cases.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "95bd73d4-cfff-5c1b-8825-fecf4d8a0c22", "id": 38892, "code": "PPHRHL", "public_name": "Alexander Meier", "avatar": "https://pretalx.com/media/avatars/PPHRHL_NV9vlrD.jpeg", "biography": "I\u2019m an experienced Data Scientist with a strong background in Software Engineering and a PhD in Mathematical Statistics. 
I\u2019m interested in Machine Learning, ML Engineering and Time Series Analysis.", "answers": []}, {"guid": "12dbdf2f-c5fa-5e59-b0bf-0f81a1c2c1aa", "id": 40149, "code": "3X3SEN", "public_name": "Daria Mokrytska", "avatar": "https://pretalx.com/media/avatars/3X3SEN_zhaZzuO.jpeg", "biography": "Data Scientist from Heidelberg, Germany. The central focus of my work is time series forecasting, with a specific emphasis on forecasting demand. Before my current role, I gained experience as a Research Assistant  focusing on astrophysics and data analysis.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/RD9SU8/", "id": 39508, "guid": "410dcc94-8ab1-507d-9816-15cdb6cedbbf", "date": "2024-04-23T14:10:00+02:00", "start": "14:10", "logo": null, "duration": "00:30", "room": "B09", "slug": "pyconde-pydata-2024-39508-content-recommendation-with-graphs-from-basic-walks-to-neural-networks", "title": "Content Recommendation with Graphs: From Basic Walks to Neural Networks", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Stats", "type": "Talk", "language": "en", "abstract": "Discover how graph algorithms are transforming content recommendation in this insightful talk. We'll journey from the basics of graph-based models, exploring simple graph walks, to the cutting-edge realm of Graph Neural Networks. Uncover the power of graph embeddings and learn when graph-based approaches excel in recommender systems.", "description": "In this talk, we'll explore how the complex problem of content recommendation transforms when viewed through the innovative lens of graph algorithms.\r\n\r\nImagine a world where content and users form a bi-partite graph, and the key to unlocking personalized recommendations lies in predicting links and weights within this graph. We'll embark on a journey starting from the foundational graph-based recommender models, where simple graph walks lay the groundwork. 
\r\n\r\nAs we delve deeper, we'll uncover the potent capabilities of graph embeddings and the transformative impact of Graph Neural Networks. \r\n\r\nFinally, we'll wrap up with valuable insights on the scenarios where graph-based approaches shine the brightest in solving recommender problems. Whether you're a seasoned data scientist or new to the field of machine learning, this talk will equip you with a fresh perspective on leveraging graphs for sophisticated and effective content recommendation strategies.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "b1bd6639-4916-533d-846c-cc832a5656a1", "id": 36847, "code": "7KW8LE", "public_name": "Dr. Mirza Klimenta", "avatar": "https://pretalx.com/media/avatars/7KW8LE_M4wWaj0.jpg", "biography": "Mirza Klimenta received his PhD in Computer Science from the University of Konstanz (Germany) at age 25. While in academia, Mirza worked in the fields of dimension reduction and graph embedding, and his work has been recognized by the scientific community. As a (Senior) Data Scientist, Mirza focuses on Recommender Systems and Algorithm Engineering. His most notable work is in the design and implementation of a Recommender System powering ARD Audiothek, one of the most popular audio-on-demand platforms in Germany. 
He is also a writer of literary fiction.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/7J7LEB/", "id": 41608, "guid": "85964709-6c19-5849-b62f-a9709d643698", "date": "2024-04-23T14:45:00+02:00", "start": "14:45", "logo": null, "duration": "00:45", "room": "B09", "slug": "pyconde-pydata-2024-41608-personalizing-carousel-ranking-on-wolt-s-discovery-page-a-hierarchical-multi-armed-bandit-approach", "title": "Personalizing Carousel Ranking on Wolt's Discovery Page: A Hierarchical Multi-Armed Bandit Approach", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Stats", "type": "Talk (long)", "language": "en", "abstract": "Wolt's Discovery page serves as the primary gateway for millions of weekly users exploring diverse cuisines and products. With over 130,000 merchants in 25 countries, presenting relevant content poses a unique challenge. In this presentation, we address the complexities of personalizing the Discovery page using a hierarchical multi-armed bandit (MAB) approach built on the Python ecosystem. We outline the challenges specific to an expansive online delivery platform, introducing our MAB solution that incorporates hierarchical parameters at user, segment, city, and country levels. Leveraging Thompson Sampling for exploration and exploitation, our approach accommodates data sparsity challenges. Evaluation results, both offline and online, showcase the effectiveness of our solution. The talk concludes with insights into the resilient, scalable, and adaptive architecture underpinning our approach, featuring open-source libraries such as mlflow, Flyte, and Seldon Core. Our learnings and future steps toward a personalized, context-aware Discovery page cap off the presentation. 
Join us as we navigate the intricacies of recommendation challenges in the dynamic world of quick commerce.", "description": "Wolt's Discovery page is the main entrance point for millions of weekly users seeking to explore new cuisines, order their favorite dish, or replenish their fridge's stock. The Discovery page is a vertical collection of multiple modules (carousels) which can stem from automatic and curated mechanisms. It features restaurants, retail venues, individual items and dishes along with a broad set of banners.\r\nWolt consumers have distinct tastes and preferences - all of which can change over time and vary with context. However, they expect Wolt to show what's relevant to them and to be able to discover - coupled with a frictionless experience. We want to satisfy our users, keep them engaged and grow our customer base around the world. Wolt delivery covers over 130.000 merchants in more than 500 cities across 25 countries, which results in a substantial variety and size of content Wolt has to offer its customers. Ranking the most relevant carousels at the top is a key challenge to solve so that our users find what they want fast. This renders personalizing the Discovery page as a key lever. Personalized carousel ranking presents a major recommendation challenge across many different domains like content streaming, ecommerce or quick commerce.\r\n\r\nIn our talk, we present a hierarchical multi-armed bandit (MAB) solution for personalizing the ranking of carousels on Wolt\u2019s Discovery page which is built on top of the Python ecosystem. Therefore, we first illustrate the specific challenges of an (almost) everything online delivery platform and our goals for Wolt's Discovery page. Second, we present our MAB-approach which combines a novel hierarchical parameterization of bandits on user-, segment-, city- and country-level with classical Thompson Sampling for exploration and exploitation. 
This approach caters well to the challenge of data sparsity. We also share the offline and online evaluation results of our approach. Lastly, we illustrate the architecture to make this solution resilient, scalable and adaptive. Our architecture is built on top of well-known open source libraries. We\u2019re leveraging mlflow for tracking and lineage, Flyte for ML workflows, Redis for serving features, and Seldon Core for serving user requests online fast and reliably. We will wrap up our talk with our learnings and an outlook for the next steps in our journey towards a personalized, context-aware, and controllable Discovery page.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "cb103ab8-3502-58f2-9460-23bbc62c88b4", "id": 15985, "code": "G8LVXX", "public_name": "Marcel Kurovski", "avatar": "https://pretalx.com/media/avatars/G8LVXX_7RBfPsG.jpg", "biography": "Senior Data Scientist in Wolt's Personalization Team working on Carousel Ranking and Personalized Recommendations. Show Host of Recsperts - Recommender Systems Experts, the Podcast Show with industry and academia experts in Recommender Systems. Building Recommenders and Personalization Solutions with Python for various industries for 7+ years, as well as creator and instructor of Python RecSys Training.", "answers": []}, {"guid": "15236e22-134a-5bd9-886e-901ae6a39345", "id": 38305, "code": "8BCJ9W", "public_name": "Steffen Klempau", "avatar": "https://pretalx.com/media/avatars/8BCJ9W_s5i5O9o.jpg", "biography": "After graduating with a business information systems degree from HS Fresenius, Steffen started working as a Data Engineer at Fielmann (Germany's biggest optician), building their internal data platform. In 2021 he joined Capgemini as an IT consultant working in various roles and projects for a multitude of clients - mostly enterprises. In 2023 Steffen joined Wolt as a Machine Learning Engineer in the personalisation team as one of Wolt's first embedded MLEs. 
Steffen is also a co-organiser of the ML Ops Community Berlin.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/CMMJPN/", "id": 44670, "guid": "2b1032ae-47b7-535e-980d-3e0cd0ca0665", "date": "2024-04-23T16:00:00+02:00", "start": "16:00", "logo": null, "duration": "00:30", "room": "B09", "slug": "pyconde-pydata-2024-44670-time-series-anomaly-detection-with-a-human-in-the-loop", "title": "Time series anomaly detection with a human-in-the-loop", "subtitle": "", "track": "General: Industry & Academia Use-Cases", "type": "Sponsored Talk", "language": "en", "abstract": "In the cross-industry wide trend towards industry 4.0 solutions, the amount of gathered sensor data is ever growing. Through the sheer amount of data, manual or human-based monitoring of the collected time series data becomes cumbersome if not even impossible. Yet, careful inspection of the time series data and identification of possible anomalies therein is crucial to detect problems in the underlying processes. To resolve this demand, ZEISS is developing a fully automated time series processing tool that performs ML based time series anomaly detection with a human-in-the-loop.", "description": "Starting from a completely unlabelled dataset, unsupervised anomaly detection is performed. Identified anomaly candidates are presented via a web app to domain experts, who can judge whether the identified time series segments are indeed abnormal or are expected behaviour, i.e., false positives generated by the anomaly detection. The domain-expert\u2019s feedback is stored to create a partially labelled dataset. The intended benefits from storing the collected labels are: 1) Metrics can be generated that allow to evaluate the performance of the initially unsupervised anomaly detection run. 
2) The number of false positives generated by the algorithm, i.e., time series segments that were incorrectly flagged as anomaly, can be reduced via pattern matching. 3) Based on a partially labelled dataset more domain problem specific methods might be applied such as semi-supervised anomaly detection or time series classification.\r\nThe framework uses open source tools and all its components, i.e., data pipelines, anomaly detection, web app, are deployed to the cloud.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "8adc842a-8504-5c4f-a413-ee83f0b84191", "id": 40156, "code": "8SJQLN", "public_name": "Philipp Millet", "avatar": "https://pretalx.com/media/avatars/8SJQLN_Xhktli0.jpeg", "biography": "With a background in particle physics, Philipp Millet has been working as Data Scientist for HotSprings/Umlaut/Accenture in various projects and domains. In 2023 he joined ZEISS Digital Partners as a Machine Learning Engineer. His focus is getting Data Science projects from a PoC stage into production.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/BNFLZB/", "id": 47974, "guid": "9a0a7723-0162-5469-a4c3-556b0acf9529", "date": "2024-04-23T16:35:00+02:00", "start": "16:35", "logo": null, "duration": "00:30", "room": "B09", "slug": "pyconde-pydata-2024-47974-cloud-no-thanks-i-m-gonna-run-genai-on-my-ai-pc", "title": "Cloud? No Thanks! I\u2019m Gonna Run GenAI on My AI PC", "subtitle": "", "track": "PyData: Generative AI", "type": "Sponsored Talk", "language": "en", "abstract": "In this speech, we want to introduce an AI PC, a single machine that consists of a CPU, GPU, and NPU (Neural Processing Unit) and can run GenAI in seconds, not hours. Besides the hardware, we will also show the OpenVINO Toolkit, a software solution that helps squeeze as much as possible out of that PC. Join our talk and see for yourself the AI PC is good for both generative and conventional AI models. 
All presented demos are open source and available on our GitHub.", "description": "In a world dominated by cloud computing, there's a growing demand for harnessing the power of PCs and edge devices for AI needs. After all, all computers connected have more power than any cloud. Hence, in this speech, we want to introduce an AI PC, a single machine that consists of a CPU, GPU, and NPU (Neural Processing Unit) and can run GenAI in seconds, not hours. Besides the hardware, we will also show the OpenVINO Toolkit, a software solution that helps squeeze as much as possible out of that PC. Join our talk and see for yourself the AI PC is good for both generative and conventional AI models. The demos we will present are open source, so feel free to try them at home. Let's paint your dreams together!", "recording_license": "", "do_not_record": false, "persons": [{"guid": "a2128224-f3de-53b7-973d-7f0849fb7cbf", "id": 17353, "code": "MWDHXR", "public_name": "Adrian Boguszewski", "avatar": "https://pretalx.com/media/avatars/MWDHXR_AldGcZh.jpg", "biography": "AI Software Evangelist at Intel. Adrian graduated from the Gdansk University of Technology in the field of Computer Science 7 years ago. After that, he started his career in computer vision and deep learning. As a team leader of data scientists and Android developers for the previous two years, Adrian was responsible for an application to take a professional photo (for an ID card or passport) without leaving home. He is a co-author of the LandCover.ai dataset, creator of the OpenCV Image Viewer Plugin, and a Deep Learning lecturer occasionally. His current role is to educate people about OpenVINO Toolkit. In his free time, he\u2019s a traveler. 
You can also talk with him about finance, especially investments.", "answers": []}, {"guid": "c7ed2e43-3ff1-5f19-9d9a-313a5884c08f", "id": 43063, "code": "HRBPPP", "public_name": "Dmitriy Pastushenkov", "avatar": "https://pretalx.com/media/avatars/HRBPPP_ToBeDBt.jpg", "biography": "Dmitriy Pastushenkov is a passionate AI PC Evangelist at Intel Germany with more than 20 years of comprehensive and international experience in industrial automation, the industrial Internet of Things (IIoT) and real-time operating and AI. Dmitriy has held various roles in software development and enablement, software architecture and technical management.\r\nDmitriy started his career at Intel in 2022 as a Software Architect. He works on the enablement and optimization of real-time, functional safety and AI workloads on the smart edge, applying innovative Intel technologies and software products. Currently, as an AI PC Evangelist, Dmitriy focuses on the AI Software Stack of the AI PC, including OpenVINO.\r\nDmitriy has a Master\u2019s degree in Computer Science from Moscow Power Engineering Institute (Technical University).", "answers": []}], "links": [], "attachments": [], "answers": []}], "B07-B08": [{"url": "https://pretalx.com/pyconde-pydata-2024/talk/EMZ7L7/", "id": 40110, "guid": "a8baed31-8a0f-5a92-aba5-f062c06c1c76", "date": "2024-04-23T10:30:00+02:00", "start": "10:30", "logo": null, "duration": "00:30", "room": "B07-B08", "slug": "pyconde-pydata-2024-40110-unleashing-confidence-in-sql-development-through-unit-testing", "title": "Unleashing Confidence in SQL Development through Unit Testing", "subtitle": "", "track": "PyCon: Programming & Software Engineering", "type": "Talk", "language": "en", "abstract": "As the landscape of data-driven applications expands, the need for robust SQL development practices becomes increasingly critical. 
This conference talk addresses the challenges faced by data teams in maintaining and evolving complex SQL models for their Data Warehouses, and shows how unit testing can play a vital role in ensuring data quality.\r\n\r\nWe will delve into the significance of SQL unit testing, highlighting its ability to quickly validate modeling logic and making sure that modifications do not break existing behavior. With the ease of mind of an automatically verified SQL logic, changes to existing data models can be shipped with confidence, ultimately contributing to faster deployment cycles.\r\n\r\nGet detailed insights on  the structure and functionality of Lotum\u2019s SQL unit testing framework, built in Python using pytest and tailored for BigQuery. With Lotum processing millions of events from mobile games every day, explore how this robust framework allows for efficient testing, ensuring the accuracy of the SQL logic. Learn how test cases with small sets of static mock data can be defined effortlessly so that they help pinpoint potential code errors easily.", "description": "The conventional approach to data model development frequently involves a repetitive cycle: crafting a query, executing it, examining a portion of the result, and iterating through the process with each subsequent query modification. This method becomes particularly challenging when dealing with the evolution of mature, extensively-used data models, where multiple developers collaborate without sufficient testing. 
In such scenarios, the iterative nature of this process poses significant risks, potentially leading to overlooked errors and compromised data quality.\r\n\r\nThe talk showcases the tangible benefits of having a well-designed unit testing framework, providing ease of mind to developers working collaboratively on the same model, and enabling the early detection of hard-to-spot errors before deployment.\r\n\r\nDuring the development of new data models and during the integration of new data sources, the absence of large amounts of production data makes verification of the model outputs difficult - clearly defined tests for scenarios not yet observed in production play a crucial role in overcoming this hurdle. SQL unit testing becomes especially relevant when refactoring existing data models and can be very helpful to ensure the logic is unchanged, even for edge cases.\r\n\r\nI outline the requirements for an effective SQL unit testing framework, emphasizing the use of the database or query engine to verify SQL statement correctness without persisting any data in the database. The presented framework supports the definition of atomic test cases, where each test case consists of minimal input datasets and expected output datasets and it is verified if the output of the query when run on the defined inputs matches the expected output.\r\n\r\nThe practical implementation of a SQL unit testing framework will be shared in detail, by giving insights into Lotum\u2019s pytest-based SQL unit testing framework and demonstrating how a test case for a SQL statement with mock data can be built effortlessly with minimal code redundancy. 
\r\n\r\nInternal workings of the framework will be explained, including the mechanics to define and run a unit test: By injecting mock data into an existing SQL statement, replacing references to production tables by the injected mock data, and executing the resulting fully-static statements in the query engine, the framework evaluates the transformed data against expected outputs. This way, the correctness of the query can be verified on a case-by-case basis without manually modifying the query code itself.\r\n\r\nAttendees will leave the session with a deep understanding of the importance of SQL unit testing, equipped with insights into building an effective framework, defining test cases, and ensuring data model robustness. The talk provides a roadmap for data teams to embrace a test-driven development approach, enhancing code quality, and fostering a culture of confident SQL development.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "988edae9-b784-556d-828a-35063416916c", "id": 37529, "code": "PMZLKB", "public_name": "Tobias Lampert", "avatar": "https://pretalx.com/media/avatars/PMZLKB_gSpfqac.png", "biography": "I'm an experienced technical leader with expertise in Data Science and Data Engineering. For over 20 years I have been designing and implementing data-intensive applications end-to-end, from data ingestion to deployment, and have built solutions which generate insight from data using statistical analysis and machine learning. 
My passion is building user-friendly, high-performance and cost-efficient data platforms.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/Z3FALV/", "id": 42936, "guid": "a0cb0a0c-312f-5fc3-919a-a22f4fc5fa4e", "date": "2024-04-23T11:05:00+02:00", "start": "11:05", "logo": null, "duration": "00:30", "room": "B07-B08", "slug": "pyconde-pydata-2024-42936-green-software-engineering", "title": "Green Software Engineering", "subtitle": "", "track": "PyCon: Programming & Software Engineering", "type": "Talk", "language": "en", "abstract": "Have you ever wondered how green software engineering can help with environmental sustainability? \r\nMy talk will answer this exact question.\r\nMy passion for nature and love for technology pushed me into this topic. \r\nThe way global warming is affecting us is one of the biggest concerns of many people around the world. The focus is to educate people about how they can play their role in protecting the environment just by using their laptops or computers in the right way. \r\nOne of the biggest questions is how to deal with gas emissions and control them, but how can software engineering help in all of this? \r\nThe complete software engineering cycle should be designed and implemented in such a way that it incorporates environmental sustainability without affecting the economic benefits. It is a win-win situation. We need more environmentally sustainable mobile and web applications.", "description": "The rapid growth of the digital economy and of software production demands a more sustainable way to deal with global warming issues. The whole tech industry is contributing to the growth of carbon footprints and we need to handle it efficiently. 
\r\nI will focus on the software engineering life cycle and explain how teams can put green software engineering into practice across the whole cycle, from requirements engineering to the end product. The talk digs deeper into the following topics:\r\n\u2022\tGreen Requirement Engineering \r\n\u2022\tGreen Architecture and Design \r\n\u2022\tGreen Coding \r\n\u2022\tOptimization of Infrastructure\r\n\u2022\tGreen Usage of software products \r\nSoftware products should be developed in such a way that they decrease carbon emissions, increase efficiency and lower carbon intensity. The choice of programming language should be based on time, complexity and resource usage so that we can incorporate green coding. Participate in electronics recycling programs and shift your existing infrastructure to services such as the cloud to decrease resource usage. \r\nWhen it comes to green usage of software products, never leave your laptops and systems in sleep mode, as this also increases the carbon footprint.  \r\nBy the end of the talk, people will be able to practice some green computing concepts in their everyday life.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "d0dc035d-2aba-59df-ae95-4f8c9b44a6d4", "id": 38907, "code": "NXHWGL", "public_name": "Farah", "avatar": "https://pretalx.com/media/avatars/NXHWGL_CKS40Em.jpg", "biography": "A passionate Software Engineer who is looking to connect the dots between environment and technology to make a sustainable world. 
I am working as a Software Engineer at Smartmirco Braunschweig.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/8W7RPP/", "id": 42852, "guid": "cc47e5b9-5414-567c-bb96-a27e4ba869d8", "date": "2024-04-23T11:40:00+02:00", "start": "11:40", "logo": null, "duration": "00:30", "room": "B07-B08", "slug": "pyconde-pydata-2024-42852-building-professional-voice-ai-with-vocode", "title": "Building Professional Voice AI with Vocode", "subtitle": "", "track": "PyData: Natural Language Processing & Computer Vision", "type": "Talk", "language": "en", "abstract": "Dive into the world of AI voice agents with Vocode, the leading framework for creating interactive, voice-based AI assistants. In this talk, we'll explore how Vocode integrates speech-to-text, response generation, and speech synthesis APIs to create agents that not only speak but also understand and adapt to the nuances of human conversation. We'll discuss the challenges of teaching these agents the etiquette of real conversations, such as knowing when to pause, not interrupt, and conclude interactions. Plus, we'll showcase Vocode's LLM function-calling feature through a practical example: real-time appointment booking. Join us to uncover the secrets behind building AI voice agents that are as engaging and efficient as they are innovative.", "description": "The AI open-source package Vocode (https://github.com/vocodedev/vocode-python) has emerged as a leader in creating AI voice agents since May 2023. These are the interactive voices on the other end of the phone, ready to assist with various tasks. My journey with Vocode began in August while developing a commercial platform that allows for no-code creation of voice agents utilizing Vocode's capabilities.\r\nThis presentation delves into the intricacies of Vocode. It's not just about voice; it's about crafting an experience. 
The framework seamlessly integrates external APIs for speech-to-text conversion, Large Language Model (LLM) response generation, and speech synthesis. But the real challenge lies in the nuances of human conversation: teaching the bot to pause when interrupted, not to speak over others, and to recognize the natural end of a conversation. These subtleties are what make interactions with Vocode feel remarkably human.\r\nA significant part of this talk will focus on the LLM function-calling feature of Vocode, particularly in real-time tasks like booking appointments. Imagine a scenario where you're speaking to 'Jane', a virtual plumber, to schedule a visit. The interaction feels real, with the bot understanding and responding to changes in appointment preferences, such as switching from a suggested time of \"tomorrow at 9 am\" to a more suitable slot \"next month\".\r\nThis talk aims to share insights and practical knowledge about building and refining AI voice agents, making them more than just voices on a call but rather engaging, interactive entities capable of performing complex tasks with ease and human-like finesse.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "348e3662-d37f-5a12-a210-2190d990a579", "id": 24692, "code": "8BJ7S9", "public_name": "Lev Konstantinovskiy", "avatar": "https://pretalx.com/media/avatars/8BJ7S9_fvI4Sl2.jpg", "biography": "Lev Konstantinovskiy is the Head of Engineering at a Berlin start-up Synthflow that specialises in AI voice agents. 
A long time ago he used to maintain gensim, a Python Natural Language Processing library", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/RLCLBB/", "id": 41472, "guid": "8e51615e-d1eb-5775-a9e7-0da6e5cf74b7", "date": "2024-04-23T14:10:00+02:00", "start": "14:10", "logo": null, "duration": "00:30", "room": "B07-B08", "slug": "pyconde-pydata-2024-41472-how-to-do-monolingual-multilingual-and-cross-lingual-text-classification-in-april-2024", "title": "How to Do Monolingual, Multilingual, and Cross-lingual Text Classification in April, 2024", "subtitle": "", "track": "PyData: Natural Language Processing & Computer Vision", "type": "Talk", "language": "en", "abstract": "In 2023, the field of NLP was shaken up again -- the appearance of powerful closed- and open-source LLMs opened new possibilities for text processing. However, many questions about these models' usability for typical NLP tasks are still open. One of them is quite simple -- if we want a classification model for some task, can we rely on LLMs or is it still better to fine-tune our own model? It might be easier to obtain a classifier for English, but what if my target language is not so resource-rich? In this presentation, the main \"recipes\" for obtaining the best text classifier depending on the language and data availability will be described.", "description": "We will provide answers to three main questions:\r\n\r\n1. If I want a text classifier for English texts, what is better -- to fine-tune a model or to prompt an LLM? And which model should I fine-tune?\r\n\r\n2. If my data is not in English, i.e. in a language that is not resource-rich, what should I do? Can I utilize LLMs? Or do I need to somehow get the data? Or can I somehow transfer knowledge from existing English data?\r\n\r\n3. If I want a multilingual model for several languages, again, what is the choice -- LLMs or my own model? 
Which model then?\r\n\r\nThe findings and comparisons will be illustrated on three tasks -- toxic speech, formal speech, and fluent speech detection -- for two languages -- English (as a resource-rich language) and Ukrainian (as a low-resource language in terms of data availability). We will provide tests of closed- and open-source models together with fine-tuned open-source models like BERT and RoBERTa.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "0abf737a-3e36-5b8a-92d5-6f65ac92d786", "id": 25740, "code": "937CJZ", "public_name": "Daryna Dementieva", "avatar": "https://pretalx.com/media/avatars/937CJZ_i7piS9G.jpg", "biography": "Hi, I'm Daryna \ud83d\udc4b\ud83c\uddfa\ud83c\udde6 I am a postdoctoral researcher at the Social Computing Research Group at the Technical University of Munich\ud83c\udde9\ud83c\uddea. Before, I obtained my PhD degree at Skolkovo Institute of Science and Technology under the supervision of Alexander Panchenko with the topic \"Method for Fighting Harmful Multilingual Textual Content\" \ud83d\udcdc. Currently, I continue my research by participating in an eXplainable AI (XAI) project and in multilingual NLP, developing models for the Ukrainian language.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/ZKDEPW/", "id": 40391, "guid": "3d28f123-91a9-5e60-bcfc-129019389f07", "date": "2024-04-23T14:45:00+02:00", "start": "14:45", "logo": null, "duration": "00:45", "room": "B07-B08", "slug": "pyconde-pydata-2024-40391-leveraging-the-art-of-parallel-unit-testing-in-django", "title": "Leveraging the Art of Parallel Unit Testing in Django", "subtitle": "", "track": "PyCon: Testing", "type": "Talk (long)", "language": "en", "abstract": "Unit testing is a fundamental practice in software development, ensuring the reliability and maintainability of code. 
However, in the context of monolith repositories, executing unit tests efficiently becomes a formidable challenge. This conference aims to explore the intricacies of unit testing in Django within monolithic codebases and shed light on how major institutions address and overcome these challenges through the implementation of parallel testing strategies.", "description": "Key Points to Address:\r\n- Understanding Monolith Challenges:\r\n- - Identification of challenges and bottlenecks in traditional unit testing approaches within Django monoliths.\r\n- - Analysis of the impact on development velocity and code quality.\r\n\r\n- Introduction to Parallel Testing:\r\n- - Explanation of parallel testing concepts and its application to Django unit testing.\r\n- - Benefits of parallelization in terms of speed, efficiency, and resource utilization.\r\n\r\n- Parallel Testing Tools and Techniques:\r\n- - Overview of tools and techniques available for parallelizing unit tests in Django.\r\n- - Practical insights into configuring and optimizing test suites for parallel execution.\r\n\r\n- Real-world Experiences from Major Institutions:\r\n- - Case studies from leading institutions sharing their challenges with unit testing in Django monoliths.\r\n- - Lessons learned and best practices in implementing parallel testing strategies.\r\n\r\n- Implementation Guidelines for Django Projects:\r\n- - Guidance on implementing parallel unit testing in Django projects, including code examples and configurations.\r\n- - Tips for integrating parallel testing seamlessly into existing development workflows.\r\n\r\nExpected Outcomes:\r\n- Insight into challenges specific to Django unit testing within monolithic repositories.\r\n- Understanding the principles and benefits of parallel testing.\r\n- Practical knowledge of tools and techniques for parallelizing Django unit tests.\r\n- Real-world experiences and best practices shared by major institutions.\r\n- Actionable guidelines for implementing 
parallel unit testing in Django projects.\r\n\r\nTarget Audience:\r\nThis talk is tailored for Django developers, software engineers, and testing professionals seeking to optimize their unit testing practices, especially within the context of monolithic repositories.\r\n\r\nConclusion:\r\nJoin me in this 45-minute session as we navigate through the challenges of unit testing in Django monoliths and explore the art of parallelization. By the end, you'll be equipped with the knowledge and tools to transform your Django unit testing workflows, leveraging the lessons learned from major institutions in the industry.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "70bbefc3-6b4f-5230-8151-98eccfbe86a9", "id": 20573, "code": "7BTCGM", "public_name": "Syed Ansab Waqar Gillani", "avatar": "https://pretalx.com/media/avatars/7BTCGM_D0Fb28e.png", "biography": "\ud83d\ude80 Python Enthusiast \ud83d\udc0d | Software Engineer at Agoda | Architect of Scalable Solutions | Agile Advocate | Mentor & Innovator | Speaker at XtremeJS, DjangoCon, OpenEdX, PyCascades | Let's elevate the tech game together! 
\ud83d\udcbb\u2728", "answers": []}, {"guid": "60afa417-eaec-5a14-a477-834173484bb2", "id": 19563, "code": "HVKWNH", "public_name": "Azan Bin Zahid", "avatar": "https://pretalx.com/media/avatars/HVKWNH_ZAtvU0b.jpg", "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/CY97LS/", "id": 41829, "guid": "022a5a89-ddf9-5ae2-aa01-56c3e4ccd9b7", "date": "2024-04-23T16:00:00+02:00", "start": "16:00", "logo": null, "duration": "00:30", "room": "B07-B08", "slug": "pyconde-pydata-2024-41829-analyzing-covid-19-protest-movements-a-multidimensional-approach-using-geo-social-media-data", "title": "Analyzing COVID-19 Protest Movements: A Multidimensional Approach Using Geo-Social Media Data", "subtitle": "", "track": "General: Ethics & Privacy", "type": "Talk", "language": "en", "abstract": "The COVID-19 pandemic and associated policy measures led to world-wide protest movements that were marked by the spread of misinformation and conspiracy theories, predominantly on social media platforms. Publicly available social media data is therefore a powerful proxy for studying these protest movements. 
The data, consisting of user locations, follower relationships, and content information, makes it possible to understand the geographical centers of activity, network structure, and key themes of conspiracy movements.\r\n\r\nThis talk will present a multi-dimensional network analysis for the Austrian COVID-19 protest movement using Python libraries like geopandas, networkx and gensim.\r\nIn particular, it will demonstrate how to identify geo-spatial hot spots using spatial statistics, densely connected clusters within the network by employing community detection techniques, as well as dominating content themes through topic modeling approaches.\r\n\r\nThe presentation highlights how data-driven analysis enables further understanding of movements that may pose threats to democracy, alongside the importance of publicly available social media data for addressing societal challenges.", "description": "The talk will walk through the steps undertaken in the analysis of a protest network using Twitter data. It will explain the methods used and present the results as well as the code and libraries used, following (roughly) this outline:\r\n\r\n1. Motivation: What was special about the COVID-19 protest movement and why a multi-dimensional view is crucial for understanding it.\r\n2. The Data: The information retrieved using Twitter's API and the necessary pre-processing steps.\r\n3. Spatial Analysis: The statistical means to understand the movement's spatial manifestation, including an explanation of the methods used and a presentation of results.\r\n4. Network Analysis: Mere social network analysis is not enough for understanding protest movements. Including the spatial information makes it possible to draw deeper insights by geo-spatially mapping network communities and centralities. \r\n5. Semantic Analysis: Understanding the dominating themes in the protest network with semantic analysis: generating the document embeddings, clustering topics and dealing with a large dataset of tweets.\r\n6. 
Conclusion: Importance of multi-dimensional analysis and the availability of social media data for studying societally important phenomena.\r\n\r\nPython libraries that were used (among others): geopandas, networkx, bertopic, lda and friends.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "b606e0ce-e982-55a1-831b-51cb7cf53aeb", "id": 2094, "code": "F7VUE9", "public_name": "Nefta Kanilmaz", "avatar": "https://pretalx.com/media/avatars/F7VUE9_KG775zE.jpeg", "biography": "I am a researcher and PhD candidate at the department of Geoinformatics at the University of Salzburg. My research focus lies on the spatio-temporal analysis of online communication networks based on social media data. Before that, I worked as a software engineer on scalable big data processing for machine learning applications.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/YPKKQF/", "id": 41758, "guid": "1f1ba6d5-e107-5171-a2cb-ba657138cf87", "date": "2024-04-23T16:35:00+02:00", "start": "16:35", "logo": null, "duration": "00:30", "room": "B07-B08", "slug": "pyconde-pydata-2024-41758-would-you-rely-on-chatgpt-to-dial-911-a-talk-on-balancing-determinism-and-probabilism-in-production-machine-learning-systems", "title": "Would you rely on ChatGPT to dial 911? A talk on balancing determinism and probabilism in production machine learning systems", "subtitle": "", "track": "PyData: Natural Language Processing & Computer Vision", "type": "Talk", "language": "en", "abstract": "In the last year there hasn\u2019t been a day that passed without us hearing about a new generative AI innovation that will enhance some aspect of our lives. On a number of tasks large probabilistic systems are now outperforming humans, or at least they do so \u201con average\u201d. 
\u201cOn average\u201d means most of the time, but in many real life scenarios \u201caverage\u201d performance is not enough: we need correctness ALL of the time, for example when you ask the system to dial 911. \r\n\r\nIn this talk we will explore the synergy between deterministic and probabilistic models to enhance the robustness and controllability of machine learning systems. Tailored for ML engineers, data scientists, and researchers, the presentation delves into the necessity of using both deterministic algorithms and probabilistic model types across various ML systems, from straightforward classification to advanced Generative AI models. \r\n\r\nYou will learn about the unique advantages each paradigm offers and gain insights into how to most effectively combine them for optimal performance in real-world applications. I will walk you through my past and current experiences in working with simple and complex NLP models, and show you what kind of pitfalls, shortcuts, and tricks are possible to deliver models that are both competent and reliable.\r\n\r\nThe session will be structured into a brief introduction to both model types, followed by case studies in classification and generative AI, concluding with a Q&A segment.", "description": "Objective and Outline:\r\nThis talk addresses the often-overlooked need for integrating deterministic and probabilistic models in machine learning, which is crucial in complex production environments. We begin by defining deterministic and probabilistic models, highlighting their distinct roles in ML systems. The talk then showcases practical examples where the synergy of these models enhances system performance, focusing on classification and Generative AI models.\r\n\r\nTarget Audience and Expected Background Knowledge:\r\nIntended for ML engineers, data scientists, and academic researchers, this presentation assumes familiarity with basic machine learning concepts and models. 
It's particularly beneficial for those involved in designing, implementing, or managing ML systems in production environments.\r\n\r\nKey Takeaways:\r\n\r\n- Understanding the strengths and limitations of deterministic and probabilistic models in ML.\r\n- Strategies for effectively combining these models in various ML systems.\r\n- Real-world examples demonstrating the improved robustness and controllability achieved through this integration.\r\n- Insights into future trends and potential developments in model integration.\r\n\r\nTime Breakdown:\r\n\r\n- Minutes 0-10: Introduction to deterministic and probabilistic models\r\n- Minutes 10-20: Synergies of approaches in real-world examples\r\n- Minutes 20-30: Applications for Generative AI models, including Q&A\r\n\r\nAdditional Information:\r\nNo prerequisites are required beyond a basic understanding of machine learning concepts. The presentation will be informative with a focus on practical applications, providing attendees with actionable knowledge and a deeper appreciation of model integration in ML systems.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "57945e2b-778e-57aa-92d4-5a985123286b", "id": 38354, "code": "XHFQVH", "public_name": "Nicolas Guenon des Mesnards", "avatar": "https://pretalx.com/media/avatars/XHFQVH_6pGthrE.JPG", "biography": "Nicolas is a Sr. ML Engineer at GitGuardian where he develops NLP-based technologies to detect vulnerabilities in code and provide remediation. He was previously Sr. Applied Scientist at Amazon Alexa  where he developed the models that power Alexa's core understanding  capabilities. He published multiple academic papers at top tier NLP conferences in the field of semantic parsing. 
Nicolas has hands-on experience with a variety of NLP models applied to client-facing applications.", "answers": []}], "links": [], "attachments": [], "answers": []}], "B05-B06": [{"url": "https://pretalx.com/pyconde-pydata-2024/talk/W7YDRX/", "id": 42839, "guid": "2c89b931-ef2c-59ac-aed5-d7e4c765d700", "date": "2024-04-23T10:30:00+02:00", "start": "10:30", "logo": null, "duration": "00:30", "room": "B05-B06", "slug": "pyconde-pydata-2024-42839-deploying-your-python-application-to-android", "title": "Deploying your Python application to Android", "subtitle": "", "track": "PyCon: Programming & Software Engineering", "type": "Talk", "language": "en", "abstract": "For many years, Android has held the top position as the most used OS, with about 38% of the OS user share in 2023. Currently, three major languages \u2013 C++, Java and Kotlin \u2013 are used for application development on Android. Although Python is capable of Android deployment, it was never considered an adequate language for Android development. But with the introduction of \u201cPEP 738: Adding Android as a supported platform\u201d, and the increasing popularity of frameworks like PySide6, Kivy, Flet etc., which enable GUI development with Python for Android devices, it is time for Python package developers to consider Android as a potential platform.  \r\n\r\nThis talk gives an introduction to each of the GUI development toolkits \u2013 Kivy, Flet and PySide6 \u2013 by demonstrating how to create a simple Contact List application. We later delve into the pros and cons of each of these frameworks, so that Python application developers can decide which framework suits their requirements better.", "description": "Python can be used to create native applications for Android. However, although Python is the most popular programming language, it is not the first choice for creating an Android application. 
This talk gives an overview of developing Android applications with Python by comparing the 3 popular frameworks for GUI development with Python that support Android as a platform \u2013 PySide6, Kivy and Flet. This comparison is demonstrated with a simple Contact List application with the ability to add, edit and delete contacts.  \r\n\r\nThe overall structure of the talk will be roughly as follows: \r\n\r\n1. Why is Android a relevant platform for Python application developers? (6 minutes) \r\n\r\nIn this section, we establish why Android is the most popular OS being used currently. Although Python has had the support to run applications natively in Android, even dating back to 2011, the development of Android applications with Python is not so popular. We will further highlight one of the major concerns of using Python for Android development and how PEP 738 can help simplify this. \r\n\r\n2. Current status of Android app development with Python (2 minutes) \r\n\r\nIn this section, we give a brief introduction to some of the Python-based toolkits that support Android as a platform \u2013 Kivy, Flet, PySide6, BeeWare etc. \r\n\r\n3. Contact List application with Kivy (3 minutes) \r\n\r\nIn this section, we look at how the application looks with Kivy and KivyMD, followed by the ease of development and some pros and cons of the framework.\r\n\r\n4. Contact List application with PySide6 (5 minutes) \r\n\r\nThe deployment of PySide6 applications to Android uses the same build tool as Kivy, called python-for-android. python-for-android now also supports a Qt backend along with the SDL2 backend that Kivy uses, thus enabling the deployment of PySide6 applications. In this section, we look at how the application looks with PySide6, followed by the ease of development and some pros and cons of the framework.\r\n\r\n5. 
Contact List application with Flet (3 minutes) \r\n\r\nIn this section, we look at how the application looks with Flet, followed by the ease of development and some pros and cons of the framework.\r\n\r\n6. Python packages support (6 minutes) \r\n\r\nWe see the various Python packages supported by each framework.\r\n\r\n7. Conclusion and Questions (5 minutes) \r\n\r\nQuestions from the audience.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "46b9b0b6-99f2-5dbf-a54e-6c01b43844a2", "id": 28354, "code": "7PPUDB", "public_name": "Shyamnath Premnadh", "avatar": "https://pretalx.com/media/avatars/7PPUDB_UOG1bve.jpg", "biography": "Shyamnath aka Shyam is a Senior Software Engineer responsible for developing PySide and shiboken6 at the Qt Company. Having worked across many domains including Mainframes, Java, Android and Machine Learning, Shyam's current interest lies in playing around with C++ and Python.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/AQ8HUM/", "id": 41840, "guid": "058445b6-196d-5608-8f5f-774c2e59988f", "date": "2024-04-23T11:05:00+02:00", "start": "11:05", "logo": null, "duration": "00:30", "room": "B05-B06", "slug": "pyconde-pydata-2024-41840-advanced-observability-with-opentelemetry-and-python", "title": "Advanced Observability with OpenTelemetry and Python", "subtitle": "", "track": "PyCon: Programming & Software Engineering", "type": "Talk", "language": "en", "abstract": "As Python expands into serverless and cloud environments, popularizing distributed microservice architectures, we often face observability challenges that impact efficiency and complicate error tracing. This presentation introduces OpenTelemetry, an emerging industry standard that provides a framework for tracking the performance of not just our Python code, but also other system components like databases and message queues. 
Its API and SDK integrate seamlessly with Python, enabling a unified approach to gather, process, and export telemetry data from various sources within a distributed system.\r\n\r\nWe will explore the setup and usage of OpenTelemetry's Python SDK through a practical scenario. The session will demonstrate how to convert an existing Flask microservice to use OpenTelemetry, using both automatic and manual instrumentation. Finally, we will examine how to utilize the exported data for enhanced system monitoring.", "description": "With the rise of serverless architectures and cloud technologies, Python has become increasingly popular for building microservices. Yet, as these systems expand, they face observability challenges leading to reduced efficiency and complexities in error tracing.\r\n\r\nTo address these challenges, this presentation introduces OpenTelemetry, an emerging industry standard providing a framework for tracking the performance of not only our Python code but also other system components such as databases or message queues. It integrates seamlessly into Python environments, offering a common way to gather, process, and export telemetry data from various sources of a distributed system.\r\n\r\nThe session will begin by revisiting the concept of observability and its critical importance in distributed systems. We will then introduce OpenTelemetry and review the fundamentals of its Python SDK. A practical use case will be presented, demonstrating the integration of OpenTelemetry into an existing Python microservice, using both automatic instrumentation mode and manual traces. 
Finally, we will discuss how to utilize the data collected by OpenTelemetry for system monitoring.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "a23f4c59-be33-5cf8-a073-42381f82ac2f", "id": 2101, "code": "Z7K9TP", "public_name": "Anton Caceres", "avatar": "https://pretalx.com/media/avatars/Z7K9TP_FIxI6Gg.jpg", "biography": "I manage a Python agency with a strong focus on community engagement, evidenced by my involvement in organizing PyCons and leading PyMunich meetups. Being a Python Software Foundation fellow, I am dedicated to a journey of constant learning and sharing.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/C9F9CC/", "id": 41737, "guid": "bd49f630-05ac-5670-a77b-929088abbf2c", "date": "2024-04-23T11:40:00+02:00", "start": "11:40", "logo": null, "duration": "00:30", "room": "B05-B06", "slug": "pyconde-pydata-2024-41737-boost-your-app-to-flash-speed-by-mastering-performance-tricks", "title": "Boost your app to Flash speed by mastering performance tricks", "subtitle": "", "track": "PyCon: Python Language & Ecosystem", "type": "Talk", "language": "en", "abstract": "In this talk, we discuss computational operations and memory utilization in Python and what is the connection between them. Additionally, we will provide you with visual aids for helping to build a mental picture of these concepts. Moreover, we will dive into how Python interpreter works and how the understanding of bytecode instructions can help you write better code. In the end, we will demonstrate the advantages of best practices by comparing both performance metrics and bytecode instructions.", "description": "Nowadays, more and more companies are looking for different strategies to gain more users for their products by using different approaches starting from introducing unique features to optimizing application performance. 
Additionally, Python is one of the most widely used programming languages, and its community continuously introduces new libraries for enhancing performance and optimizing memory usage. However, can we also accelerate app performance not only by relying on libraries but also by understanding how Python works under the hood?\r\n\r\nIn this talk, we discuss computational operations and memory utilization in Python and how the two are connected. Additionally, we will provide you with visual aids to help build a mental picture of these concepts. Moreover, we will dive into how the Python interpreter works and how an understanding of bytecode instructions can help you write better code. In the end, we will demonstrate the advantages of best practices by comparing both performance metrics and bytecode instructions.\r\n\r\nIf you're keen to move beyond basic optimizations and truly understand what happens under Python's hood during application execution, this session is for you. Join us to learn how Python works under the hood and to build a picture of what goes on in Python during application execution.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "172c2f01-bff6-5365-9b5c-833b02c1ae2a", "id": 15861, "code": "ZLLVEH", "public_name": "Laysa Uchoa", "avatar": "https://pretalx.com/media/avatars/ZLLVEH_DTTaNg0.jpg", "biography": "Laysa is a digital architect who constructs scalable structures using the magic of the cloud and Python. She is a certified cloud engineer and an enthusiastic advocate of the Python language and its environments. 
In addition to this, she serves as the leader of the PyLadies Munich chapter\u2014a community where individuals gather to learn, share, and nurture their growth.", "answers": []}, {"guid": "8aca423b-d47d-545b-8d66-ce05184220d1", "id": 34687, "code": "Z8HXML", "public_name": "Yuliia Barabash", "avatar": "https://pretalx.com/media/avatars/Z8HXML_xn1gnuK.png", "biography": "I have lived in Germany for the past five years, during which I have gained a diverse range of experiences in the tech industry. My expertise spans from developing web applications in Python to constructing AWS cloud solutions. I have a good understanding of design patterns, Object-Oriented Programming (OOP), event-driven architecture, and microservices architectures. Additionally, I have hands-on experience with REST API design and database technologies. I am continuously committed to enhancing my skills and ensuring that I utilize tools in the best practices.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/N9DEVW/", "id": 41505, "guid": "c272c16d-d607-536e-ae92-a3104befe93e", "date": "2024-04-23T14:10:00+02:00", "start": "14:10", "logo": null, "duration": "00:30", "room": "B05-B06", "slug": "pyconde-pydata-2024-41505-pandas-dask-dataframe-2-0-comparison-to-spark-duckdb-and-polars", "title": "Pandas + Dask DataFrame 2.0 - Comparison to Spark, DuckDB and Polars", "subtitle": "", "track": "PyData: Data Handling & Engineering", "type": "Talk", "language": "en", "abstract": "Dask is a library for distributed computing with Python that integrates tightly with pandas. Historically, Dask was the easiest choice to use (it\u2019s just pandas) but struggled to achieve robust performance (there were many ways to accidentally perform poorly). The re-implementation of the DataFrame API addresses all of the pain points that users ran into. 
We will look into how Dask is a lot faster now, how it performs on benchmarks that it struggled with in the past and how it compares to other tools like Spark, DuckDB and Polars.", "description": "Dask is a library for distributed computing with Python that integrates tightly with pandas and other libraries from the PyData stack. It offers a DataFrame API that wraps pandas and thus offers an easy transition into the big data space.\r\n\r\nHistorically, Dask was the easiest choice to use (it\u2019s just pandas) but struggled to achieve robust performance (there were many ways to accidentally perform poorly).  It was great for experts, but bad for novices.  Other tools (Spark, DuckDB, Polars) just did this better. \r\n\r\nFortunately, these pain points have been fixed with the following features: \r\n\r\n- A new and vastly improved shuffle algorithm\r\n- A logical query planning layer to improve performance and usability\r\n- A reduced memory footprint through a more efficient data model due to pandas 2.0\r\n\r\nWe will look into how these changes work together across pandas, Arrow, and Dask to provide a better UX and a more robust and faster system overall. Additionally, we will look into a comparison of Dask against other tools in the big data space, including Spark, Polars and DuckDB.\r\n\r\nWe will use the TPC-H benchmarks to compare these tools. We will look ahead into what the future will bring for pandas and Dask and how the logical query planning layer can be extended to fit other frameworks like Dask Array and XArray.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "5ddfee0c-a183-59fd-bd39-3fbfffc64182", "id": 38035, "code": "STZ9DS", "public_name": "Patrick Hoefler", "avatar": "https://pretalx.com/media/avatars/STZ9DS_AAEcZQM.jpeg", "biography": "Patrick Hoefler is a member of the pandas core team and a Dask maintainer. 
He is currently working at Coiled where he focuses on Dask development and the integration of a logical query planning layer into Dask. He holds a Msc degree in Mathematics and works towards a Msc in Software engineering at the University of Oxford.", "answers": []}, {"guid": "399a4feb-3664-5546-87e6-bb6d1a77020e", "id": 2045, "code": "7ATTUC", "public_name": "Florian Jetter", "avatar": null, "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/9PZSBS/", "id": 41576, "guid": "ad080293-022e-5298-8f1b-8bfa5785e4a0", "date": "2024-04-23T14:45:00+02:00", "start": "14:45", "logo": null, "duration": "00:45", "room": "B05-B06", "slug": "pyconde-pydata-2024-41576-the-key-to-reliability-testing-in-the-field-of-ml-ops", "title": "The key to reliability - Testing in the field of ML-Ops", "subtitle": "", "track": "PyCon: MLOps & DevOps", "type": "Talk (long)", "language": "en", "abstract": "Testing is a de facto standard in modern software development. With increasing awareness that comes with ML-Ops, testing becomes more important for the development and operation of machine learning-based components. In this talk we would like to share our view and solution for testing in the field of machine learning. We will present the applied testing strategy used and the lessons learned from the last four years of experience in operating idealo\u2019s cataloging system.", "description": "idealo.de offers a price comparison service for millions of products from a wide variety of categories. It navigates the dynamic landscape of about 3.7 billion offerings from 50,000+ shops, our central challenge is cataloging this huge offer automatically. Machine learning plays a crucial role for us in processing data.   \r\n\r\nMachine learning components must be considered as a part of a more complex domain. In our domain those components are part of an event driven asynchronous architecture. 
The need to continuously develop, deliver, and train accompanied by the capability to smoothly work together with traditional software components raises high demands on stable software development and operations. Testing plays a crucial role and brings up many open questions in the field of machine learning.   \r\n\r\nIn this talk we want to share and present our holistic approach to testing in machine learning.  The following aspects are taken into account:   \r\n- Introduction into our machine learning lifecycle   \r\n- Testing in context of traditional software development comprising unit tests, code coverage, contract tests, tests on infrastructure as code    \r\n- Specific challenges of testing in the machine learning domain comprising end-to-end test of training pipelines, deployment testing of inference endpoints in operational modes \r\n- The role of logging and monitoring for safe operations  \r\n\r\nThe presented test strategy is based on our 4 years' experience in operating idealo's cataloging system. Examples will be aligned along our tech stack consisting of e.g., PyTest,  CDK , Pactman,  AWS Sagemaker, Github Actions, OpenSearch Kibana and Grafana.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "538f65cd-f089-59f3-bd75-e1edda708a42", "id": 25757, "code": "3CDBQQ", "public_name": "Gunar Maiwald", "avatar": "https://pretalx.com/media/avatars/3CDBQQ_19yNoLw.jpg", "biography": "Gunar Maiwald has a background in Computer Science. For the last 4 years he worked as an ML engineer at idealo.de. His professional programming path led him from Perl via TypeScript to Python.\r\n\r\n* LinkedIn: https://de.linkedin.com/in/gunar-maiwald-6a5aa6194", "answers": []}, {"guid": "e4d322ca-e6b4-5640-9a83-4494d66ba14e", "id": 27140, "code": "UXME8R", "public_name": "Tobias Senst", "avatar": "https://pretalx.com/media/avatars/UXME8R_biRI1vR.png", "biography": "Tobias Senst is a Senior Machine Learning Engineer at idealo internet GmbH. 
Tobias Senst received his PhD in 2019 from the Technische Universit\u00e4t Berlin under the supervision of Prof. Thomas Sikora. He has more than 10 years of experience in Computer Vision and Video Analytics research.\r\n\r\nAt idealo, he switched from the world of images and videos to Natural Language Processing and is responsible for the operation and development of machine learning models in a productive environment.\r\n\r\n* LinkedIn: https://linkedin.com/in/tobias-senst-08090b192\r\n* Github: https://github.com/tsenst\r\n* Medium: https://medium.com/@tsenst\r\n* Google Scholar: https://scholar.google.de/citations?user=NKQ8Y9oAAAAJ&hl=de", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/BNCJPV/", "id": 41727, "guid": "18cda0f6-945c-5eb2-b7e0-d04a62cad894", "date": "2024-04-23T16:00:00+02:00", "start": "16:00", "logo": null, "duration": "00:30", "room": "B05-B06", "slug": "pyconde-pydata-2024-41727-the-evolution-of-feature-stores", "title": "The evolution of Feature Stores", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Stats", "type": "Talk", "language": "en", "abstract": "Feature Stores have become an important component of the machine learning lifecycle. They have been particularly pivotal in bridging the gap between data engineering and machine learning workflows(experimentation, training and serving). This talk will explore Feature Stores with a focus on their evolution, what they look like now and what they could look like in the future with the advent of the AI ACT.", "description": "In recent years, the role of feature stores has become increasingly pivotal in data engineering and machine learning. This talk will delve into the history of feature stores, exploring their evolution from Uber's Michelangelo to recent solutions like Feast, Hopsworks and Fennel. 
Lastly, we will discuss the potential impact of the AI Act on the future of feature stores, highlighting regulatory constraints that may affect what they look like in the future.\r\n\r\nThe outline of this talk is detailed below.\r\n\r\n### Historical Perspective:\r\n\r\n- Tracing the origins of Feature Stores: How did the concept evolve over time?\r\n- Early use cases and challenges: Lessons learned from Michelangelo.\r\n- Pioneering Feature Stores: Case studies on organizations at the forefront of adoption.\r\n\r\n\r\n### Current Landscape:\r\n\r\n- Architectural insights: What do modern Feature Stores look like?\r\n- Integration with popular ML frameworks and data storage solutions.\r\n- Real-world success stories: How Zalando built a central Feature Store for serving features across departments and business units with different technical requirements.\r\n\r\n### AI ACT and the Future of Feature Stores:\r\n- Envisioning Feature Stores in an AI ACT environment.\r\n- Federated learning and distributed feature stores: Opportunities and challenges.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "1fdcef7a-f92d-591c-bb3f-139404e6e97b", "id": 38341, "code": "KXYTZD", "public_name": "Olamilekan Wahab", "avatar": null, "biography": "I'm an Engineering Manager in the Machine Learning Platform team in Zalando where I build tools to make it easier for ML engineers, researchers and data scientists to be more productive and compliant.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/LNFSDV/", "id": 41772, "guid": "4d3e1d3e-7e77-5882-a5b5-a196df203650", "date": "2024-04-23T16:35:00+02:00", "start": "16:35", "logo": null, "duration": "00:30", "room": "B05-B06", "slug": "pyconde-pydata-2024-41772-polars-and-time-series-what-it-can-do-and-how-to-overcome-any-limitation", "title": "Polars and Time Series: what it can do, and how to overcome any limitation", "subtitle": "", "track": 
"PyData: Data Handling & Engineering", "type": "Talk", "language": "en", "abstract": "Time series analysis is ubiquitous in applied data science because of the value it delivers. In order to do effective time series analysis, you need to know your tools well. Polars has excellent built-in time series support, and it's also possible to extend it where necessary.\r\n\r\nWe will talk about:\r\n- Basic built-in time series operations with Polars (e.g. \"what's the average number of sales per month?\").\r\n- numba/numpy/scipy interoperability for not-so-basic time series operations (e.g. non-linear interpolation, or cumulative operations).\r\n- Advanced, custom time series operations, and how you can implement them as Polars plugins (e.g. business day arithmetic).\r\n\r\nBasic interest and knowledge of Python and data will be assumed, but no prior Polars experience is required.\r\n\r\nAnyone working with time series and/or dataframes will likely benefit from the talk.", "description": "This will be a technical talk, teaching people how to use Polars effectively for time series analysis.\r\n\r\nThe format will be roughly:\r\n- 5 mins: motivation, super-fast Polars crash course.\r\n- 7 mins: what's built-in - making the most of Polars' built-in time series capabilities.\r\n- 7 mins: when Polars isn't enough: interoperability with numba/scipy/numpy.\r\n- 6 mins: when nothing is enough: writing your own Polars Plugin, and learning how to do that.\r\n - 5 mins: engaging Q&A / awkward silence.\r\n\r\nAttendees will leave knowing where to turn to for any time series analysis task they may encounter whilst using Polars.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "6fa6c8b8-7b91-5fe9-8b55-ae35ecc17ec8", "id": 30515, "code": "KEUJ9U", "public_name": "Marco Gorelli", "avatar": "https://pretalx.com/media/avatars/KEUJ9U_SQaluhL.jpg", "biography": "Marco is a core dev of pandas and Polars and works at Quansight Labs as Senior Software Engineer. 
He also consults and trains clients professionally on Polars. He has also written the first Polars Plugins Tutorial and has taught Polars Plugins to clients.\r\n\r\nHe has a background in Mathematics and holds an MSc from the University of Oxford, and was one of the prize winners in the M6 Forecasting Competition (2nd place overall Q1).", "answers": []}], "links": [], "attachments": [], "answers": []}], "A1": [{"url": "https://pretalx.com/pyconde-pydata-2024/talk/RRAZ99/", "id": 41604, "guid": "95a11a03-bb82-5712-91dd-8aff2a8812a2", "date": "2024-04-23T10:30:00+02:00", "start": "10:30", "logo": null, "duration": "00:30", "room": "A1", "slug": "pyconde-pydata-2024-41604-encoding-charactersets-may-the-force-be-with-you", "title": "Encoding Charactersets - may the force be with you", "subtitle": "", "track": "PyCon: Python Language & Ecosystem", "type": "Talk", "language": "en", "abstract": "Understanding and repairing garbled text (Mojibake)\r\nis despite Unicode a permanent ongoing task in IT projects.\r\nGarbled text is the result of text being decoded using an unintended character encoding.\r\n\r\nExample: Die UTF-8 Selbsthilfegruppe trifft sich heute Abend im gr\u00c3\u00bcnen Saal\r\n\r\nThis talks explains how to analyze and fix such encoding problems with python.\r\nThe topics of this talk contains:\r\n\r\n- difference between grapheme and codepoints\r\n- Unicode vs. UTF-8 \r\n- decoding and encoding files, database result sets, REST-APIs calls\r\n- the unicodedata module\r\n- handling of ISO charsets in the unicode world\r\n\r\nThis talk shows short code examples for real world problems and solutions.", "description": "Understanding and repairing garbled text (Mojibake)\r\nis despite Unicode a permanent ongoing task in IT projects.\r\nGarbled text is the result of text being decoded using an unintended character encoding.\r\n\r\nThe topics of this talk contains the following points. 
To every point there are code examples:\r\n\r\n- Explore the nuances of text representation: Grapheme vs. Codepoints. Unravel the essence of characters in computing.\r\n- Delve into the realm of character encoding: Unicode vs. UTF-8. Decipher the key distinctions shaping text globalization.\r\n- Master the art of data interchange. Decode and encode files, database results, and REST-APIs seamlessly for universal communication.\r\n- Unlock the power of the unicodedata module. Learn how it aids in character information retrieval and manipulation in Python.\r\n- Navigate the challenges of ISO charsets in the Unicode era. Gain insights into effective strategies for handling diverse character sets.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "b3729c66-dfe8-5005-bf58-6232d1007148", "id": 38302, "code": "V8A3RS", "public_name": "Martin Hoermann", "avatar": "https://pretalx.com/media/avatars/V8A3RS_XfZ4Bg8.jpg", "biography": "Working for over 25 years for ORDIX AG as consultant in topics databases and programming. Focused on programming python in the last years. Giving lectures for beginners and advanced customers. Having lots of fun in edutainment difficult but all-day problems.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/Y3Y78W/", "id": 42987, "guid": "dc45640a-8bce-53f5-b115-d3dfe3ed7489", "date": "2024-04-23T11:05:00+02:00", "start": "11:05", "logo": null, "duration": "00:30", "room": "A1", "slug": "pyconde-pydata-2024-42987--un-leashed-potential-of-ai-in-government", "title": "(Un)leashed potential of AI in Government", "subtitle": "", "track": "General: Ethics & Privacy", "type": "Talk", "language": "en", "abstract": "As the world is being reshaped at an unprecedented speed through the rise of powerful (Generative) AI technologies that change the way we work and live, governments seek their place in the arena. 
This presentation will focus on how government institutions adapt to these changes by exploring three key areas of action: Adoption, Regulation, and Reskilling/Upskilling. Emphasis will be placed on Ethics and AI in government.", "description": "As the world is being reshaped at an unprecedented speed through the rise of powerful (Generative) AI technologies that change the way we work and live, governments seek their place in the arena. This presentation will focus on how government institutions adapt to these changes by exploring three key areas of action:\r\n1.\tAdoption: Generally, technology adoption has been slower in government than in the private sector. Yet governments have increasingly started to explore the potential of AI to deliver on their mission. The audience will learn about potentials, barriers, and concrete use cases/prototypes of AI-based services in German government bodies with a focus on responsible AI and Ethics. \r\n2.\tRegulation: It is discussed how government bodies respond to the rise of AI through regulation. An introduction to the EU AI Act is given \u2013 the world\u2019s first comprehensive AI law. \r\n3.\tReskilling & Upskilling: Insights are given on the role specialised data skills play in shaping the future of Digital Government in Germany.", "recording_license": "", "do_not_record": true, "persons": [{"guid": "a76c6a9a-81c1-5244-a189-fc06dfbe3547", "id": 38925, "code": "HBSQVV", "public_name": "Rosa Marie Keller", "avatar": null, "biography": "Rosa works as a Data Scientist at the German Federal Ministry for Family Affairs, Senior Citizens, Women and Youth, where she is currently supporting the setup of an in-house Data Lab. 
As an interdisciplinary thinker with experience in national and EU policymaking, she is interested in sustainable solutions at the intersection of policy, psychology, and data science.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/8GQLLY/", "id": 42932, "guid": "71a9accf-4b8b-57e5-a607-8fd4376c0f61", "date": "2024-04-23T11:40:00+02:00", "start": "11:40", "logo": null, "duration": "00:30", "room": "A1", "slug": "pyconde-pydata-2024-42932-ddataflow-an-open-source-end-to-end-testing-framework-for-ml-pipelines", "title": "DDataflow: An open-source end-to-end testing framework for ML pipelines", "subtitle": "", "track": "PyCon: MLOps & DevOps", "type": "Talk", "language": "en", "abstract": "In the realm of machine learning, the complexity of data pipelines often hinders rapid experimentation and iteration. This talk will introduce [DDataflow](https://github.com/getyourguide/DDataFlow), an innovative open-source tool, designed to facilitate end-to-end testing in ML pipelines by leveraging decentralized data sampling. Attendees will gain insights into the challenges of unit testing in large-scale data pipelines, the design philosophy behind DDataflow, and practical implementation strategies to enhance the reliability and efficiency of their ML pipelines.", "description": "Machine Learning pipelines, especially those dealing with large datasets, are intricate and multifaceted. The ability to quickly iterate and experiment is crucial, yet the complexity and scale of these pipelines often lead to prolonged development loops and latent errors. 
Traditional unit-testing approaches have proven to be cumbersome and inefficient in addressing these challenges due to the extensive boilerplate code and limited coverage they offer.\r\n\r\nThis talk will delve into the journey of developing [DDataflow](https://github.com/getyourguide/DDataFlow), a tool aimed at addressing the aforementioned challenges by enabling efficient end-to-end testing in ML pipelines. DDataflow employs decentralized data sampling to expedite testing processes, allowing for rapid and reliable iterations in ML pipelines.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "86973d97-e18a-5002-9b99-7690509f6220", "id": 25666, "code": "EZMJWT", "public_name": "Theodore Meynard", "avatar": "https://pretalx.com/media/avatars/EZMJWT_XWSm1xx.jpg", "biography": "Theodore Meynard is a data science manager at GetYourGuide. He leads the evolution of their ranking algorithm, helping customers find the best activities to book and locations to explore. Beyond work, he is one of the co-organizers of the Pydata Berlin meetup and the conference. \r\nWhen he is not programming, he loves riding his bike looking for the best bakery-patisserie in town.", "answers": []}, {"guid": "120a91e2-462f-5e74-9eaa-e3f77ed9e558", "id": 29444, "code": "ZCXEBQ", "public_name": "Jean Machado", "avatar": "https://pretalx.com/media/avatars/ZCXEBQ_QSoUlW7.jpeg", "biography": "Jean Carlo Machado is a Brazilian DataScience Manager at GetYourGuide for the Growth Data Products team and the Machine Learning Platform Team. From this point of view is able to collaborate with amazing people in turning business opportunities into data science products, from inception to large scale production deployments of multiple data products.  Jean values community building and getting communities together; he is currently one of the organizers of the MLOps.community Berlin. 
Jean spends a significant part of his ever-shrinking free time building open-source tools; his focus right now is building social-good tech.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/93MHQ3/", "id": 43030, "guid": "b7985a83-02f6-5986-9997-e799c34b0459", "date": "2024-04-23T14:10:00+02:00", "start": "14:10", "logo": null, "duration": "00:30", "room": "A1", "slug": "pyconde-pydata-2024-43030-exploring-zarr-from-fundamentals-to-version-3-0-and-beyond", "title": "Exploring Zarr: From Fundamentals to Version 3.0 and Beyond", "subtitle": "", "track": "PyData: Data Handling & Engineering", "type": "Talk", "language": "en", "abstract": "A key feature of the Python data ecosystem is the reliance on simple but efficient primitives that follow well-defined interfaces to make tools work seamlessly together (Cf. http://data-apis.org/). NumPy provides an in-memory representation for tensors. Dask provides parallelisation of tensor access. Xarray provides metadata linking tensor dimensions. **Zarr** provides a missing feature, namely the scalable, persistent storage for annotated hierarchies of tensors. Defined through a community process, the Zarr specification enables the storage of large out-of-memory datasets locally and in the cloud. 
Implementations exist in C++, C, Java, Javascript, Julia, and Python, enabling.\r\n\r\nThis talk presents a systematic approach to understanding the newer [Zarr Specification Version 3](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html) by explaining the critical design updates, performance improvements, and the lessons learned via the broader specification adoption across the scientific ecosystem.\r\n\r\nI will also briefly discuss the evolution of the Zarr - the development of the [Zarr Enhancement Process (ZEP)](https://zarr.dev/zeps) and its use to define the next major version of the specification (V3); as well as uptake of the format across the research landscape.", "description": "Zarr is a data format for storing chunked, compressed N-dimensional arrays and is sponsored by [NumFOCUS](https://numfocus.org/project/zarr) under their umbrella.\r\n\r\nIt is based on open-source technical specification and has implementations in several languages, with [Zarr-Python](https://github.com/zarr-developers/zarr-python) being the most used.\r\n\r\n## Outline\r\n\r\nFirst, I\u2019d be talking about:\r\n\r\n### Understanding Zarr basics (5 mins.) \r\n\r\n- What is Zarr, and how it works?\r\n    - The inner workings of Zarr using illustrated graphics\r\n- What is the Zarr Specification?\r\n    - How is Zarr different when compared to other storage formats?\r\n\r\nThen, I'll be talking about the new Zarr Specification V3 and its significant features:\r\n\r\n### What's new in Zarr Spec V3? 
(15 mins.)\r\n\r\n- What is the motivation for the evolution of the specification?\r\n    - High-latency storage \u2192 Better support for technologies, particularly systems with relatively high latency per operation, such as cloud object stores\r\n    - Interoperability \u2192 Language-agnostic approach towards the new specification by slimming down the specification to achieve interoperability across major programming languages\r\n- Major design updates\r\n    - Greater flexibility in how groups and arrays are created\r\n        - Support for implicit groups that do not have a metadata document but whose existence is implied by descendant nodes\r\n    - Restructuring of the `JSON` metadata document and storage path in both arrays and groups\r\n        - Why is the Zarr V3 metadata consolidated compared to the Zarr V2 metadata?\r\n    - Explicit support for extensions via defined extension points and mechanisms\r\n        - How do extensions allow the community to add innovative and cutting-edge features to help their specific use cases?\r\n    - Chunk encoding and supported codecs for V3\r\n        - How are chunks encoded into binary representation for storage in the store, using the chain of codecs specified by the codecs metadata field?\r\n- ZEP Process\r\n    - Need and origin of a community feedback process for the evolution of Zarr specification\r\n    - Transformation from steering council governed to community-owned specification\r\n    - Learnings when migrating from [Spec V2](https://zarr.readthedocs.io/en/stable/spec/v2.html) \u2192 [Spec V3](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html)\r\n\r\nThen, I\u2019d be doing a hands-on session, which would cover the following:\r\n\r\n### Hands-on (5 mins.)\r\n\r\n- Creating Zarr arrays and groups using Zarr-Python V3.0\r\n    - Walk through of the new features (mentioned above)\r\n- Demo of [Sharding Codec](https://zarr.dev/zeps/accepted/ZEP0002.html) extension\r\n    - Creating a sharded 
array and group and showing how a large number of chunks can be grouped together into a single shard\r\n- Looking under the hood\r\n    - Use store functions to explain how your Zarr data is stored\r\n\r\nI'd be closing the talk by:\r\n\r\n### Conclusion (5 mins.)\r\n \r\n- Key takeaways\r\n- How can you get involved?\r\n- QnA\r\n\r\nThis talk aims to address an audience that works with large amounts of data and is looking for a transparent, open-source, reliable, cloud-optimised, and environmentally friendly format. Also, I\u2019d like to invite anyone interested in the lessons I learned by maintaining the project throughout the years.\r\n\r\nThe tone of the talk is set to be informative, story-telling and fun.\r\n\r\nIntermediate knowledge of Python and NumPy arrays is required for the attendees to attend this talk.\r\n\r\n### After this talk, you\u2019d:\r\n\r\n- understand the basics of Zarr and what's new in V3,\r\n- use Zarr V3 for local and cloud storage,\r\n- make an informed decision on what data format to use for your data\r\n\r\nand also you'd:\r\n\r\n- know why you should have a process for your project,\r\n- have essential takeaways regarding when an OSS project transitions from a young to a mature stage", "recording_license": "", "do_not_record": false, "persons": [{"guid": "b66fda83-603e-5800-9d21-04088119b753", "id": 15587, "code": "A7ACFE", "public_name": "Sanket Verma", "avatar": "https://pretalx.com/media/avatars/A7ACFE_CWefmUa.jpg", "biography": "Sanket is a data scientist based out of New Delhi, India. He likes to build data science tools and products and has worked with startups, governments, and organisations. He loves building community and bringing everyone together and is Chair of PyData Delhi and PyData Global. 
\r\n\r\nCurrently, he's taking care of the community and OSS at Zarr as their Community Manager.\r\n\r\nWhen he\u2019s not working, he likes to play the violin and computer games and sometimes thinks of saving the world!", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/P3GRLG/", "id": 41566, "guid": "e4913906-942e-57ce-b717-2f2bef45b2c1", "date": "2024-04-23T14:45:00+02:00", "start": "14:45", "logo": null, "duration": "00:45", "room": "A1", "slug": "pyconde-pydata-2024-41566-from-llm-as-oracle-to-llm-as-translator-our-journey-from-theory-to-everyday-s-practice-in-a-corporate-setting-with-dmgpt-and-python-", "title": "From LLM as oracle to LLM as translator - our journey from theory to everyday\u2019s practice in a corporate setting with dmGPT (and python)", "subtitle": "", "track": "PyData: Generative AI", "type": "Talk (long)", "language": "en", "abstract": "Last year, dm-drogeriemarkt was among the first big German companies to launch a tool that lets coworkers unlock the power of LLMs in a secure setting. At the beginning, dmGPT was only a user interface pointing to a private instance of a foundation model. \r\nListening to the needs of our colleagues, we quickly learned that this \u201cnaked\u201d model \u2013 a super powerful NLP model that can help them process text \u2013 is not really what they needed: they needed a trustworthy, knowledge-rich assistant to help them accomplish their daily tasks. \r\nIn our journey towards this goal, we used Python to shift the LLM\u2019s role in dmGPT: from being the motor and only source of answers to being a translator between the user\u2019s input in natural language and multiple software systems, the steering wheel that helps humans drive the flow. 
\r\nToday, dmGPT is no longer just a statistical parrot; it is an open platform powered by internal knowledge.\r\n\r\nIn this talk we want to share with you the learnings and insights we gained while designing and implementing the new dmGPT.", "description": "One of the biggest challenges of working in a large organization like dm is finding the information you need to accomplish your tasks: distributed organization units, multiple knowledge sources, and different tools make it very challenging to know where to find information whose location you don\u2019t know. Most of the time, the best way to find something out is to ping a more experienced colleague and ask them. But what if you could ping your AI-powered copilot and find out? Not only that\u2026 What if it also helped you create content for your specific product without you telling it everything about the product? What if it was able to help you write code using internal tools? What if it could help you gain insight into your internal data? \r\n\r\nAfter its first steps in summer 2023, our vision for dmGPT quickly developed into it becoming a truly helpful assistant for every coworker of dm. Since then, we have contributed to the design and implementation of an LLM-powered platform that aims to achieve this goal. To come a step closer, we had to rethink the role of the LLM, picturing it as a translator between natural languages and software systems and back. Now, it helps us map an instruction in natural language to a set of tools needed to accomplish the given task and construct a coherent answer based on the provided data. 
\r\n\r\nIn the design we had to face multiple challenging questions, such as: \r\n-\tHow to connect multiple, heterogeneous data sources?\r\n-\tHow to pick an LLM for a given task?\r\n-\tWhich LLM do we support?\r\n-\tHow do we build a user-friendly, dynamic and configurable user interface?\r\n-\tHow to measure the system\u2019s quality?\r\n\r\n\r\nIn this talk we would like to provide a technical insight into our journey, discussing architectural decisions as well as implementation dilemmas, and engage in a discussion with the community about the steps to come.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "d98bb634-fb62-5da4-90a6-738a4b9bf650", "id": 38285, "code": "TNJ3S9", "public_name": "Emma Haley", "avatar": "https://pretalx.com/media/avatars/TNJ3S9_M5j47q7.png", "biography": "Half Colombian, half Spanish, half German data person with particular interest in natural languages. Translator & Computer Scientist by training, Pythonista by heart.", "answers": []}, {"guid": "852facf5-6503-5401-8540-670d0dcbe6f8", "id": 38286, "code": "HEC8FE", "public_name": "Niklas Lederer", "avatar": "https://pretalx.com/media/avatars/HEC8FE_CxSyGVf.jpg", "biography": "I'm a data person with a focus on machine learning and data science but also a business background. 
Working for a few years now in the data world - especially with Python - I'm now the product lead for a team called Customer genAI Incubator at dmTECH which is the IT subsidiary of dm-drogerie markt.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/MTVWQM/", "id": 42572, "guid": "57b74fb1-e6f7-51c1-a252-043695ad988b", "date": "2024-04-23T16:00:00+02:00", "start": "16:00", "logo": null, "duration": "00:30", "room": "A1", "slug": "pyconde-pydata-2024-42572-safeguarding-privacy-and-mitigating-vulnerabilities-navigating-security-challenges-in-generative-ai", "title": "Safeguarding Privacy and Mitigating Vulnerabilities: Navigating Security Challenges in Generative AI", "subtitle": "", "track": "PyData: Generative AI", "type": "Talk", "language": "en", "abstract": "Generative AI (GenAI) has significantly improved our daily lives, prompting a focus on its integration into products and our routines. However, the growing importance of GenAI brings along significant concerns regarding privacy and vulnerability.\r\n\r\nThis talk delves into the critical issues surrounding the protection of private data and the security of GenAI systems. We'll begin by understanding the fundamental differences between data privacy and data security. Drawing insights from real-life data breaches and compromised information in major companies, we'll explore the mistakes made and the steps taken to rectify them. Throughout the discussion, we'll analyze the challenges faced by GenAI in ensuring data privacy and security across various stages of an LLM project.\r\n\r\nFurthermore, the talk will shed light on how prominent companies building GenAI are working to reduce the impact of data privacy and security concerns within their models. Additionally, we'll explore strategies for individuals, like ourselves, using GenAI, to enhance data privacy and security when integrating it into our products or daily lives. 
Finally, the role and significance of government regulations in ensuring the safety and security of GenAI will be emphasized.", "description": "In the ever-evolving landscape of Generative AI (GenAI), privacy and security have emerged as paramount concerns, echoing the necessity for comprehensive frameworks and collaborative initiatives. The session kicks off with an interactive segment, aiming to gauge the audience's familiarity and involvement with GenAI, ensuring the discussion aligns with their varying levels of expertise and engagement.\r\n\r\nFundamental concepts of Data Privacy and Data Security are meticulously delineated, elucidating the responsible handling and fortification of personal information. A visual aid in the form of a Venn diagram underscores the intricate interplay between these two crucial facets, facilitating a deeper understanding for the audience.\r\n\r\nTransitioning to the domain of GenAI, the discourse delves into the indispensable need for data privacy throughout the lifecycle of GenAI models. Instances of ethical and legal concerns arise during the training phase, where datasets often contain potentially sensitive personal information sourced from the internet. Real-world cases such as disputes between media entities like The New York Times and AI organizations like OpenAI exemplify these dilemmas.\r\n\r\nMoreover, the session critically scrutinizes data privacy concerns during GenAI production, focusing on the policies adopted by AI companies regarding prompt-related data retention. Here, certain AI entities retain prompt records for extended durations, which can pose potential privacy risks. In response, initiatives such as enterprise versions of GenAI models, like those offered by OpenAI, provide users with enhanced control over data usage, reinforcing a more privacy-centric approach.\r\n\r\nSimultaneously, the discussion navigates through the dimensions of data security risks inherent in GenAI models during operational phases. 
The potential extraction of sensitive personal data from these models poses substantial risks, given GenAI's proclivity to retain information from its training data. Academic research papers, like \"Scalable Extraction of Training Data from (Production) Language Models,\" delve into these vulnerabilities, highlighting the complexity of data security challenges in GenAI.\r\n\r\nFurther enriching the discourse, the session showcases the top ten vulnerabilities in GenAI, as identified by insights from OWASP. These vulnerabilities encompass a wide array of risks, from prompt injection and insecure output handling to training data poisoning and supply chain vulnerabilities.\r\n\r\nTo culminate the discussion, actionable strategies to fortify data protection within GenAI are proposed. These encompass leveraging Open Source GenAI solutions like LLAMA, recognized for their transparency, although they may come with higher maintenance costs. Additionally, anonymizing data before prompt utilization emerges as a proactive measure, albeit posing certain operational challenges.\r\n\r\nMoreover, the session underscores the pivotal role of government regulations in safeguarding citizen data and establishing policies binding on GenAI companies. Recent regulations from governments like the US, UK, and other countries emphasize the need for AI systems to be 'secure by design,' promoting robust data protection measures. Collaborative efforts among companies also come to the forefront, exemplified by initiatives like the \"AI Alliance\" formed by IBM, Meta, and 50 other organizations. These alliances aim to advance open-source AI while fostering collective processes for data protection and security.\r\n\r\nIn conclusion, this comprehensive session aims to empower attendees with a holistic understanding of privacy and security challenges in the GenAI domain. 
The discourse, enriched with real-world instances, legal dilemmas, academic insights, and industry perspectives, seeks to equip individuals and organizations with actionable insights. The objective is to navigate the complex terrain of GenAI, fostering a more privacy-aware and secure integration into our lives and technological ecosystems.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "bc2ec8f8-cb8a-5e0d-a30c-92a52ceb4617", "id": 38683, "code": "3VHXXW", "public_name": "John Robert", "avatar": "https://pretalx.com/media/avatars/3VHXXW_o27HYd2.jpg", "biography": "John Robert is a Senior Machine Learning Engineer at Condo GMBH, boasting five years of expertise in machine learning. Their focus lies in deploying models while prioritizing data privacy and security. With prior experience at Daimler (Mercedes Benz) and Bosch Autonomous Driving, Robert has a rich background in automotive AI.\r\n\r\nPassionate about innovation, Robert actively participates in Hackathons and is a valued member of the MLOps community, contributing to advancements in AI technology and fostering collaboration.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/QLXUHY/", "id": 40918, "guid": "850553fa-15cc-5970-b9d6-76e76c12cfae", "date": "2024-04-23T16:35:00+02:00", "start": "16:35", "logo": null, "duration": "00:30", "room": "A1", "slug": "pyconde-pydata-2024-40918-breaking-ai-boundaries-fairness-metrics-in-unstructured-data-domains", "title": "Breaking AI Boundaries: Fairness Metrics in Unstructured Data Domains", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Stats", "type": "Talk", "language": "en", "abstract": "This presentation addresses the rare use of machine learning fairness metrics in domains with indirect human impact, e.g., automotive engineering. 
We briefly map out the space of use cases to examine the necessity, potential benefits, and challenges of applying fairness-related techniques. The main focus then lies on proposing solutions for overcoming identified hurdles, especially regarding the application in unstructured data domains, such as image and audio recognition and large text document analysis. Our approach includes strategies for detecting key subgroups and providing clear explanations for model failures. We also highlight two open-source tools, Sliceguard and Spotlight, for practical implementation.", "description": "Fairness Metrics are already widely used to avoid unwanted bias in machine learning models. However, although fairness is a hot topic, it is primarily used in domains where the models' interface and influence on humans are obvious. In other domains with a less obvious connection between model decisions and their impact on human beings, they are rarely seen (e.g., automotive engineering applications, etc.). This poses three questions:\r\n\r\n1. In those domains, is it really unnecessary to use fairness techniques, or is their absence endangering individuals in a less obvious way? (necessity)\r\n2. Even if a use case does not need fairness techniques, wouldn't the use cases still benefit from a look through the \"Fairness lens\" and the connected methods and tools? (benefit)\r\n3. Besides having less strong implications for using fairness metrics, what obstacles keep people from using them, and how can we mitigate them? (obstacles and solutions)\r\n\r\nTo answer these questions, our presentation will first briefly compare five prototypical engineering use cases and categorize them according to the above criteria (necessity, benefit, obstacles). 
This first part mainly aims to map out the space of machine learning use cases in the engineering domain and suggest possible reasons why fairness-related techniques are not applied in those areas.\r\n\r\nWe will then mainly focus on further analyzing those obstacles and providing solutions to overcome them. Here, the main focus will be expanding the application of fairness-based model evaluation to unstructured data domains. Typical use cases in this category range from image and audio recognition to LLM applications with large text documents. We will provide a brief theoretical overview of strategies to make fairness metric application suitable and then go through a concrete example down to the implementation level. For that, we will touch on important subjects, such as detecting meaningful subgroups in unstructured data, extracting easy-to-grasp explanations for model failures, and interactive analysis of model predictions. This section will also feature two open-source tools to address these challenges: Sliceguard and Spotlight.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "d7b53794-03fe-506d-be63-634a571f503b", "id": 37652, "code": "NMKCDV", "public_name": "Daniel Klitzke", "avatar": "https://pretalx.com/media/avatars/NMKCDV_WUeyDLe.jpeg", "biography": "I'm a seasoned AI professional with an additional background in software engineering and web development. Having participated in and led quite a few machine learning-based projects in the engineering domain, I've worked on various ML problems, ranging from ML on images and 3D data to audio and time series analysis. My expertise in software engineering makes it easy for me to bring ML solutions to production. Currently, my focus is on helping people build effective ML models, from planning out projects to creating performant models and productionizing them. 
I'm particularly passionate about data curation, and I'm excited to be a part of the team building Renumics Spotlight, a free data curation tool.", "answers": []}], "links": [], "attachments": [], "answers": []}], "A03-A04": [{"url": "https://pretalx.com/pyconde-pydata-2024/talk/RMZLKZ/", "id": 41577, "guid": "df6683f4-32cf-5b9c-81c0-d264b8d48ed3", "date": "2024-04-23T10:30:00+02:00", "start": "10:30", "logo": null, "duration": "01:30", "room": "A03-A04", "slug": "pyconde-pydata-2024-41577-using-ml-to-find-out-the-why-a-tutorial-in-causal-machine-learning", "title": "Using ML to find out the \"Why\"? A Tutorial in Causal Machine Learning", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Stats", "type": "Tutorial", "language": "en", "abstract": "Machine learning is mostly used for predicting outcome variables. But in many cases, we are interested in causal questions: Why do customers churn? What is the effect of a price change on sales? How can we optimize personalized marketing campaigns or medical treatments?\r\n\r\nThis tutorial introduces participants to the field of Causal Machine Learning (Causal ML). We will start with a basic motivation of causal analysis and share insights on how to recognize causal questions in data science. We will dive into the basics of Causal ML: Why can't we simply use off-the-shelf ML methods to answer causal questions? The tutorial will focus on the Double Machine Learning approach and demonstrate the use of Causal ML with the Python library DoubleML (Bach et al., 2022). The general introduction will be complemented by hands-on data examples and interactive discussion and Q&A sessions. The tutorial is a great starting point for participants to discover Causality/Causal ML and start their own causal data science projects.\r\n\r\nReferences \r\n\r\nBach, P., Chernozhukov, V., Kurz, M. S., and Spindler, M. 
(2022), DoubleML - An Object-Oriented Implementation of Double Machine Learning in Python, Journal of Machine Learning Research, 23(53): 1-6, https://www.jmlr.org/papers/v23/21-0862.html", "description": "The tutorial will be organized in four blocks.\r\n\r\n1) Introduction and motivation\r\n\r\nWe will point out why Causality matters in data science. Many problems managers and data scientists are facing are causal. When organizations and companies want to optimize their marketing campaigns, their financial planning, or their pricing scheme, they usually run into causal considerations: How much do my sales decrease if we increase the price by X%? How can I send out email newsletters to those who like them and avoid annoying other subscribers? \r\n\r\nCausal Inference and Causal ML offer powerful tools that help to formalize and model things that are usually discussed only on an intuitive basis: Are the people who opened my newsletters really comparable to those who haven't? Can I just compare the convergence rates of these groups when I want to evaluate the newsletters' effectiveness? \r\n\r\n2) Introduction to Causal Machine Learning with DoubleML\r\n\r\nCausal Machine Learning offers tools to estimate causal relationships with SOTA ML algorithms. We will offer an introduction to the Double Machine Learning approach (Chernozhukov et al., 2018). This introduction will be aligned with several data examples and code demonstrations using the Python package DoubleML, https://docs.doubleml.org/stable/index.html . DoubleML is an open source package that offers various tools to estimate causal effects, for example for estimation of heterogeneous treatment effects (like in personalized marketing or personalized medicine).\r\n\r\n3) Hands-on Session: Data Example\r\n\r\nThe tutorial features a data project that participants can solve on their own. 
With the hands-on session, participants get started on their own Causality learning journey :) Participants are invited to apply DoubleML to their own data example and play around with the package features. The hands-on session will follow the structure of the DoubleML workflow, which guides analysts through the process of causal inference with DoubleML, https://docs.doubleml.org/stable/workflow/workflow.html.\r\n\r\n4) Discussion and Q&A\r\n\r\nThe tutorial concludes with a discussion and Q&A session. We are looking forward to participants' comments and ideas. We appreciate feedback from the Python community on the DoubleML package :)", "recording_license": "", "do_not_record": true, "persons": [{"guid": "86e70cc8-870d-54a1-bcc6-5be8be916d9e", "id": 43409, "code": "8ZFRKM", "public_name": "Oliver Schacht", "avatar": "https://pretalx.com/media/avatars/8ZFRKM_769sN5M.jpg", "biography": "I am a PhD candidate at the University of Hamburg, passionately researching within the field of Causal Machine Learning. As part of my research activities, I am also a contributing developer to DoubleML, which is a toolbox for causal predictions with ML.", "answers": []}, {"guid": "9887d6cc-95d8-5e51-af88-cb9f3767ab8f", "id": 43410, "code": "HMFBCS", "public_name": "Jan Teichert-Kluge", "avatar": "https://pretalx.com/media/avatars/HMFBCS_GbbBsQ2.jpg", "biography": "My name is Jan and I work as a research associate at the University of Hamburg, where I am studying for my PhD in statistics and data science. I have a master's degree in industrial engineering and together with my experience from industry, I have a strong application-oriented background. 
\r\nI have contributed to the DoubleML package for Python and my research focuses on Causal ML for unstructured data such as text and images.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/XBUHCK/", "id": 41605, "guid": "386aa515-c055-583a-9f93-c3ebb58a3ce1", "date": "2024-04-23T14:05:00+02:00", "start": "14:05", "logo": null, "duration": "01:30", "room": "A03-A04", "slug": "pyconde-pydata-2024-41605-performant-scientific-computation-in-python-and-rust", "title": "Performant, scientific computation in Python and Rust", "subtitle": "", "track": "PyCon: Programming & Software Engineering", "type": "Tutorial", "language": "en", "abstract": "A tutorial session on how to build scientific packages for numerical calculus and algorithms in Python and Rust. It walks through the process of packaging with a modern tool stack, introduces the concept of vectorization for efficient computation in Python in the context of classical Machine Learning, and shows how the package can be optimized with extensions written in Rust.", "description": "The Rust programming language has gained a lot of attention over the last few years and has slowly begun to infiltrate the Python ecosystem, with an ever-increasing number of tools and libraries, such as Ruff and Polars, implemented in this language. Unlike Python, Rust is a systems language optimized for performance and memory safety, and some consider it the spiritual successor of C++. Despite its steep learning curve, it is the perfect candidate for extending Python and its ecosystem when performance matters, in a modern and memory-safe language.\r\n\r\nThis session demonstrates the path of creating a scientific package in Python (following best practices and modern tools) and gradually migrating parts of it to Rust for additional performance gains. 
The use case is a naive implementation of the \"Expectation maximization for Gaussian Mixture Models\" algorithm from scratch, a relatively simple yet efficient machine learning method. The session addresses the following points: how to build a Python package with a modern tool set, how to translate a numerical algorithm into vectorized Python, and how to optimize the package with a performant Rust implementation of the critical parts. Prior knowledge of Rust or the algorithm is not required. Note that the goal is not to learn Rust in this single session (this requires at least three days) but rather to provide a superficial overview of what makes this language so great and well-suited for extending Python.\r\n\r\nParticipants are advised to clone the repository below and follow the installation instructions to avoid longer download times during the session.\r\nhttps://github.com/StefanUlbrich/PyCon2024", "recording_license": "", "do_not_record": false, "persons": [{"guid": "4e42b5bd-2086-50fe-935c-1aae24b48200", "id": 26119, "code": "ABGRDC", "public_name": "Stefan Ulbrich", "avatar": "https://pretalx.com/media/avatars/ABGRDC_Kj69VLU.png", "biography": "I am a researcher and programmer with a passion for geometrical methods especially folding and unfolding in machine learning and robotics. My interests are computational cognition, robotics, bioinformatics, machine learning and data science. 
I am an experienced Pythonista and Rustacean", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/8C83EA/", "id": 41691, "guid": "92e9be11-09a6-5334-a5ac-2c05b7522668", "date": "2024-04-23T15:50:00+02:00", "start": "15:50", "logo": null, "duration": "01:30", "room": "A03-A04", "slug": "pyconde-pydata-2024-41691-pyo3-101-writing-python-modules-in-rust", "title": "PyO3 101 - Writing Python modules in Rust", "subtitle": "", "track": "PyCon: Programming & Software Engineering", "type": "Tutorial", "language": "en", "abstract": "In this interactive workshop, we will cover the very basics of using PyO3. There will be hands-on exercises to go from how to set up the project environment to writing a \"toy\" Python library written in Rust using PyO3. We will cover a lot of specifications of the API provided by PyO3 to create Python functions, modules, handling errors and converting types.\r\n\r\n---\r\n\r\n## Preflight checklist\r\n\r\n- [Install/ Update Rust](https://www.rust-lang.org/tools/install)\r\n- Make sure you have Python 3.8 or above (3.12 recommended)\r\n- Make sure you use a virtual environment (pyenv + virtualenv recommended)\r\n\r\n*In this workshop we recommend using a Unix-like OS (macOS or Linux). If you have to use Windows, you may encounter problems with Rust and Maturin. You may want to install a VM like [VirtualBox](https://www.virtualbox.org/) for developing Python libraries with PyO3.*\r\n\r\n## Setting up\r\n\r\n Set up virtual environment and install **maturin**\r\n\r\n```\r\npyenv virtualenv 3.12.2 pyo3\r\npyenv activate pyo3\r\npip install maturin\r\n```", "description": "In recent years, Rust has been getting more and more popular over other similar programming languages like C and C++ due to its robust compiler checking and ownership rules to make sure memory is safe. Hence there are more and more Python libraries that have been written in Rust natively with a Python API interface. 
One of the tools that have been driving this movement is PyO3, a toolset that provides Rust bindings for Python and tools for creating native Python extension modules.\r\n\r\nIn this interactive workshop, we will cover the very basics of using PyO3. There will be hands-on exercises to go from how to set up the project environment to writing a \"toy\" Python library written in Rust using PyO3. We will cover a lot of specifications of the API provided by PyO3 to create Python functions, modules, handling errors and converting types.\r\n\r\n## Goal\r\n\r\nTo give developers who are not familiar with PyO3 an introduction to PyO3 so they can consider building their Python libraries with Rust to make use of Rust's memory-safe property and parallelism ability.\r\n\r\n## Target audiences\r\n\r\nAny developers who are interested in developing Python libraries using Rust. It will be an advantage if the attendees are comfortable writing in Rust. However, attendees are not required to be familiar with Rust as all the Rust code will be provided. 
Basic knowledge of Python will be assumed from the attendees.\r\n\r\n## Outline\r\n\r\nPart 1 - introduction and getting started (40 mins)\r\n- What's the difference between Rust and Python (5 mins)\r\n- Why use PyO3 (5 mins)\r\n- Setting up the environment (exercises) (15 mins)\r\n- Starting a new project (exercises) (15 mins)\r\n\r\nBreak (15 mins)\r\n\r\nPart 2 - Creating a simple Python library (50 mins)\r\n- Creating Python modules (exercises) (20 mins)\r\n    - Generating documentation\r\n- Creating Python functions (exercises) (30 mins)\r\n   - How to create function signatures\r\n   - How to deal with errors", "recording_license": "", "do_not_record": false, "persons": [{"guid": "716d26c2-170b-5a5e-86e5-9d4cecf3bbdd", "id": 54, "code": "8EGVC9", "public_name": "Cheuk Ting Ho", "avatar": "https://pretalx.com/media/avatars/8EGVC9_EpBXtRy.jpg", "biography": "After having a career as a Data Scientist and Developer Advocate, Cheuk dedicated her work to the open-source community and works as a community manager at OpenSSF. She has co-founded Humble Data, a beginner Python workshop that has been happening around the world. She has served on the EuroPython Society board for two years and is now a fellow and director of the Python Software Foundation.", "answers": []}], "links": [], "attachments": [], "answers": []}], "A05-A06": [{"url": "https://pretalx.com/pyconde-pydata-2024/talk/KCC9EF/", "id": 41639, "guid": "8b3e1bf8-04a2-58f5-8942-5d6cc728b941", "date": "2024-04-23T10:30:00+02:00", "start": "10:30", "logo": null, "duration": "01:30", "room": "A05-A06", "slug": "pyconde-pydata-2024-41639-bulletproof-python-property-based-testing-with-hypothesis", "title": "Bulletproof Python - Property-Based Testing with Hypothesis", "subtitle": "", "track": "PyCon: Testing", "type": "Tutorial", "language": "en", "abstract": "Do you find yourself working through pages of copied and pasted tests to accommodate a simple code change? 
Does your software frequently break in unexpected ways despite your testing efforts? Don\u2019t despair! Property-based testing could be your way out of that mess. Rather than working harder and writing more test code, property-based testing forces you to work smarter and test more code with fewer tests.", "description": "Traditional tests are example-based. They require the developer to come up with arbitrary inputs and check a system\u2019s behaviour against explicit outputs. More often than not, developers only think of inputs that are handled correctly by their code, thus leaving bugs hidden. Property-based tests generate the inputs for you and in many cases they\u2019re more likely to find invalid inputs than humans. The difficulty lies in formulating these test cases.\r\n\r\nAfter this workshop you\u2019ll be comfortable with property-based testing using Hypothesis. You\u2019ll have experience requesting appropriate test data from Hypothesis and in writing tests for common and more advanced properties. At work, your co-workers will be impressed by your unbreakable code ;)\r\n\r\nParticipants are expected to have basic familiarity with unit testing and a testing framework. Provided code examples use pytest.\r\n\r\nPlease set up the workshop material in advance. To do that, navigate to the Git repository linked in the supporting material section and follow the setup instructions in the README file.", "recording_license": "", "do_not_record": true, "persons": [{"guid": "5bd0a0b8-ef43-5845-be70-7b13274d589f", "id": 16134, "code": "A7BNZH", "public_name": "Michael Seifert", "avatar": "https://pretalx.com/media/avatars/A7BNZH_hPEgeuD.png", "biography": "Michael is a trainer and consulting software engineer who helps product teams develop Python software in the cloud. He enjoys deleting code more than writing it and is constantly looking for ways to make software easier to maintain.\r\n\r\nMichael published his first FOSS project in 2006. 
He's currently a maintainer of pytest-asyncio and happens to be a Shuffle Dance enthusiast.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/PKJHBA/", "id": 41459, "guid": "20188492-dce9-56d0-8702-62e28a8fe99b", "date": "2024-04-23T14:05:00+02:00", "start": "14:05", "logo": null, "duration": "01:30", "room": "A05-A06", "slug": "pyconde-pydata-2024-41459-functional-python", "title": "Functional Python", "subtitle": "", "track": "PyCon: Programming & Software Engineering", "type": "Tutorial", "language": "en", "abstract": "Python supports multiple programming paradigms. In addition to the procedural and object-oriented approach, it also provides some features that are typical for functional programming.\r\n\r\nWhile these features are optional, they can be useful to create better Python programs. This tutorial introduces Python features that help to implement parts of Python programs in the functional style. The objective is not to write purely functional programs but to improve program design by using functional features where suitable.\r\n\r\nThe tutorial points out advantages and disadvantages of functional programming in general and in Python in particular. Participants will learn alternative ways to solve problems. This will broaden their programming toolbox.", "description": "## Audience\r\n\r\nIntermediate Python programmers who would like to learn more about functional programming and its application in Python.\r\n\r\n## Format\r\n\r\nThe tutorial will be hands-on. I will use JupyterLab and will start with an empty Jupyter Notebook. I will unroll the tutorial content by typing. In addition, I will distribute scripts before the tutorial to avoid overly lengthy typing. I will load these scripts one by one into a Notebook.\r\n\r\nParticipants will have the opportunity to type along. I am a rather slow typist. In addition, I will stop typing often to explain. 
This gives most participants plenty of time to follow along. The PDF handout is very comprehensive and contains most of what I type. This allows students to catch up if they should fall behind.\r\n\r\n## Outline\r\n\r\n* Functional programming basics (10 min)\r\n    * Overview of programming paradigms\r\n    * Features of functional programming\r\n    * Advantages of functional programming\r\n    * Disadvantages of functional programming\r\n    * Python's functional features - overview\r\n* Pure functions (5 min)\r\n* Callables and functions in Python (20 min)\r\n    * Callables\r\n    * Closures\r\n    * \"Currying\"\r\n    * Partial functions\r\n    * Recursion\r\n    * Lambda\r\n    * Single Dispatch\r\n* No Loops - map, filter, and reduce (10 min)\r\n    * Processing iterables with map\r\n    * Select from iterables with filter\r\n    * Reductions of iterables with reduce\r\n* Operators as Functions (10 min)\r\n    * Arithmetic operators\r\n    * Logical operators\r\n    * Attribute access\r\n    * Lookup\r\n* Comprehensions (15 min)\r\n    * Simple\r\n    * Nested\r\n    * Dictionary comprehensions\r\n    * Set comprehensions\r\n* Iterators (15 min)\r\n    * Itertools\r\n        * Infinite iterators\r\n        * Iterators terminating on the shortest input sequence\r\n        * Combinatoric iterators\r\n* External tools (5 min)\r\n    * More itertools\r\n    * Toolz", "recording_license": "", "do_not_record": false, "persons": [{"guid": "c84d882a-39c5-51be-b6d5-7b7408e7002b", "id": 4, "code": "9KSJ3K", "public_name": "Mike M\u00fcller", "avatar": "https://pretalx.com/media/avatars/9KSJ3K_J9AkmrQ.jpg", "biography": "I've been a Python user since 1999, teaching Python professionally since 2004.\r\nI am also active in the community, organizing Python conferences such as\r\nPyCon DE, EuroSciPy, and BarCamps.\r\nI am a PSF Fellow and chair of the German Python Software Verband.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": 
"https://pretalx.com/pyconde-pydata-2024/talk/UPSJEM/", "id": 41694, "guid": "466e1f5a-36c0-51a0-96db-ecb17693ba8f", "date": "2024-04-23T15:50:00+02:00", "start": "15:50", "logo": null, "duration": "01:30", "room": "A05-A06", "slug": "pyconde-pydata-2024-41694-boost-your-data-science-skills-with-the-new-python-in-excel", "title": "Boost your Data Science skills with the new Python in Excel", "subtitle": "", "track": "PyData: PyData & Scientific Libraries Stack", "type": "Tutorial", "language": "en", "abstract": "Python in Excel is the new integration created by Microsoft that brings Python programming directly into Excel workbooks, for advanced data analytics. With Python in Excel, it is now possible to embed Python code directly into workbook cells, very easily, and with zero setup required.\r\nIn this tutorial, we will explore the many features and capabilities this new integration provides, to unlock unprecedented data science and machine learning use cases in Excel.", "description": "Python in Excel is the new integration created by Microsoft that brings Python programming directly into Excel workbooks, for advanced data analytics.\r\n\r\nWith Python in Excel, it is now possible to embed Python code directly into workbook cells, very easily, and with zero setup required. In fact, all the Python code runs automatically in the Microsoft Cloud, and leverages on the Python Anaconda Distribution to get immediate access to a vast selection of packages to unlock unprecedented use cases in data science, data visualization, and machine learning. \r\n\r\nThe output of each execution is automatically integrated into the spreadsheet, creating interactive data reports to share with customers and other users.\r\n\r\nThe new feature is currently available in _public preview_ to **all users** running the MS Excel Beta Channel on Windows. 
\r\n\r\nIn this tutorial, we will explore the many features and capabilities this new integration provides, to unlock unprecedented data science and machine learning use cases in Excel. First, we will familiarize ourselves with the new environment, understanding its execution model and how it differs from standard Python programs. Afterwards, we will work on several examples to demonstrate the potential of using Python directly in the workbook to filter, validate, wrangle and visualize our data. We will conclude our tutorial by creating a full-fledged machine learning experiment directly in Excel.\r\n\r\nFamiliarity with Excel and the Python language is the only requirement for attending this tutorial.\r\n\r\n\r\n## Setup Instructions\r\n\r\n**Python in Excel** is currently available (_for free_) to MS Excel users on the **Windows** operating system.\r\n\r\n### Non-Windows Users\r\n\r\nIf you are not running on Windows, it is strongly recommended to install a version of Windows on a virtual machine (VM) using any solution that works on your operating system. \r\nFor example, [Parallels](https://www.parallels.com/products/desktop) for macOS users, or [VirtualBox](https://www.virtualbox.org/) for Linux users. \r\n\r\n### Setup Python in Excel for Windows\r\n\r\nTo use the _new_ \"Python in Excel\" feature, it is required to join the [Microsoft 365 Insider Program](https://support.microsoft.com/en-gb/office/get-started-with-python-in-excel-a33fbcbe-065b-41d3-82cf-23d05397f53d#:~:text=Microsoft%20365%20Insider%20Program) and choose the Beta Channel Insider level. \r\nYou can find more detailed instructions on [Get Started with Python in Excel](https://support.microsoft.com/en-gb/office/get-started-with-python-in-excel-a33fbcbe-065b-41d3-82cf-23d05397f53d).\r\n\r\n### (Optional) Install Excel Labs plugin\r\n\r\n[Excel Labs](https://appsource.microsoft.com/en-us/product/office/wa200003696?tab=overview) is an add-in that includes experimental Excel features. 
Among these features, it provides the **Python editor**: a notebook-like interface designed for authoring Python in Excel.\r\n\r\nExcel Labs is **not** required, but it is strongly recommended for a better working and development experience with Python in Excel. \r\n\r\n### Data Download\r\n\r\nOnce all the setup operations are completed, please download the [Financial Sample Excel Workbook](https://go.microsoft.com/fwlink/?LinkID=521962). \r\n\r\nWe will use this data file as our gym playground to familiarise ourselves with the new feature.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "c134f3ca-0c60-534f-9518-76e8b442956a", "id": 24927, "code": "GHGDNR", "public_name": "Valerio Maggio", "avatar": "https://pretalx.com/media/avatars/GHGDNR_GVEbdTI.jpg", "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}]}}, {"index": 3, "date": "2024-04-24", "day_start": "2024-04-24T04:00:00+02:00", "day_end": "2024-04-25T03:59:00+02:00", "rooms": {"Kuppelsaal": [{"url": "https://pretalx.com/pyconde-pydata-2024/talk/PLRERM/", "id": 44829, "guid": "24dc6d6d-da5d-5f02-b4cf-97098c7f3323", "date": "2024-04-24T09:15:00+02:00", "start": "09:15", "logo": null, "duration": "00:45", "room": "Kuppelsaal", "slug": "pyconde-pydata-2024-44829-keynote-ten-key-questions-that-a-company-should-ask-to-have-responsible-ai", "title": "Keynote - Ten Key Questions that a Company Should Ask to have Responsible AI", "subtitle": "", "track": "Plenary", "type": "Keynote", "language": "en", "abstract": "Responsible AI covers mainly AI principles, governance & regulation, but most companies do not know how to implement all of these. Hence, in this presentation we cover the key questions for the whole process behind a new AI product, from the idea and design to the development and deployment. 
The questions are partly based on the new ACM Principles for Responsible Algorithmic Systems (2022) where he is one of the two lead authors as well as their extensions for Generative AI (2023). For each question we will discuss its relevance, challenges, and (partial) solutions, triggering an interactive discussion.", "description": "Responsible AI covers mainly AI principles, governance & regulation, but most companies do not know how to implement all of these. Hence, in this presentation we cover the key questions for the whole process behind a new AI product, from the idea and design to the development and deployment. The questions are partly based on the new ACM Principles for Responsible Algorithmic Systems (2022) where he is one of the two lead authors as well as their extensions for Generative AI (2023). For each question we will discuss its relevance, challenges, and (partial) solutions, triggering an interactive discussion.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "aec605e9-d8b0-5d48-9fcf-c371e1f8fea5", "id": 40282, "code": "LFTX87", "public_name": "Ricardo Baeza-Yates", "avatar": "https://pretalx.com/media/avatars/LFTX87_xrwcEco.jpg", "biography": "Ricardo Baeza-Yates is Director of Research at the Institute for Experiential AI of Northeastern University. He is also a part-time Professor at Universitat Pompeu Fabra in Barcelona and Universidad de Chile in Santiago. Before he was the CTO of NTENT, a semantic search technology company based in California and prior to these roles, he was VP of Research at Yahoo Labs, based in Barcelona, Spain, and later in Sunnyvale, California, from 2006 to 2016. He is co-author of the best-seller Modern Information Retrieval textbook published by Addison-Wesley in 1999 and 2011 (2nd ed), which won the ASIST 2012 Book of the Year award. From 2002 to 2004 he was elected to the Board of Governors of the IEEE Computer Society and between 2012 and 2016 was elected to the ACM Council. 
Since 2010 he has been a founding member of the Chilean Academy of Engineering. In 2009 he was named ACM Fellow and in 2011 IEEE Fellow, among other awards and distinctions. He obtained a Ph.D. in CS from the University of Waterloo, Canada, and his areas of expertise are web search and data mining, information retrieval, bias and ethics on AI, data science and algorithms in general.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/PVLTD3/", "id": 42953, "guid": "1cc5bf52-e451-580a-978c-7ab9c54945dc", "date": "2024-04-24T10:30:00+02:00", "start": "10:30", "logo": null, "duration": "00:30", "room": "Kuppelsaal", "slug": "pyconde-pydata-2024-42953-which-kind-of-software-tests-do-i-really-need-", "title": "Which kind of software tests do I really need?", "subtitle": "", "track": "PyCon: Testing", "type": "Talk", "language": "en", "abstract": "Explore a variety of software testing methodologies, from Manual and A/B Testing to Unit and Performance Tests. Learn how to make informed decisions for enhanced software delivery, matching the unique needs of your projects.", "description": "In the dynamic landscape of software development, choosing the right testing strategy is crucial for delivering high-quality software products. The myriad of available testing methodologies often leaves developers and QA professionals pondering over the question: \"Which kind of software tests do I really need?\"\r\n\r\nThis presentation aims to demystify the world of software testing by exploring various testing approaches and methodologies. From unit testing to system testing, from functional to non-functional testing, each method serves a unique purpose in the software development life cycle. 
The talk will dive into the factors influencing the selection of appropriate testing methods.\r\n\r\nWe will discuss the advantages and limitations of different testing types, helping participants understand the trade-offs involved in each approach. Practical examples will be presented to illustrate how choosing the right testing strategy can positively impact software quality, development speed, and overall project success.\r\n\r\nParticipants will gain insights into evolving industry best practices and learn how to adapt their testing strategies to meet the demands of modern software development.\r\n\r\nBy the end of the talk, attendees will have an overview of the diverse landscape of software testing and be equipped with the knowledge needed to make informed decisions about which types of tests are most relevant for their specific projects. This presentation aims to empower developers, QA professionals, and project managers to navigate the testing maze and optimize their testing efforts for efficient and effective software delivery.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "dcade1d5-0d8a-5ee3-9606-57562490ed24", "id": 38914, "code": "WMUYJV", "public_name": "Pascal Puchtler", "avatar": "https://pretalx.com/media/avatars/WMUYJV_a5czDEw.jpg", "biography": "I am Pascal Puchtler, a freelancer specializing in software testing with Python.\r\n\r\nIn addition, I have various skills such as software architecture, databases, clean code, AI, Scrum, ...\r\n\r\nFurthermore, I have scientific publications with dataset, code and source code.\r\n* HUI-Audio-Corpus-German: A high quality TTS dataset\r\n* Neural SpeechSynthesis in German\r\n* Evaluation of Deep Learning Accelerators for Object Detection at the Edge", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/RKDSK7/", "id": 40984, "guid": "782277dc-832e-59da-a102-486b38cc9b47", "date": "2024-04-24T11:05:00+02:00", "start": 
"11:05", "logo": null, "duration": "00:30", "room": "Kuppelsaal", "slug": "pyconde-pydata-2024-40984-i-achieved-peak-performance-in-python-here-s-how-", "title": "I achieved peak performance in python, here's how ...", "subtitle": "", "track": "PyCon: Programming & Software Engineering", "type": "Talk", "language": "en", "abstract": "In the ever-evolving landscape of software development, crafting code that not only functions flawlessly but also operates at peak performance is a skill that sets exceptional developers apart. This talk delves into the art of optimizing Python code, exploring techniques and strategies to fine-tune your programs for maximum speed and minimal resource consumption, with a particular focus on memory efficiency.", "description": "In this session, we will embark on a journey and refine the phases of development in python.\r\n1. Functional Execution\r\n2. Rigorous Testing and Accuracy\r\n3. Performance Optimization\r\n\r\nWe will discuss common bottlenecks in unoptimized code\r\n1. inefficient Coding Practices can negatively impact performance\r\n2. Memory Leaks\r\n3. Suboptimal Data Structures and Algorithms\r\n4. Lack of Vectorization\r\n5. Overlooked Parallelization\r\n\r\nWe'll further look into the benefits of profiling the code\r\n1. Profiling the code with cProfile/sentry\r\n2. Profiling the Code with timeit\r\n3. Memory Profiler\r\n\r\nFinally, for data driven application, we'll look into strategies to achieve peak performance\r\n1. Efficient DataFrame Storage with Parquet Files\r\n2. Handling Categorical Data Type\r\n3. Looping Techniques and How to Choose Between Different Looping Techniques?\r\n4. 
String concatenation (joins and cleanup)\r\n\r\n[Attendees takeaway]\r\nWhether you're a seasoned developer looking to enhance your optimization skills or a newcomer eager to understand the principles behind efficient Python code, this talk offers valuable insights and practical takeaways.\r\n\r\n[Pre-requisites]\r\nBasics of Python\r\n\r\n[who-am-i]\r\nName: Dishant Sethi\r\nEmail: dishantsethi14@gmail.com\r\nPhone no: +919582565371\r\nDesignation: Software Consultant and Founder @prodinit.com\r\n\r\n[Previous Talks]\r\nPyconDE and Pydata Berlin: https://youtu.be/osGGX3tcwkc\r\nGophercon India 2023: https://youtu.be/zuzTN3ibrCM?si=GEo31lE_Q8h4hzTR\r\nPyDelhi: https://youtu.be/6h9I3iyqyu4", "recording_license": "", "do_not_record": false, "persons": [{"guid": "2409a368-70bf-52cf-91e2-f63d4f9f8e7a", "id": 25790, "code": "TSAEPG", "public_name": "Dishant Sethi", "avatar": "https://pretalx.com/media/avatars/TSAEPG_ISYxdzs.jpg", "biography": "Dishant is a software engineer with experience in Web Development, Cloud Engineering, DevOps and MLOps. He started Prodinit, a software consultancy, after successfully freelancing for a long period of time.\r\n\r\nTalk to him about:\r\n\u2666 Product Engineering\r\n\u2666 Dev/ML Ops", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/P7AG9A/", "id": 41454, "guid": "7158ce9c-ea89-5c51-aac5-6bf61653604b", "date": "2024-04-24T11:40:00+02:00", "start": "11:40", "logo": null, "duration": "00:30", "room": "Kuppelsaal", "slug": "pyconde-pydata-2024-41454-python-3-12-s-new-monitoring-and-debugging-api", "title": "Python 3.12's new monitoring and debugging API", "subtitle": "", "track": "PyCon: Python Language & Ecosystem", "type": "Talk", "language": "en", "abstract": "Python 3.12 introduced a new low-impact monitoring API with [PEP669](https://peps.python.org/pep-0669/), which can be used to implement far faster debuggers than ever before. 
This talk covers the main advantages of this API and how you can use it to develop small tools.", "description": "Python long lacked a good monitoring and profiling API. It had only the simplistic sys.settrace API, which had a high overhead and couldn't be configured appropriately. The new API, released in October 2023, will change this by offering a proper fine-grained and well-designed monitoring API while also making the commonly used operations fast.\r\n\r\nThis talk will give you an introduction to the new API and its major design decisions and show you how you can use it to write a simple debugger from scratch.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "c1ed2cb6-f82b-5ce7-8f0b-d5e9e4aff9ce", "id": 37000, "code": "L8PGG9", "public_name": "Johannes Bechberger", "avatar": "https://pretalx.com/media/avatars/L8PGG9_Ds5mqdS.jpg", "biography": "Johannes Bechberger is a JVM developer working on profilers and their underlying technology in the SapMachine team at SAP. He started at SAP in 2022 after two years of research studies at the KIT in Java security analyses. His work today comprises many open-source contributions and his blog, where he writes regularly on in-depth profiling and debugging topics and works on his JEP Candidate 435 to add a new profiling API to the OpenJDK. 
He has been an avid Python user for almost 10 years, with a special interest in type systems and debuggers.\r\nSince 2023, he has been touring the meet-ups and conferences of Europe, such as JavaZone and Devoxx Belgium, to speak on various topics.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/BFYUUJ/", "id": 47339, "guid": "fb403f18-634f-549b-b8c9-65fa52fbcae5", "date": "2024-04-24T13:10:00+02:00", "start": "13:10", "logo": null, "duration": "01:00", "room": "Kuppelsaal", "slug": "pyconde-pydata-2024-47339--pyladies-panel-reflecting-within-challenging-narratives-in-tech-feminism", "title": "(PyLadies Panel) Reflecting Within: Challenging Narratives in Tech Feminism", "subtitle": "", "track": "Plenary", "type": "Panel", "language": "en", "abstract": "For the third year in a row, the PyLadies Panel at PyCon PyData engages with a broader audience on critical issues related to gender disparities, ethics, and the ongoing importance of women-focused tech groups. Adopting unconventional formats, the PyLadies Panel aims to foster meaningful discussions among PyLadies members and the Python community, encouraging open dialogue and community solidarity.", "description": "For the third year in a row, the PyLadies Panel at PyCon PyData engages with a broader audience on critical issues related to gender disparities, ethics, and the ongoing importance of women-focused tech groups. Adopting unconventional formats, the PyLadies Panel aims to foster meaningful discussions among PyLadies members and the Python community, encouraging open dialogue and community solidarity.\r\n\r\nThis year, we propose a structured debate inspired by Lucy Delap\u2019s \u201cFeminisms: A Global History.\u201d The book challenges ethnocentric and exclusive narratives within the feminist movement itself. 
It calls for a more inclusive and multifaceted understanding of feminism that respects and incorporates the diversity of its expressions and the different challenges faced by women around the world. \r\n\r\nHaving the book as a reference point and inspiration, this panel is an opportunity to critically reflect on these themes and develop actionable strategies for a more equitable future in technology. Designed to dissect and challenge entrenched narratives about feminism in the tech industry, the debate encourages a deep dive into difficult conversations to dismantle binary thinking and uncover nuances in common discourse.\r\n\r\nParticipants and audience members are invited to confront and critique the prevailing frameworks of feminism, particularly the predominance of perspectives that may not fully represent the movement\u2019s global and diverse nature. By acknowledging and addressing these gaps, the debate will explore actionable steps toward inclusivity and equity.\r\n\r\nThrough a debate-style format, panelists will engage in a candid, necessary discussion and exchange of ideas, allowing for both the celebration of feminist achievements and a critical evaluation of ongoing issues. It will provide a platform for voices that have been marginalized or silenced, enabling a constructive dialogue that moves beyond simple dichotomies to foster understanding and progress.\r\n\r\nJoin us as we challenge the status quo, identify systemic flaws, and collaboratively outline the future directions of feminism in technology. 
This debate is not just about reflection; it\u2019s about taking active steps to ensure that our community is inclusive and representative of all its members.\r\n\r\nPanel with Tania Allard, Katharine Jarmul, Naa Ashiorkor Nortey & Cheuk Ting Ho", "recording_license": "", "do_not_record": false, "persons": [{"guid": "18be4213-bd0e-5f2b-9e7e-67efd2cea7cd", "id": 23776, "code": "CZVEE3", "public_name": "Paloma Oliveira", "avatar": "https://pretalx.com/media/avatars/CZVEE3_55lthZI.jpg", "biography": "As a Growth Engineer at Sauce Labs and a passionate advocate for Free and Open Source Software since 2009, I am deeply committed to driving diversity and equity in tech. I co-organize PyLadies Berlin, serve on the Pysv Python Software Verband board, and contribute to the OpenJS Foundation program committee. My approach to development prioritizes tools that meet user needs, focusing on usability and accessibility while avoiding unnecessary complexity in favor of maintenance, standards, and collaboration. I value clear communication and good documentation as a foundation for fostering collaborative environments, emphasizing mutual understanding and cross-team work.  I exercise critical thinking about the technology we create and use, being co-founder of the Zentrum f\u00fcr Netzkunst.", "answers": []}, {"guid": "ffd0574e-11b0-52d1-a847-6a92c5e1ec5e", "id": 233, "code": "K9B9W9", "public_name": "Katharine Jarmul", "avatar": "https://pretalx.com/media/avatars/K9B9W9_gUIiN9l.jpg", "biography": "Katharine Jarmul is a privacy activist and data scientist whose work and research focuses on privacy and security in data science workflows. She works as a Principal Data Scientist at Thoughtworks and is the author of Practical Data Privacy. 
She is a passionate and internationally recognized data scientist, programmer, and lecturer.", "answers": []}, {"guid": "716d26c2-170b-5a5e-86e5-9d4cecf3bbdd", "id": 54, "code": "8EGVC9", "public_name": "Cheuk Ting Ho", "avatar": "https://pretalx.com/media/avatars/8EGVC9_EpBXtRy.jpg", "biography": "After having a career as a Data Scientist and Developer Advocate, Cheuk dedicated her work to the open-source community and working as a community manager at OpenSSF. She has co-founded Humble Data, a beginner Python workshop that has been happening around the world. She has served the EuroPython Society board for two years and is now a fellow and director of the Python Software Foundation.", "answers": []}, {"guid": "f7d809a3-aa19-52ae-9dba-59e47ec54524", "id": 35048, "code": "XFHYWN", "public_name": "Naa Ashiorkor Nortey", "avatar": "https://pretalx.com/media/avatars/XFHYWN_6X59yKF.jpg", "biography": "Naa Ashiorkor is a versatile and passionate data scientist with a strong background in UI/UX design. She is currently a Master's in Data Science student and research assistant at Tampere University.\r\n\r\nShe contributes to the tech community through volunteering and mentoring.  She is an active member of PyLadies Ghana and Python Ghana. 
Also, she is an organizer for PyConDE & PyData Berlin 2024 and EuroPython 2024.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/DPVJ7K/", "id": 41769, "guid": "9b84ca39-aa0f-5d71-a76d-e25577fc5407", "date": "2024-04-24T14:45:00+02:00", "start": "14:45", "logo": null, "duration": "00:30", "room": "Kuppelsaal", "slug": "pyconde-pydata-2024-41769-async-awaits-mastering-asynchronous-python-in-fastapi", "title": "Async Awaits: Mastering Asynchronous Python in FastAPI", "subtitle": "", "track": "PyCon: Programming & Software Engineering", "type": "Talk", "language": "en", "abstract": "In this talk, we delve into the transformative world of asynchronous programming in Python, tailored specifically for the FastAPI framework. This session will explore the fundamentals of async/await syntax, unveiling how it can optimize the performance and scalability of web applications. \r\n\r\nAttendees will gain practical insights into implementing asynchronous operations in FastAPI, from setting up to handling real-time data processing. This talk is perfect for Python developers eager to harness the power of asynchronous programming to build faster, more efficient web applications. Join us to unlock the full potential of Python's async capabilities within FastAPI's dynamic environment.", "description": "In this 30-minute session, we'll embark on a journey to master asynchronous programming in Python, specifically focusing on its application in the FastAPI framework. The talk is designed to provide a thorough understanding of async/await syntax and its practical use in building efficient, scalable web applications.\r\n\r\n### Timetable:\r\n\r\n#### 1. Introduction to Asynchronous Programming (5 minutes)\r\n- Brief overview of asynchronous programming concepts.\r\n- The importance of async in modern web development.\r\n\r\n#### 2. 
Understanding Async/Await in Python (5 minutes)\r\n- Deep dive into Python's async/await syntax.\r\n- Key differences between synchronous and asynchronous code.\r\n\r\n#### 3. FastAPI and Asynchronous Python (10 minutes)\r\n- Introduction to FastAPI with a focus on its asynchronous features.\r\n- Demonstrating how FastAPI leverages Python\u2019s async capabilities.\r\n\r\n#### 4. Building an Asynchronous Web App (7 minutes)\r\n- Step-by-step guide on setting up and coding an async web application in FastAPI.\r\n- Best practices for handling asynchronous operations.\r\n\r\n#### 5. Q&A and Wrap-Up (3 minutes)\r\n- Addressing questions from the audience.\r\n- Summarizing key takeaways and concluding the talk.\r\n\r\nJoin us to unlock the power of asynchronous Python in the world of web development and learn how to effectively implement these techniques in your FastAPI projects.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "d85022b4-f2d8-5aa5-8682-d3f2df0dcd64", "id": 16382, "code": "CHTAPC", "public_name": "Bojan Miletic", "avatar": "https://pretalx.com/media/avatars/CHTAPC_ZB4qyOy.jpg", "biography": "Hi there! I'm a seasoned MLOps professional, specializing in bridging the gap between AI/ML concepts and real-world applications. My expertise lies in transforming ML models from theoretical data lab projects into impactful proofs of concept, MVPs, and fully operational products. I'm passionate about demonstrating the tangible value of these models to investors and key decision-makers.\r\n\r\nLeveraging #Python and #AWS, I focus on extracting real business value from ML algorithms. My role extends to mentoring AI scientists in crafting clean, reusable Python code, significantly cutting costs in ML software deployment and development. Think of me as your go-to fractional MLOps expert, guiding your company through the intricate world of ML and DevOps. 
Join me in exploring how to effectively manage, monitor, and mature ML models in our ever-evolving digital landscape", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/7UYHYP/", "id": 43025, "guid": "0bc1ae34-b54c-5f4c-891c-83b75eb99f22", "date": "2024-04-24T15:20:00+02:00", "start": "15:20", "logo": null, "duration": "00:30", "room": "Kuppelsaal", "slug": "pyconde-pydata-2024-43025-building-accessible-documentation-sites", "title": "Building accessible documentation sites", "subtitle": "", "track": "General: Others", "type": "Talk", "language": "en", "abstract": "Your project's documentation site is one of the first places where new users will interact with your project; as such, it is essential that these are up-to-date, well-organised, and usable and that they cater to newcomers, experienced users, and contributors alike.\r\n\r\nIt is estimated that about 25% of the global population has some sort of disability, and ensuring all folks can use and access your projects and their documentation is paramount and this, of course, includes thinking of and including disabled developers and end-users. \r\n\r\nIn this talk, we will cover some of the basics of web content accessibility and explore some tools and approaches that you can use to ensure your tools and documentation sites are accessible.", "description": "For a long time, there has been a prevailing notion that accessibility should only be considered within front-end web development - the discipline of creating what someone can see or do on a website or web app. However,  accessibility is a holistic practice that covers every aspect of building digital experiences, meaning it is everyone\u2019s concern - whether working on the backend, documentation, CLI, or API levels.\r\n\r\nAs an open-source maintainer, your project\u2019s documentation is one of the primary ways users interact with your tools. 
Ensuring your documentation is up-to-date is as important as ensuring it is accessible for disabled users to provide an inclusive user experience and bring in new contributors. \r\n\r\nFor the last five years, I have worked on multiple aspects of open-source accessibility, from auditing to remediation and building more accessible tools for end-users, authors, and open-source maintainers. In this talk, I will share practical advice - including tools and workflows - to make your documentation and other user-facing resources, from markdown files to Sphinx documentation sites and Jupyter notebooks,  more accessible to disabled users. \r\n\r\n\r\nAfter this talk, you will better understand how to make your documentation more accessible with minor changes to your workflows or practices, even if you do not have deep accessibility knowledge (yet).\r\n\r\nOutline\r\n\r\n- Context setting [5 mins] - Brief context setting\r\n- Intro to accessibility  [7 mins] - 101 into accessibility - while this will not be a deep dive, we will cover some guidelines and principles applicable to documentation, notebooks, and user-facing resources.\r\n- Contextualising accessibility into documentation [8 mins] - discussing strategies for accessibility auditing, remediation, and implementation within open source documentation\r\nPractical strategies TL;DR [5 mins] \r\n- Summarise  best practices and tools for OSS documentation accessibility \r\n- Q/A with the audience [5 mins]", "recording_license": "", "do_not_record": false, "persons": [{"guid": "f9849d9c-b12c-5e6d-b79d-fbc257982a5c", "id": 1483, "code": "M9T83C", "public_name": "Dr. Tania Allard", "avatar": "https://pretalx.com/media/avatars/M9T83C_G8zg0p9.jpg", "biography": "Tania is the Director of Quansight Labs and a PSF fellow and director. 
She is also a member of the PyLadies Global council and a long time Pythonista.", "answers": []}], "links": [], "attachments": [], "answers": []}], "B09": [{"url": "https://pretalx.com/pyconde-pydata-2024/talk/KCYDM9/", "id": 44952, "guid": "5c20dac9-3609-5b5b-8442-74d2b9886268", "date": "2024-04-24T10:30:00+02:00", "start": "10:30", "logo": null, "duration": "00:30", "room": "B09", "slug": "pyconde-pydata-2024-44952-prescriptive-analytics-in-the-python-ecosystem-with-gurobi", "title": "Prescriptive Analytics in the Python Ecosystem with Gurobi", "subtitle": "", "track": "Sponsor", "type": "Sponsored Talk", "language": "en", "abstract": "Join us as we guide you through integrating Gurobi and prescriptive analytics into your greater Python ecosystem. We\u2019ll demonstrate model-building patterns based on NumPy and SciPy.sparse data structures and explore how to take advantage of indexed DataFrames and Series in pandas for mathematical model building. You\u2019ll also discover how to use trained regressors from scikit-learn as constraints in optimization models. Join us as we delve into the world of optimization with Gurobi and elevate your workflows.", "description": "Gurobi is a prescriptive analytics technology that enables you to make optimal decisions from data. You can use prescriptive analytics to generate optimized decision recommendations, based on real-world variables and constraints. Powered by mathematical models solved by mixed-integer optimization, it enables embedded decision intelligence in all kinds of applications in an industry-agnostic fashion and in any deployment scenario.\r\n\r\nJoin us as we guide you through integrating Gurobi and prescriptive analytics into your greater Python ecosystem. We\u2019ll demonstrate model-building patterns based on NumPy and SciPy.sparse data structures and explore how to take advantage of indexed DataFrames and Series in pandas for mathematical model building. 
You\u2019ll also discover how to use trained regressors from scikit-learn as constraints in optimization models. Join us as we delve into the world of optimization with Gurobi and elevate your workflows.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "b0455961-8740-50a8-9792-d51addea4c68", "id": 42831, "code": "DCL9SE", "public_name": "Robert Luce", "avatar": null, "biography": "Dr. Luce is an experienced researcher in applied mathematics, and author of numerous publications in the fields of numerical linear algebra and optimization. He holds a Ph.D. from Technical University of Berlin.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/DG8G7Q/", "id": 42873, "guid": "0462ffa2-2987-51a3-981e-e5b9b70c1b91", "date": "2024-04-24T11:05:00+02:00", "start": "11:05", "logo": null, "duration": "00:30", "room": "B09", "slug": "pyconde-pydata-2024-42873-mojo-is-it-python-s-faster-cousin-or-just-hype-", "title": "Mojo \ud83d\udd25 - Is it Python's faster cousin or just hype?", "subtitle": "", "track": "PyCon: Python Language & Ecosystem", "type": "Talk", "language": "en", "abstract": "On 2023-05-02, the tech sphere buzzed with the release of Mojo \ud83d\udd25, a new programming language developed by Chris Lattner, renowned for his work on Clang, LLVM, and Swift. Billed as \"Python's faster cousin,\" and \"The programming language for all AI developers\", Mojo promised a 68,000x performance uplift and a familiar Pythonic syntax.\r\n\r\nAs it reaches its first anniversary, we unpack Mojo's journey towards its ambitious promise. This talk delves into the practical experiences developing a Large Language Model Interpretation library as part of an AI Safety Camp project in that language. We cast a critical eye over its performance, evaluate its usability, and explore its potential as a Python superset. 
Against a backdrop where alternatives like Rust, PyPy and Julia dominate performant programming for AI, we question whether Mojo can carve out its niche or if it will languish as another \"could-have-been\" in the programming language pantheon.", "description": "Background & Motivation\r\n\r\nThe introduction of Mojo by Chris Lattner captured the attention of the Python community with the allure of dramatic performance enhancements and a syntax that would not alienate current Python developers. As Mojo progresses beyond its infancy, it's critical to assess its evolution and its capacity to disrupt the programming ecosystem, particularly within artificial intelligence and machine learning domains.\r\n\r\nObjective & Scope\r\n\r\nThis presentation will share findings from an AI Safety Camp project which used Mojo to build a Large Language Model Mechanistic Interpretability and Activation Engineering library. Through our exploration, we aim to provide a candid narrative of Mojo's strengths and limitations, judge its performance claims, and probe its likelihood of adoption for AI development.\r\n\r\nContent Overview\r\n\r\nIntroduction to Mojo: Brief overview of Mojo's conception, ethos, and intended use-cases.\r\nPerformance Claims: A further look at the purported 68,000x speed increase over Python, including benchmark comparisons and real-world application data.\r\nLanguage Design: An analysis of Mojo's syntax and semantics, drawing parallels and contrasts with Python, and the implications for developers transitioning to or adopting Mojo.\r\nCase Study: Detailed account of the process of writing a Large Language Model Interpretation library in Mojo, highlighting the challenges and breakthroughs experienced.\r\nEcosystem Overview: Examination of the current state of Mojo's ecosystem, its community support, and the availability of tooling and libraries.\r\nDiscussion: Engaging the audience in a discussion about Mojo's potential future, its fit within existing 
projects, and the propensity for it to become the primary language for AI development.\r\n\r\nConclusion\r\nWe'll wrap up with predictions for Mojo's trajectory based on our experiences and broader industry trends, potentially setting the stage for Mojo to capture the \"Mojo\" it needs to triumph or to become a footnote in the annals of programming language history.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "8b6d01c4-3dea-533d-be5e-e27b3778ca8f", "id": 38869, "code": "SNNEQW", "public_name": "Jamie Coombes", "avatar": "https://pretalx.com/media/avatars/SNNEQW_rFKhdxA.jpg", "biography": "I'm a Machine Learning Engineer with 3 years of Python and PyTorch development experience. I've provided ML expertise to startups and the UK government, I am interested in beneficial AI applications.\r\n\r\nI spoke at EuroPython Prague this summer and I have speaking experience through my prior role as a Science Teacher with TeachFirst. My background is studying Physics and then Atmospheric Physics interpreting large tropical cyclone datasets at Imperial College London.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/TCSERC/", "id": 40109, "guid": "556181e7-c893-5530-92f0-c5f0e5de7a16", "date": "2024-04-24T11:40:00+02:00", "start": "11:40", "logo": null, "duration": "00:30", "room": "B09", "slug": "pyconde-pydata-2024-40109-enhance-your-balcony-power-plant-with-python", "title": "Enhance your balcony power plant with Python", "subtitle": "", "track": "General: Infrastructure - Hardware & Cloud", "type": "Talk", "language": "en", "abstract": "Plug-in solar systems, so-called balcony power plants, are getting more popular. 
This talk will cover the basics of such a system, how to figure out the energy consumption of a household and how to monitor and optimize the power output of a balcony power plant.", "description": "Plug-in solar systems, so-called balcony power plants, are getting more popular and more affordable as people want a simple way to participate in moving towards sustainable energy resources. They are easy to install without the need for an electrician.\r\nIn this talk I will discuss how to figure out how much power a household consumes and how much can be covered by the balcony power plant. I will also exemplify different user profiles, like \u201cworking from home\u201d or the \u201chome in idle state\u201d and how they affect the efficiency of an additional battery system.\r\nThe power consumption is measured by using devices, like WiFi plugs, from Shelly and myStrom, each offering a REST API. The power production is preferably recorded by using OpenDTU in combination with compatible microinverters but may be measured using WiFi plugs as well.\r\nThese measured values are published to Redis and can be observed using WebSockets and FastAPI. \r\nAdditionally, these values may be pushed to a public server running on FastAPI and Redis as well. A social login like Google or GitHub can be used to control the access to this server.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "5f0a198b-8135-5bd0-a9c6-203cc457e65f", "id": 15592, "code": "KSM737", "public_name": "Jannis L\u00fcbbe", "avatar": null, "biography": "2008\r\nM.Sc. 
Physics and Computer Science at Osnabr\u00fcck University\r\n\r\n\r\n2012\r\nPhD in Physics at Osnabr\u00fcck University\r\n\r\n\r\n2013 - now\r\nSensor Developer at ROSEN Group", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/K8AL9P/", "id": 44849, "guid": "def91b64-1b35-5cd9-8f5d-26aa3f9c955e", "date": "2024-04-24T13:10:00+02:00", "start": "13:10", "logo": null, "duration": "00:30", "room": "B09", "slug": "pyconde-pydata-2024-44849-connecting-batteries-with-python-towards-ev-charging-with-zero-emissions-at-zero-costs", "title": "Connecting batteries with Python: Towards EV Charging with #zero emissions at #zero costs", "subtitle": "", "track": "Sponsor", "type": "Sponsored Talk", "language": "en", "abstract": "This talk dives into how Python helps us to bridge the gap between automotive and energy industries. Learn how Python helps in integrating EV batteries into the power grid, enabling further use and growth of renewable energies, stabilizing power grids and enhancing the accessibility of electric mobility.", "description": "The goal of The Mobility House is to create a zero-emission energy and mobility future. Our technology unites the automotive and energy industries. We integrate vehicle batteries into the power grid using intelligent charging and energy solutions. This way, we promote the development of renewable energies, stabilize the power grid, and make electric mobility more affordable.\r\nThe goal of this talk is to give you an overview of how and where Python is used at The Mobility House. A hint upfront, we use it in many places. We use Python in all phases of development, it enables us to go quickly from a proof of concept to production. Python helps us in understanding our data better and using Python in production even changed our development culture and helped bridging the gap between data scientists and coders. 
However, Python does not solve all of our problems, so we will also talk about the roadblocks we hit and share the solutions which worked for us.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "28b9e75a-ff2a-5f29-84f8-44a56a29d5ae", "id": 40303, "code": "33UACS", "public_name": "Christopher Bock", "avatar": null, "biography": "After finishing my PhD in high energy physics, I worked as software developer and as solution architect on projects in various industries. In the end I ended up at The Mobility House, because I want to work towards a zero-zero future. Nowadays I am working as one of the team leads in the area of vehicle-grid-integration.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/Y7R9GZ/", "id": 40404, "guid": "e08efba4-c41d-56e7-b1e3-97c1e195ddf3", "date": "2024-04-24T13:45:00+02:00", "start": "13:45", "logo": null, "duration": "00:30", "room": "B09", "slug": "pyconde-pydata-2024-40404-replacing-callbacks-with-generators-a-case-study-in-computer-assisted-live-music", "title": "Replacing Callbacks with Generators: A Case Study in Computer-Assisted Live Music", "subtitle": "", "track": "PyCon: Python Language & Ecosystem", "type": "Talk", "language": "en", "abstract": "*Callbacks* have become a ubiquitous programming technique that we use every day without even thinking about it. They are definitely handy in many situations, but sometimes they feel more like a burden than a help. In developing an interactive realtime audio processing system for use on stage in live music, we encountered such a situation. 
This talk will present how a few dozen lines adding a thin abstraction layer allowed us to replace a complex callback mess with tremendously more readable *generators* (yes, you know, those functions which `yield` results instead of `return`ing them...).", "description": "At [Les Chemins de Traverse](https://www.lescheminsdetraverse.net/) we explore ways of \"augmenting\" acoustical musical instruments with new sonic possibilities offered by computers (think \"augmented reality\" for live music). For doing so, we are using Olivier B\u00e9langer's great [pyo](http://ajaxsoundstudio.com/software/pyo/) module for realtime audio processing. To make the system interactive, this module allows registering callbacks on some events. While this works great in many situations, it can get very cumbersome when we design a stateful system, where the same event must trigger different callbacks depending on the system's inner state.\r\n\r\nThis talk will present how we developed a thin abstraction layer that allows us to replace many callback functions together with many registering/unregistering of these functions by a nice, streamlined *generator* definition that's incomparably more readable than the many-callbacks version. 
This allows us to keep our mind focused on what's important, namely supporting the music we want to play, instead of tedious boilerplate code.\r\n\r\nWhile our use case is admittedly very specific, we believe that the ideas we present could be adapted in many other situations where callbacks are used for technical reasons, but lead to bulky and contrived code.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "af41ac25-172b-5746-9687-6164c0b46894", "id": 37695, "code": "39LFME", "public_name": "Matthieu Amiguet", "avatar": "https://pretalx.com/media/avatars/39LFME_xvIAAWp.jpg", "biography": "Trained both as a musician and a mathematician, Matthieu Amiguet took up programming as a hobby and somehow ended up making a PhD in computer science. He now works freelance - both as a musician and a developer. He is Artistic Director at Les Chemins de Traverse, jointly with Barbara Minder.\r\n\r\nLes Chemins de Traverse is a collective of musicians, artists and researchers from a variety of backgrounds with a focus on sonic exploration and live performance. They cover a large musical territory from renaissance and baroque music to jazz, rock music and contemporary experimental noise. 
More often than not, they mix different styles and techniques - like in a weird chemical experiment that would produce nice colored fluids but might as well explode at any time.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/HSJGHH/", "id": 41565, "guid": "620c4b36-32a4-5c17-b18b-77974292d79c", "date": "2024-04-24T14:45:00+02:00", "start": "14:45", "logo": null, "duration": "00:30", "room": "B09", "slug": "pyconde-pydata-2024-41565-bridging-the-worlds-pixi-reimplements-pip-and-conda-in-rust", "title": "Bridging the worlds: pixi reimplements pip and conda in Rust", "subtitle": "", "track": "PyCon: Programming & Software Engineering", "type": "Talk", "language": "en", "abstract": "Pixi is a modern package manager that bridges the worlds of conda and pip package management. A from-scratch implementation of a SAT solver that works for both pip and conda, native lockfiles and a cross-platform task system are compelling features of this new package manager.", "description": "Pixi goes further than existing conda-based package managers in many ways:\r\n\r\n- From scratch implemented in Rust and ships as a single binary\r\n- Integrates a new SAT solver called resolvo\r\n- Supports lockfiles like poetry / yarn / cargo\r\n- Cross-platform task system (simple bash-like syntax)\r\n\r\nA major requested feature was interoperability with PyPI packages. For this we have created a standalone library called rip. 
Rip contains all the code needed to download and extract wheels and SDist packages straight from PyPI, and also uses resolvo for resolution.\r\n\r\nWe had to overcome some PyPI-specific hurdles that we want to discuss in the talk:\r\n\r\n- Lazy fetching of metadata, since on PyPI it is embedded in the wheel\r\n- Resolving Python packages for other platforms and locking them (since we want to resolve on Linux for Windows)\r\n\r\nWe\u2019re looking forward to taking a deep dive together into what conda and PyPI packages are and how we are seamlessly integrating the two worlds in pixi. We\u2019ll also look at some benchmarks and explain more about the conda ecosystem and why it might still have a reason to exist (even though wheels also solve a lot of the pain points).\r\n\r\nMore information about Pixi:\r\n\r\n- https://pixi.sh\r\n- https://prefix.dev", "recording_license": "", "do_not_record": false, "persons": [{"guid": "c1a59892-19f6-59e5-bdfd-e50a1ff815c3", "id": 12256, "code": "M7CWJZ", "public_name": "Wolf Vollprecht", "avatar": "https://pretalx.com/media/avatars/M7CWJZ_R4tVxad.jpg", "biography": "Wolf is a conda-forge veteran and renowned software package wrangler. Wolf started the mamba package manager (and prefix.dev) to make cross-platform, high-performance package management a reality. 
He has also packaged a lot of software for conda-forge and is a core member of the conda-forge team, as well as the founder of the RoboStack project.", "answers": []}, {"guid": "6acdee43-32b2-5373-a617-982120266323", "id": 38028, "code": "7YTCGE", "public_name": "Ruben Arts", "avatar": "https://pretalx.com/media/avatars/7YTCGE_F24UFc1.jpg", "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/ML99UB/", "id": 40960, "guid": "ba4ead0d-e9d4-5908-95d6-c14df40bebdb", "date": "2024-04-24T15:20:00+02:00", "start": "15:20", "logo": null, "duration": "00:30", "room": "B09", "slug": "pyconde-pydata-2024-40960-there-is-a-better-way-to-automate-and-manage-your-fluid-simulations", "title": "There is a Better Way to Automate and Manage Your (Fluid) Simulations", "subtitle": "", "track": "General: Industry & Academia Use-Cases", "type": "Talk", "language": "en", "abstract": "This is a story about applying Python and the \u201chacker mindset\u201d to Computer Aided Engineering (CAE), an emerging domain within the Python ecosystem. Shell scripts have traditionally been the preferred tool for automating CAE pipelines, especially in subfield of Computational Fluid Dynamics (CFD). However, this approach is brittle, severely limited and cumbersome to manage at scale. Data management is also a challenge, with tens to hundreds of GB per simulation needing to be stored and versioned in complex folder structures. One possible approach is to use Python as an automation and glue language and Data Version Control (DVC) which is a Python based tool built on top of git to track pipelines and data.", "description": "This is a story about applying Python and the \u201chacker mindset\u201d to Computer Aided Engineering (CAE), an emerging domain within the Python ecosystem. 
Shell scripts have traditionally been the preferred tool for automating CAE pipelines, especially in the subfield of Computational Fluid Dynamics (CFD). However, this approach is brittle, severely limited and cumbersome to manage at scale. Data management is also a challenge, with tens to hundreds of GB per simulation needing to be stored and versioned in complex folder structures. One possible approach is to use Python as an automation and glue language and Data Version Control (DVC), which is a Python-based tool built on top of git to track pipelines and data.\r\n\r\nThis talk will show you how to use Python to automate many tasks in CAE workflows, even when the tools don\u2019t offer a native Python interface:\r\n- Exporting CFD simulation results from Starccm+ to a PowerPoint template with python-pptx and updating the final presentation with new simulation data\r\n- Preparing input data for an electrical thermal simulation to improve performance 80-fold\r\n\r\nBoth examples will illustrate best practices and lessons learned in the automation of the CFD software that are applicable beyond the field.\r\n\r\nDVC was originally designed and is broadly used for machine learning pipelines, but its flexibility allows it to be adapted to other domains. The potential benefits for engineering applications are immense. 
This talk will show you how easy it is to convert an existing CAE pipeline to DVC and show the benefits:\r\n- Running hundreds of simulations, comparing them and choosing the optimal with DVC\r\n- Managing software versions declaratively and comparing results across versions\r\n- Creating in-depth meta studies and comparing many simulations with Jupyter notebooks\r\n\r\nFinally, this talk will give an outlook on the changing CAE ecosystem and propose new features for DVC to better leverage it for this use case.\r\n\r\n**Audience**\r\nEither simulation engineers seeking to enhance and scale their workflows or software engineers aiming to build powerful and flexible simulation tooling.\r\n\r\n**Relevant talks or blog posts**\r\n- Sending Rovers to Mars with Jupyter\r\n- Managing OpenFOAM Physical Simulations with DVC, CML, and Studio\r\n- How Python enables future computer chips", "recording_license": "", "do_not_record": false, "persons": [{"guid": "4ce331e7-61c8-5e38-8b6e-3e298506dcc0", "id": 38021, "code": "M8QA7S", "public_name": "Julian Wagensch\u00fctz", "avatar": null, "biography": "I am a simulation engineer developing sustainable battery system platforms at Volkswagen. I have been working on battery systems since 2016, starting in Formula Student. There, I created an open source library for analyzing vehicle control unit data with a Pandas-like interface called CANdas. I hold a master's degree in mechanical engineering with a strong focus on simulation and data management. 
I am always on the lookout to improve simulation workflows using Python.", "answers": []}], "links": [], "attachments": [], "answers": []}], "B07-B08": [{"url": "https://pretalx.com/pyconde-pydata-2024/talk/BA7FZL/", "id": 41756, "guid": "fecfb2a2-6e91-5ed3-b216-36b911d5b775", "date": "2024-04-24T10:30:00+02:00", "start": "10:30", "logo": null, "duration": "00:30", "room": "B07-B08", "slug": "pyconde-pydata-2024-41756-asyncapp-my-contribution-to-hype-pythons-asyncio-a-bit-more", "title": "AsyncApp. My contribution to hype Pythons asyncio a bit more", "subtitle": "", "track": "PyCon: Programming & Software Engineering", "type": "Talk", "language": "en", "abstract": "Asyncio use is now everywhere in the Python world, ...\r\n\r\n.. or is it?\r\n\r\nBeing there since version 3.4 my impression is, that it is still not the go to solution when starting off new projects.\r\nIt's not an obvious choice and traditional approaches still seem to be much preferred especially by beginners.\r\n\r\nSo let me take you with me on a journey to create simple, yet powerful building blocks to build asyncio based applications using patterns that are easy to follow, lightweight and attractive.\r\n\r\n\r\n#asyncio #click #logging #psutil #redis #raspberrypi", "description": "Asyncio has been introduced as a possible solution mainly for I/O related performance problems.\r\n\r\nThe traditional way to handle I/O often ends up in code, which blocks the execution of concurrent elements in an application, often resulting in bad performance. 
\r\n\r\nThe usual suspects when dealing with these problems, such as multiprocessing and threading, are often considered to be complex and not straightforward in use, especially for beginners.\r\nI believe that proper threading and multiprocessing, with all its interprocess or shared memory communication, locks and race condition prevention, as well as efficient object handling still requires a deep understanding of the architecture and inner workings, and is still mainly a topic for experts.\r\n\r\nAsyncio comes to the rescue here, offering an abstraction at a lower and much easier-to-understand level.\r\nWhile it is no solution to aid in distributing code execution to gain more performance, it will solve the blocking issues quite efficiently.\r\n\r\nTo demonstrate the power and simplicity of asyncio I will show a few object-oriented building blocks that will allow us to create a simple environment monitoring app for the Raspberry Pi.\r\n\r\nThis app will\r\n\r\n- periodically gather sensor readings\r\n- log them\r\n- store the readings to a data file\r\n- offer a monitoring system to log CPU and memory usage for itself\r\n- be able to be configured via environment variables, config files and command line arguments\r\n\r\nIn its final iteration the app will be distributed into small parts just dealing with a single, very specific task to be performed, following the traditional UNIX philosophy for an app to do just one thing, but do this well.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "df23dc13-20d6-55c5-a620-5d4717e311aa", "id": 25799, "code": "DUSVE9", "public_name": "Jens Nie", "avatar": "https://pretalx.com/media/avatars/DUSVE9_x6pgG1p.jpg", "biography": "A physicist who has filled a variety of roles in a leading service company in the oil and gas industry, currently tackling the development of embedded devices at Rosenxt based on the Raspberry Pi, Linux and Python with a Python history going back to version 1.4.", 
"answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/9JEZ8E/", "id": 41827, "guid": "0299cadf-bbcc-5cc6-ad6d-98c29cca9aa5", "date": "2024-04-24T11:05:00+02:00", "start": "11:05", "logo": null, "duration": "00:30", "room": "B07-B08", "slug": "pyconde-pydata-2024-41827-high-performance-data-visualization-for-the-web", "title": "High Performance Data Visualization for the Web", "subtitle": "", "track": "PyData: Visualisation & Jupyter", "type": "Talk", "language": "en", "abstract": "In this talk, we will put together a simple but full-featured website using [Perspective](https://perspective.finos.org). Perspective is an open source interactive analytics and data visualization component, which is especially well-suited for large and/or streaming datasets. It is written in C++ and Rust with bindings to both Python and WebAssembly, making it ideal for data-intensive applications. It comes with a variety of visualization plugins, including a datagrid and various charts. Additionally, it comes with a Jupyter widget, which allows developers to iterate quickly with a clear pathway to their production website.", "description": "The Python ecosystem has ample supply of both web development frameworks and data visualization components. But despite the maturity of the ecosystem, few data visualization tools are capable of dealing with large amounts of streaming data. Even fewer are able to perform live aggregations, sorting, and filtering on top of this data.\r\n\r\nIn this talk, we will put together a simple but full-featured website using [Perspective](https://perspective.finos.org). Perspective is an open source interactive analytics and data visualization component, which is especially well-suited for large and/or streaming datasets. It is written in C++ and Rust with bindings to both Python and WebAssembly, making it ideal for data-intensive applications. 
It comes with a variety of visualization plugins, including a datagrid and various charts. Additionally, it comes with a Jupyter widget, which allows developers to iterate quickly with a clear pathway to their production website.\r\n\r\nWe will start with a simple [FastAPI](https://fastapi.tiangolo.com)-based website and some static data. In a few lines of code, we will have the website up and running. Next, we will demonstrate some of the core features of Perspective - pivoting, sorting, filtering, the various visualization plugins, cross-filtering (using one table as a filter on other tables), and computed columns. After this, we will pull in some streaming data and show how the functionality of Perspective demonstrated updates in realtime alongside the data. Finally, we'll crank the speed of updates to the limit.\r\n\r\nBy the end of this talk, the audience will know how to use Perspective and how to incorporate it into their own applications for both static and streaming data, either as a simple but high performance datagrid or as a full featured set of interconnected visualization components.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "881ccffb-5f79-5c8b-84d5-8349c1dd56fa", "id": 37915, "code": "TM3KV3", "public_name": "Tim Paine", "avatar": "https://pretalx.com/media/avatars/TM3KV3_NqDxFjv.jpg", "biography": "Quantitative Developer - Cubist Systematic Strategies\r\nAssociate in Computer Science - Columbia University", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/DKL7YQ/", "id": 41823, "guid": "4cedab0a-12fb-5250-a50f-f2ba1d673c64", "date": "2024-04-24T11:40:00+02:00", "start": "11:40", "logo": null, "duration": "00:30", "room": "B07-B08", "slug": "pyconde-pydata-2024-41823-how-to-improve-the-python-development-experience-for-millions-of-ubuntu-users", "title": "How to Improve the Python Development Experience for Millions of Ubuntu Users", "subtitle": "", 
"track": "PyCon: Python Language & Ecosystem", "type": "Talk", "language": "en", "abstract": "Have you ever tried to install a different Python version on Ubuntu or tried to upgrade your current one?\r\n\r\nLots of posts exist, many are outdated, and some even lead to a broken Ubuntu installation.\r\n\r\nThis talk will introduce the most common options and their ups and downs in-depth.\r\n\r\nWe will also give an outlook on what Ubuntu could do to make it even easier for you and everybody.", "description": "Updating your current Python installation, or installing a different one on Ubuntu is not an easy task.\r\n\r\nThere are many reasons why you want a different Python version on Ubuntu:\r\n- you want to use the latest version, but Ubuntu comes with an older one pre-installed\r\n- a Python app requires an older Python version\r\n- you want to test your Python library against multiple Python versions\r\n\r\nUnfortunately, `apt install python-<version>` won't work.\r\n\r\nAfter googling some time, you'd learn that you have many options:\r\n- pyenv\r\n- deadsnakes\r\n- mamba/conda\r\n- or even compiling Python yourself\r\n\r\nWhy isn't there a single way, and which one fits your needs the best?\r\n\r\nAnd why doesn't `apt install python-<version>` just work?\r\n\r\nThere are many blog posts and tutorials out there to install a new Python version, but they lack the depth to understand the core of the problem.\r\n\r\nAnd are they up-to-date? 
Do you trust them not to break your Ubuntu installation?\r\n\r\nThis talk will not only introduce and compare all the most common options to update a Python version or to install a new one on Ubuntu but will also convey the knowledge to assess the existing and upcoming options yourself.\r\n\r\nWe will also look into the future.\r\n\r\nWhat new tools are on the horizon?\r\n\r\nAnd especially, what could Ubuntu do itself to make it easier for you and everybody?", "recording_license": "", "do_not_record": false, "persons": [{"guid": "af52d8e4-c036-537b-b9a4-dc724a16db0f", "id": 10700, "code": "EARAKA", "public_name": "J\u00fcrgen Gmach", "avatar": "https://pretalx.com/media/avatars/EARAKA_ygr3HZr.jpg", "biography": "I am a software developer with a passion for Python and Linux, developing open source software both at my day job at [Canonical](https://grnh.se/5d948ffe1us), and at night as a maintainer of [tox](https://github.com/tox-dev/tox) and many other projects.\r\n\r\nI would love to connect with you!\r\n\r\n- [My blog](https://jugmac00.github.io/)\r\n- [Mastodon](https://fosstodon.org/@jugmac00)\r\n- [Twitter](https://twitter.com/jugmac00)\r\n- [LinkedIn](https://www.linkedin.com/in/j%C3%BCrgen-gmach-b24363226/)", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/7NETLX/", "id": 41677, "guid": "7b0fc0e1-534c-51a7-9e12-1f6048c51dcd", "date": "2024-04-24T13:10:00+02:00", "start": "13:10", "logo": null, "duration": "00:30", "room": "B07-B08", "slug": "pyconde-pydata-2024-41677-django-an-asynchronous-microservices-technique-", "title": "\u00b5Django, an asynchronous microservices technique.", "subtitle": "", "track": "PyCon: Django & Web", "type": "Talk", "language": "en", "abstract": "A standard Django project involves working with multiple files and folders from the start. Let's see how the work with a Django project changes when we have only one file. 
This solution automatically transforms Django into a microservice-oriented async framework with a \"batteries included\" philosophy.", "description": "The history of the lightweight Django project isn't new. \r\nThe single-py-file Django project paradigm first appeared in 2014 in the book Lightweight Django.\r\nI with Django project consisting of only 2 files in 2015. At that time, the tiny Django project wasn't comparable to the capabilities of projects based on FastAPI or Flask.\r\nBut a couple of years later, Django introduced ASGI, and in 2022, Django was ready for use in microservices.\r\n\r\nThe concept of creating micro-projects on Django reappeared within the Django community in 2019 and again in the spring of 2023, and now we have a full-fledged technology for creating asynchronous microservices consisting of one or two files. It was named uDjango.\r\n\r\nIn this talk, I will share my experience in creating high-performance microservices on Django and how I keep simplicity and minimalism in projects.\r\n\r\nDuring the talk, I'll discuss the advantages of Django microservices:\r\n\r\n* All-in-one package\r\n* Standard architecture and syntax\r\n* Extremely rapid development and deployment speed\r\n\r\nAfter years of work with the uDjango paradigm, I have identified the challenges in creating Django microservices:\r\n\r\n* The prevailing opinion that the 'Django framework isn't suitable for microservices'\r\n* Django settings.py - cause of many problems.\r\n* URL routing in Django that could be stricter\r\n* Initialization time of forms and model objects reduces performance\r\n\r\nThe result of this talk for the audience will be knowledge about uDjango, a ready-to-use technology for building synchronous and asynchronous microservices.\r\n\r\nTalk based on the ideas of:\r\nJulia Elman and Mark Lavin, Lightweight Django 2014.\r\nWill Vincent, django-microframework 2019.\r\nKirill Klenov, python benchmark repository, 2019.\r\nCarlton Gibson, LinkedIn post about 
one app Django project, 2022\r\nPaolo Melchiorre 2023, uDjango", "recording_license": "", "do_not_record": false, "persons": [{"guid": "2926c361-78a7-541a-9dc2-e5151c958cbb", "id": 17714, "code": "TJSMCP", "public_name": "Maxim Danilov", "avatar": "https://pretalx.com/media/avatars/TJSMCP_GsfYTly.jpg", "biography": "Python/Django Senior Software Engineer, Solution Architect and Tech Speaker.\r\n\r\nI began my career as a programmer specializing in embedded solutions in 1997, and grew into the role of Chief Technology Officer in 2023. Through many successful projects, I gained a robust understanding of various software development paradigms. After more than 10 years as a code mentor, I finally earned the title 'Super Mentor in Engineering' in December 2023.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/ZLDMGM/", "id": 42904, "guid": "d9934b2d-03bc-54a5-a26d-f7e6f294fe46", "date": "2024-04-24T13:45:00+02:00", "start": "13:45", "logo": null, "duration": "00:30", "room": "B07-B08", "slug": "pyconde-pydata-2024-42904-beyond-deployment-exploring-machine-learning-inference-architectures-and-patterns", "title": "Beyond Deployment: Exploring Machine Learning Inference Architectures and Patterns", "subtitle": "", "track": "PyCon: MLOps & DevOps", "type": "Talk", "language": "en", "abstract": "This talk is about setting up robust and scalable machine learning systems for high-throughput real-time predictions and large numbers of users. It is meant for ML engineers and people who work with data and want to learn more about MLOps focusing on cloud-based platforms. The focus of this talk will be on different ways to make predictions \u2013 real-time, asynchronous and batch processing. 
It discusses the advantages and disadvantages of the different patterns and highlights the importance of choosing the right pattern for specific use cases, including generative large language models.\r\n\r\nWe will use examples from StepStone's production systems to illustrate how to build systems that scale to thousands of simultaneous requests while delivering low-latency, robust predictions.\r\n\r\nI will cover some of the technical details, how to efficiently manage operations, and real-life examples in a way that is easy to understand and informative. You will learn about different setups for ML and how to make them work. This will help you make your ML inference faster, more cost-efficient, and reliable.", "description": "This talk explains the major challenges of ML deployment and management, emphasizing inference patterns for robust, scalable applications. Using StepStone's infrastructure as an example, we'll discuss efficiently handling large workloads and complex models, including recent large language models, to ensure fast, cost-effective, and reliable results. \r\n\r\nThe session begins with an introduction, highlighting the significance of ML inference and outlining the objective of providing insights into effective MLOps strategies. We'll then overview various ML inference patterns, emphasizing their advantages, disadvantages, and the importance of selecting the right pattern for specific use cases. \r\n\r\nMoving on, we'll delve into StepStone's ML inference strategy, showcasing real-world applications and how scalability, performance, and cost are managed while maintaining agility for frequent model updates and monitoring in production systems. 
\r\n\r\nIn summary, this talk provides a practical roadmap of ML inference patterns with a focus on real-world implementation at StepStone.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "e68e0e28-ebd5-5c00-976f-11fa576910d7", "id": 1912, "code": "9EBGDD", "public_name": "Tim Elfrink", "avatar": "https://pretalx.com/media/avatars/9EBGDD_8pEUgjz.jpeg", "biography": "Tim is a Staff Machine Learning Engineer at Stepstone. He is working on the deployment of various machine learning projects.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/QYPLJE/", "id": 41607, "guid": "011a9f5c-8fcd-5edf-b4c3-911e236488e3", "date": "2024-04-24T14:45:00+02:00", "start": "14:45", "logo": null, "duration": "00:30", "room": "B07-B08", "slug": "pyconde-pydata-2024-41607-the-ai-revolution-will-not-be-monopolized-how-open-source-beats-economies-of-scale-even-for-llms", "title": "The AI Revolution Will Not Be Monopolized: How open-source beats economies of scale, even for LLMs", "subtitle": "", "track": "PyData: Natural Language Processing & Computer Vision", "type": "Talk", "language": "en", "abstract": "With the latest advancements in Natural Language Processing and Large Language Models (LLMs), and big companies like OpenAI dominating the space, many people wonder: Are we heading further into a black box era with larger and larger models, obscured behind APIs controlled by big tech monopolies?\r\n\r\nI don\u2019t think so, and in this talk, I\u2019ll show you why. I\u2019ll dive deeper into the open-source model ecosystem, some common misconceptions about use cases for LLMs in industry, practical real-world examples and how basic principles of software development such as modularity, testability and flexibility still apply. LLMs are a great new tool in our toolkits, but the end goal remains to create a system that does what you want it to do. 
Explicit is still better than implicit, and composable building blocks still beat huge black boxes.", "description": "As ideas develop, we\u2019re seeing more and more ways to use compute efficiently, producing AI systems that are cheaper to run and easier to control. In this talk, I'll share some practical approaches that you can apply today. If you\u2019re trying to build a system that does a particular thing, you don\u2019t need to transform your request into arbitrary language and call into the largest model that understands arbitrary language the best. The people developing those models are telling that story, but the rest of us aren\u2019t obliged to believe them.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "b60e58b3-bd41-534c-a286-22ae8481a00a", "id": 25952, "code": "FZKG9N", "public_name": "Ines Montani", "avatar": "https://pretalx.com/media/avatars/FZKG9N_5iBQp5R.jpg", "biography": "Ines Montani is a developer specializing in tools for AI and NLP technology. She\u2019s the co-founder and CEO of Explosion and a core developer of spaCy, a popular open-source library for Natural Language Processing in Python, and Prodigy, a modern annotation tool for creating training data for machine learning models.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/DBGXJN/", "id": 41830, "guid": "51ea0d72-ce2d-5c8c-b113-488ce860aa78", "date": "2024-04-24T15:20:00+02:00", "start": "15:20", "logo": null, "duration": "00:30", "room": "B07-B08", "slug": "pyconde-pydata-2024-41830-jupyter-notebooks-for-print-media", "title": "Jupyter Notebooks for Print Media", "subtitle": "", "track": "PyData: Visualisation & Jupyter", "type": "Talk", "language": "en", "abstract": "In this talk, we will discuss leveraging Jupyter Notebooks to generate print media - books, magazine and newspaper articles, business reports, academic papers, etc. 
We will motivate the problem, introduce a library for accomplishing the task (nbprint), and walk through some end-to-end examples.", "description": "Jupyter Notebooks are the tool of choice for researchers and data scientists, and a lot of work has been done to take Jupyter Notebooks and turn them into standalone websites. From [Voil\u00e0](https://voila.readthedocs.io/en/stable/index.html) to [Jupyter Book](https://jupyterbook.org/en/stable/intro.html), with widget and app libraries galore, it has never been easier to take a notebook and produce an interactive website. In contrast, despite the origins of notebooks in academic research, comparatively less work has been done in building tools to take notebooks and produce print media - newspaper articles, business reports, textbooks, academic publications, etc.\r\n\r\nIn this talk, we will do four things. First, we will motivate print media as a good target for Jupyter Notebooks. We will do so through three worked examples:\r\n- a data-driven news publication such as those from The New York Times\r\n- a computer science textbook\r\n- a business intelligence report\r\n\r\nSecond, we will highlight the correct set of technologies for producing notebook-derived print media. 
In particular, we will discuss NBPrint, a small [NBConvert](https://nbconvert.readthedocs.io/en/latest/)-based library that leverages [paged.js](https://pagedjs.org), a free and open source library which has [been used to produce real, printed books](https://pagedjs.org/made-with-paged.js.html).\r\n\r\nThird, we will give an end-to-end example from Jupyter Notebook to publication quality result for one of the above examples, showing a side-by-side comparison with the original media.\r\n\r\nFinally, we will discuss the power of the notebook oriented approach, and discuss which disciplines might be best suited for adopting notebooks as the source format for their print-oriented media.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "881ccffb-5f79-5c8b-84d5-8349c1dd56fa", "id": 37915, "code": "TM3KV3", "public_name": "Tim Paine", "avatar": "https://pretalx.com/media/avatars/TM3KV3_NqDxFjv.jpg", "biography": "Quantitative Developer - Cubist Systematic Strategies\r\nAssociate in Computer Science - Columbia University", "answers": []}], "links": [], "attachments": [], "answers": []}], "B05-B06": [{"url": "https://pretalx.com/pyconde-pydata-2024/talk/SSKV9R/", "id": 41808, "guid": "d4747581-cffd-5145-8508-0545cf1fa918", "date": "2024-04-24T10:30:00+02:00", "start": "10:30", "logo": null, "duration": "00:30", "room": "B05-B06", "slug": "pyconde-pydata-2024-41808-reinforcement-learning-bridging-the-gap-between-research-and-applications", "title": "Reinforcement Learning: Bridging The Gap Between Research and Applications", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Stats", "type": "Talk", "language": "en", "abstract": "Reinforcement learning (RL) has great potential for industrial applications, but few mature software frameworks exist to facilitate its use. 
This talk discusses efforts to improve the software landscape for RL, making it easier for researchers to contribute algorithms and for engineers to apply RL in real-world settings. Specifically, we highlight the open-source library Tianshou, which provides high-level interfaces for painless RL application development along with lower-level APIs that cater to the needs of researchers. By improving RL software, we aim to accelerate research progress and expand RL adoption in industry.", "description": "Despite the very general applicability of reinforcement learning (RL) to a variety of decision and control problems, there are comparatively few applications of it in current industries. Moreover, many important developments emerging in the highly active RL research community do not get added to existing frameworks or libraries. Code written for successful RL applications in industry is also rarely contributed to open source software (OSS). This is in stark contrast to other areas of machine learning (ML), where reported progress is often transferred to mature OSS within weeks, if not days.\r\n\r\nPart of the reason behind this lamentable state may be the intrinsically higher complexity of RL when compared to, say, supervised learning. However, we believe that the lower permeation of RL in mature software arises in large part because writing RL-based software is currently much harder than it has to be. Widely used OSS for RL is either too complex for researchers to contribute to (like ray/RLlib or Pearl), too buggy and unstable for industry to consider (also RLlib), too limited in scope (like stable-baselines3, which includes relatively few algorithms), lacking high-level interfaces (like torch-rl), or even completely gives up on modularity (like cleanRL).\r\n\r\nAnother reason is the difference in focus between RL research and applications. 
In research, an important goal is to find an algorithm that works well in a variety of environments, whereas in applications, one is usually interested in solving a particular environment of interest, by any means. This leads to wildly differing evaluation scenarios and selection criteria.\r\n\r\nWe believe that the current state of RL software is reminiscent of the pre-PyTorch/pre-Keras era for supervised deep learning, when the implementation of a task like training a convolutional network on a large image dataset was non-trivial. Today, it requires but a few lines of code. We thus infer that significant progress in the software landscape supporting RL is still to be made, and that this progress will have high impact both on researchers and ML engineers.\r\n\r\nWith this goal in mind, the appliedAI Institute for Europe, together with the core developers of the open source RL library Tianshou, took on the task of extending the latter in order to democratize RL in applications and accelerate reliable and trustworthy research on it. In this talk, we will highlight Tianshou\u2019s high-level interfaces, which allow painless applications of RL algorithms in industry applications, as well as the lower-level interfaces that researchers can base their work on. Research code that is compatible with Tianshou\u2019s interfaces will not only get mature evaluation, reporting and hyper-parameter optimization \u201cfor free\u201d, but will also be much easier to use in applications, thereby boosting its impact. 
We will also address the question of environment design, which is a highly important RL engineering topic that is largely ignored in RL research.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "ac862f29-7360-562f-aaa5-81487c15b068", "id": 38376, "code": "PJHQMT", "public_name": "Michael Panchenko", "avatar": null, "biography": "Mischa is a researcher with a background in physics and mathematics who decided to change course and go into AI (for the sake of falsifiability of ideas). On his path since then he has worked on multiple projects in ML and data analysis and as a bonus gained some experience in DevOps and in developing production-grade solutions.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/GNK3PV/", "id": 44945, "guid": "5be5801d-acbd-5fad-9e6e-535f45549401", "date": "2024-04-24T11:05:00+02:00", "start": "11:05", "logo": null, "duration": "00:30", "room": "B05-B06", "slug": "pyconde-pydata-2024-44945-climate-crisis-in-numbers", "title": "Climate Crisis in Numbers", "subtitle": "", "track": "Sponsor", "type": "Sponsored Talk", "language": "en", "abstract": "Climate change is one of the biggest and most daunting challenges that our and future generations are going to face. In order to mitigate climate change and its consequences, first one needs to understand the problem and get a rough idea about the magnitude of human-made global warming. As a proper numbers nerd I understand problems best when looking at science, statistics, and measurements. So here\u2019s my little guide to better grasp what climate change is all about through data.", "description": "About 5 years ago my co-founder and I launched alcemy, a Machine Learning startup to help decarbonize the cement and concrete supply chain. My primary motivation to run the startup is to find ways to tackle and prevent climate change and human-made global warming. 
In the course of building the company I not only wanted to understand how much we can contribute in our niche sector of cement and concrete, but also to get a better idea of the problem and its magnitude as a whole. So here\u2019s my little guide to better grasp what climate change is all about through data.\r\n\r\nI am going to talk about a variety of things regarding climate change and the greenhouse effect:\r\n- CO2 Equivalence\r\n- Magnitude and origin of different emission sources\r\n- The consequences of global warming and our potentially grim future\r\n- A (very) brief outlook on what humankind needs to do to tame global warming\r\n\r\n\r\nPS: Absolutely no Python experience needed here ;-)", "recording_license": "", "do_not_record": false, "persons": [{"guid": "5828ae9d-b538-54b9-99fc-b61636221eee", "id": 16049, "code": "RL3YL7", "public_name": "Robert Meyer", "avatar": "https://pretalx.com/media/avatars/RL3YL7_mWLe0WR.jpeg", "biography": "Robert Meyer is a Data Scientist and Neuroscience researcher by training. He completed his PhD at TU Berlin and simulated parts of the cat brain.\r\n\r\nAfter working for the German unicorn Flixbus for two and a half years building an automated bus ticket pricing pipeline, he joined the Entrepreneur First incubator. 
There he met his co-founder Leopold Spenner and together they started alcemy, a Machine Learning startup to accelerate the decarbonization of the cement and concrete supply chain.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/UGJJMP/", "id": 41452, "guid": "14d6562e-7f36-5690-9576-c1f25c3d5fbb", "date": "2024-04-24T11:40:00+02:00", "start": "11:40", "logo": null, "duration": "00:30", "room": "B05-B06", "slug": "pyconde-pydata-2024-41452-lessons-learned-from-deploying-machine-learning-in-an-old-fashioned-heavy-industry", "title": "Lessons learned from deploying Machine Learning in an old-fashioned heavy industry", "subtitle": "", "track": "PyData: Machine Learning & Deep Learning & Stats", "type": "Talk", "language": "en", "abstract": "About 5 years ago my co-founder and I launched alcemy, a Machine Learning startup to help decarbonize the cement and concrete supply chain. I experienced firsthand moving from a simple proof of concept, an ML model inside a Jupyter notebook, to a full-fledged pipeline running 24/7 and steering massive amounts of cement production in real plants. I can tell you the road was long and winding. I want to share some of the hard lessons we learned along the way with you. If you are an aspiring ML or Software Engineer, Data Scientist, Entrepreneur, or you are just wondering what Machine Learning applied in the wild looks like, this talk is for you. No prior knowledge is required except some familiarity with basic concepts and terminology of Machine Learning.", "description": "Introduction\r\n------------------\r\n**Cement alone is responsible for about 8% of worldwide CO2 emissions**. Fortunately, we have quickly learned that low-carbon alternatives to \"conventional\" cement and concrete already exist. 
For instance, 60% of carbon emissions can be avoided if burnt limestone, the main ingredient for cement, is replaced partly by limestone powder (which isn't burnt, and therefore doesn't release carbon into the atmosphere). Yet, these low-carbon cement recipes have a substantial shortcoming: They react much more sensitively to changes, e.g. changes in weather conditions or in the chemical and mineralogical composition of ingredients. As a consequence, low-carbon cements and the resulting concrete (made by mixing cement with sand and water) can only be reliably produced under laboratory conditions. \r\n\r\nWe are changing this. We use data intelligence and predictive Machine Learning control to optimize production processes such that low-carbon cement and concrete can be manufactured in real plants and at scale. I will quickly introduce our solution that is already deployed in 5 cement plants. Moreover, we are currently prototyping to move into concrete production as well. Of course, we do this (mostly) in Python.\r\n\r\nPart 1: Machine Learning\r\n-------------------------------------\r\nMachine Learning in production is vastly different from solving a Kaggle challenge. In fact, the particular choice of Machine Learning model is much less important than you think. I will cover the benefits of using rather simple models such as random forests or even linear regression in comparison to deep learning. If stuff goes wrong, and it will, interpretable and debuggable models are far superior to complex architectures. Also, having proper model evaluation that reflects production requirements and good baselines for comparison are always crucial first steps that pay off in the long run. 
It was surprising how much less time we spent on the core Machine Learning algorithms in comparison to infrastructure, such as deployments on AWS fargate or k8s, re-training processes, proper database layout, or home-brewed tooling to allow easier configurations of dozens of ML models.\r\n\r\nPart 2: Data\r\n------------------\r\nWe quickly learned that data is way more important than models. Some might have heard the phrase *Garbage in garbage out* coined by programmers in the 50s. This is even more important when it comes to today's widespread usage of Machine Learning. We run ML not on our own data, but on data provided by our customers. While the level of data-maintenance and quality that our customers are used to allows for in-house bookkeeping and short analyses, it does not necessarily suffice for ML. I will discuss why and how we spend a good amount of time cleaning and really drilling into the data provided by our customers.\r\n\r\nMoreover, differences between training and real-time inference data can be a real challenge. For example, it is not guaranteed that the location where samples are drawn from cement mills, i.e. the live data used for inference, is as representative of the actual cement as silo samples that can be used for training. Fine particles might not be captured simply due to the physical properties of the sample site. To tackle problems like these as a Machine Learning engineer you have to become an expert in the domain your models are applied. You really need to understand the data in every detail and know how it is generated by your customers and understand the context and consequences of all of your customers' processes.\r\n\r\nPart 3: Customers and Business\r\n-----------------------------------------------\r\nOur customers are, of course, no Machine Learning experts. Why should they be? If they were, they wouldn't need us anyway. However, oftentimes we as Machine Learning engineers forget the ramifications of this. 
I will talk about customer relations and their interactions with our Machine Learning models. For example, we had to deal with a rather skeptical customer not believing our models' predictions. They pretty much went against all recommendations made by the model. Although it is nice if in the end the model predictions turn out to be right, your customer does not necessarily feel the same way. In contrast, the customer does not enjoy being wrong and may even feel mocked by a machine. Having a strong customer success team that knows both how ML works and, of course, how the customer operates and thinks, is often more valuable than \"rockstar\" Machine Learning engineers.\r\n\r\nLastly, a tough lesson to learn was that Machine Learning as a service should not be mistaken for a software as a service business model. Our marginal costs are not zero. Besides a great deal of consulting that is needed for every customer, on-boarding a new customer is time consuming and needs a lot of work. Integrating into existing infrastructure of cement plants (who are not top-notch IT companies) can be tough or downright frustrating at times. Therefore, scaling a Machine Learning startup can be hard, and we learned that it is better to go hunting for elephants, i.e. few high paying customers, than for mice, many low paying ones.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "5828ae9d-b538-54b9-99fc-b61636221eee", "id": 16049, "code": "RL3YL7", "public_name": "Robert Meyer", "avatar": "https://pretalx.com/media/avatars/RL3YL7_mWLe0WR.jpeg", "biography": "Robert Meyer is a Data Scientist and Neuroscience researcher by training. He completed his PhD at TU Berlin and simulated parts of the cat brain.\r\n\r\nAfter working for the German unicorn Flixbus for two and a half years building an automated bus ticket pricing pipeline, he joined the Entrepreneur First incubator. 
There he met his co-founder Leopold Spenner and together they started alcemy, a Machine Learning startup to accelerate the decarbonization of the cement and concrete supply chain.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/TMF8V7/", "id": 41753, "guid": "26b2b037-e010-53d1-8b13-c5b24c320091", "date": "2024-04-24T13:10:00+02:00", "start": "13:10", "logo": null, "duration": "00:30", "room": "B05-B06", "slug": "pyconde-pydata-2024-41753-how-python-helped-us-uncover-secrets-of-protein-motion", "title": "How Python helped us uncover secrets of protein motion", "subtitle": "", "track": "General: Industry & Academia Use-Cases", "type": "Talk", "language": "en", "abstract": "This presentation will give an overview of a scientific project that focuses on understanding how proteins move and function. Along the way a very large collection of Python tools was used, and our own innovative approaches are built on top of them. To be able to understand everything about living beings, including our health and the origin of diseases in humans, we have to know how proteins do what they do. Hence it is of utmost importance to understand their structure and function. Thanks to an extraordinary technique called X-ray crystallography we are able to see how the proteins look at atomic scale, but it is impossible to see how they move. Therefore the next best thing we can do is to simulate the motion of the protein by so-called molecular dynamics (MD) simulations. These simulations generate incredible amounts of data, generally hundreds of GB of data per 1 microsecond of protein movement! Extracting useful and meaningful information from it is a daunting task.\r\nWe are going to show how we have used many Python tools to tackle this problem in the project. 
Using Django to place everything in an interactive web app (https://alokomp.irb.hr/), along with Pandas, Numpy, Scipy, Dask, Jupyter, NetworkX, Bokeh, Datashader and many more under the hood, we have created an innovative new way of seeing proteins move and communicate.", "description": "Proteins are one of the main building blocks of the living world. They are largely responsible for the amazing diversity that we witness in the nature around us. Although proteins are composed of sequences of just 20 amino acids, clever nature\u2019s design has endowed them with an incredibly diverse set of functions. It is not an overstatement to say that this diversity and the myriad of ways proteins interact with each other is at the very heart of life. Therefore it is of utmost importance to understand their structure and function. \r\nProteins are very large molecules, composed of thousands up to even millions of atoms connected in giant hairball-like structures. But still they are too tiny to be seen by any sort of microscope, even the most powerful ones. That is why in order to \u201csee\u201d how they look we use X-rays and shine them on crystals made entirely of a single protein species in the fascinating method of X-ray crystallography. It then gives us the picture of how the proteins look in unprecedented atomic detail. \r\nIn order to perform their function proteins also move their parts, but unfortunately this motion is too quick to be seen by any device. X-ray crystallography alone, although mighty in giving us the details, gives us only one static image. It is a bit like trying to tell the story of a movie just by seeing the movie poster. Therefore we have to simulate the motion of the protein by so-called molecular dynamics (MD) simulations. Basically we give the computer the initial positions of all the atoms that we know from X-ray crystallography and then kick them and see how the protein moves in time, in very tiny steps. 
This results in so-called MD trajectories which contain all atom positions in millions of steps. Needless to say, this produces very heavy data sets, usually hundreds of GB, that need to be processed somehow. \r\nIn the project called \u201cAllosteric communication pathways in oligomeric enzymes\u201d (https://alokomp.irb.hr/) we have faced that very problem. How to extract information about protein movement from such enormous quantities of data? Of course the answer was to use the marvelous suite of Python tools available. Python has established itself as a de facto standard programming language in data science, and with the plethora of options already available for X-ray crystallography and MD analysis it was a logical choice (not to mention its awesomeness and being our favourite anyway). The whole project really displays how mature and diverse Python is to be able to tackle every single aspect of such a specialized problem. To begin with, we have centered the entire project around a web page built using Django. It serves both as a front-end with general information and as a web app for diving into the data. Behind it is a PostgreSQL relational database containing all the structural and derived data from a family of proteins, called PNPs, which serve as a sort of proof of concept (https://alokomp.irb.hr/pdbase/structures/). It also contains data derived from MD simulations and analysed with the MDAnalysis tool (https://www.mdanalysis.org/). It is hard to mention all the Python tools we have used for analysis of the data in the database. Of course its backbone is the indispensable Pandas, NumPy, SciPy, Dask, Jupyter, NetworkX, Bokeh, HoloViz, to name but a few. 
More specifically we have developed a special approach (\u201cavocado\u201d plots, example https://alokomp.irb.hr/md/avocados/1458/A) to visualize the motion of the protein as a whole in time, as a series of snapshots each containing plots of millions of points, using the awesome Datashader library (https://datashader.org). We have also used the Ruptures library (https://github.com/deepcharles/ruptures) to detect changes in the positions of the protein and to detect correlations. Everything is wrapped up in the form of an interactive web app which can be used to visually browse vast amounts of data, giving a whole new perspective on highly complex multidimensional data.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "75256c6b-a026-569a-bcc8-62a25e7a25ab", "id": 38353, "code": "CSDFVG", "public_name": "Zoran \u0160tefani\u0107", "avatar": "https://pretalx.com/media/avatars/CSDFVG_i8aAXqz.jpg", "biography": "Dr. Zoran \u0160tefani\u0107 is a senior research associate and Head of the Laboratory for Chemical and Biological Crystallography at the Ru\u0111er Bo\u0161kovi\u0107 Institute. His main areas of expertise include: chemical crystallography of small organic molecules and hydrogen bonded networks, macromolecular crystallography, strong background in physics and mathematics, development of computer algorithms mainly in the Python programming language, database design and web development. As the principal investigator of the ALOKOMP project, he is responsible for the overall coordination of the research and managing all activities between team members, organizational and financial matters, as well as for the publication of results, annual scientific and financial reports to the Croatian Science Foundation. 
More specifically in this project his tasks will be data collection and 3D structure determination of new enzyme structures, development of central relational database, programming of algorithms for data extraction, development of web server, and co-mentorship of PhD student.", "answers": []}, {"guid": "3d546c52-c4e0-527c-bba1-bb6154c3c1a6", "id": 38387, "code": "TAJRF9", "public_name": "Boris Gomaz", "avatar": "https://pretalx.com/media/avatars/TAJRF9_opg2vxF.jpg", "biography": "A structural biology researcher enthusiastic about crystallography, molecular dynamics, and programming.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/ZMC9FU/", "id": 40942, "guid": "3b31ffdc-f03f-5cff-9441-a47227792c8c", "date": "2024-04-24T13:45:00+02:00", "start": "13:45", "logo": null, "duration": "00:30", "room": "B05-B06", "slug": "pyconde-pydata-2024-40942-525-days-working-full-time-on-foss-lessons-learned", "title": "525 days working full-time on FOSS: lessons learned", "subtitle": "", "track": "General: Community, Diversity, Career, Life and everything else", "type": "Talk", "language": "en", "abstract": "I've been working full-time on a Python FOSS project for 525 days, so what did I learn?\r\n\r\nAm I a better (Python) programmer?\r\nAm I a better teammate?\r\nAm I a better person?\r\n\r\nIn this talk I will share some of the lessons I learned over the course of these 525 days:\r\n\r\n - how to get a tech job in this day & age\r\n - how to put your ego aside when working with others (who know more than you!) 
and how to deal with mistakes\r\n - how to interact with users & contributors online\r\n - how it feels to collaborate to a large codebase\r\n\r\nAs for the first three reflective questions, you'll have to ask my colleagues!", "description": "## Outline\r\n\r\n### Introduction (~5min)\r\n\r\nPersonal and professional context for the talk:\r\n - Who am I?\r\n - What FOSS project have I been working on for 525 days?\r\n - Who am I working with?\r\n\r\n### Lesson learned 1 \u2013 how to get a tech job (~5min)\r\n\r\nIn this segment of the talk I share the story of how I got this job.\r\nThis will explain how my writing on my blog contributed to establish some reputation and how my (Python-focused) social media presence connected me with the person who would eventually become my employer.\r\n\r\n### Lesson learned 2 \u2013 put your ego aside (~5min)\r\n\r\nIn this segment of the talk I explain how I deal with PR reviews and how I've learned to embrace the criticism, taking into account that all of your work is scrutinised every time you make a PR.\r\nI'll also tell the story of how I made a couple of blunders in successive PRs, how my team dealt with those, and what I got away from those weeks when I underperformed.\r\n\r\n### Lesson learned 3 \u2013 interacting with users & contributors (~5/7min)\r\n\r\nThis segment of the talk covers the other end of the interactions on a FOSS project, answering questions like:\r\n - How should you behave when interacting with users making feature requests?\r\n - What about users that report \u201cbugs\u201d that would be \u201csolved\u201d if they read the documentation carefully?\r\n - How do you review external PRs, leave feedback, and request changes?\r\n\r\nDepending on how the audience reacts to this segment, I might also tell an anecdote about how bad I felt when rejecting an external PR and how that feeling was amplified tenfold when I found out that the external PR came from a \u201cPython personality\u201d, which also 
contains another lesson because the person whose PR was rejected handled it in the most graceful way possible.\r\n\r\n### Lesson learned 4 \u2013 working on a large project (~5min)\r\n\r\nI will dedicate this segment of the presentation to talk about the strategies I use to deal with the fact that the project I work on is too big for me to keep all of it in my head.\r\nThis includes my note-taking system and my PR checklist.\r\n\r\n### Wrap-up (~2min)\r\n\r\nTo wrap up the talk, I'll summarise my learnings and share a bullet-point list of the ones that are more likely to be helpful to others.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "deadfed1-3cce-5c6f-b1cc-2dd63deac2a5", "id": 13092, "code": "BLNV7P", "public_name": "Rodrigo Gir\u00e3o Serr\u00e3o", "avatar": "https://pretalx.com/media/avatars/BLNV7P_bI8IutA.jpg", "biography": "Rodrigo has always been fascinated by problem solving and that is why he picked up programming \u2013 so that he could solve more problems. He also loves sharing knowledge, and that is why he spends so much time writing articles in his blog [mathspp.com/blog](https://mathspp.com/blog), writing on Twitter [@mathsppblog](https://twitter.com/mathsppblog), and giving workshops and courses. 
You can also find his past talks on [github.com/mathspp/talks](https://github.com/mathspp/talks).\r\n\r\nHis main areas of scientific interest are mathematics (numerical analysis in particular) and programming in general (with a preference for the Python and APL languages), but Rodrigo also enjoys reading fantasy books, watching silly comedy movies and eating chocolate.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/VEACZM/", "id": 40987, "guid": "23632865-50ef-525d-9cef-a197535f888f", "date": "2024-04-24T14:45:00+02:00", "start": "14:45", "logo": null, "duration": "00:30", "room": "B05-B06", "slug": "pyconde-pydata-2024-40987-python-monorepos-the-polylith-developer-experience", "title": "Python Monorepos: The Polylith Developer Experience", "subtitle": "", "track": "PyCon: Programming & Software Engineering", "type": "Talk", "language": "en", "abstract": "What if writing software could be more like building with LEGO bricks? A more playful and productive developer experience. For me, that is all about writing code without the hassle. A productive setup should also let us make design decisions while learning what to actually build, and allow changes along the way. Polylith solves this in a nice and simple way. I am the developer of the Open Source Python-specific tooling for Polylith. I\u2019ll walk through the simple Architecture & the Developer friendly tooling for a joyful Python Experience.", "description": "If you haven\u2019t heard about Polylith before: it has a really simple take on Software Architecture - with tooling support. Polylith is based on small building blocks, very much like LEGO bricks. In fact, the Polylith Architecture originates from the Clojure community and is well suited for functional programming. It is a fresh take on how to share & reuse code, by using monorepos in a very developer-friendly way. 
And we have that in Python!\r\n\r\nI am the developer of the Open Source Python-specific tooling for Polylith. I\u2019ll walk through the simple architecture & developer-friendly tooling for a joyful Python Experience.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "8340a5e5-4f48-50c5-8763-596aa68afb3f", "id": 13601, "code": "VLHGQG", "public_name": "David Vujic", "avatar": "https://pretalx.com/media/avatars/VLHGQG_5nKwpe4.jpg", "biography": "My name is David and I'm a software developer. Colleagues and friends may know me as an early adopter of agile ideas and test driven development. I am passionate about things like that, and share the things I learn with the community and the people I work with. My favorite programming languages are Python and Clojure. In my spare time I practice outdoor Parkour & contribute to Open Source.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/D7AEQY/", "id": 41707, "guid": "1dffaabf-ada4-5aae-8622-59d768c31b19", "date": "2024-04-24T15:20:00+02:00", "start": "15:20", "logo": null, "duration": "00:30", "room": "B05-B06", "slug": "pyconde-pydata-2024-41707-marketing-media-mix-models-with-python-pymc-a-case-study", "title": "Marketing Media Mix Models with Python & PyMC: a Case Study", "subtitle": "", "track": "General: Industry & Academia Use-Cases", "type": "Talk", "language": "en", "abstract": "In today's digital landscape, traditional analytics struggle with understanding marketing ROI, especially with evolving privacy norms. But Python and its ecosystem come to the rescue. \r\nIn this talk, we will discuss how we leveraged Python and PyMC to build a Bayesian Marketing Media Mix model for the fastest-growing Italian tour operator. We'll cover the challenges we faced, the valuable insights we gained, and the results achieved. 
This will offer you a clear and practical roadmap for developing a similar model for your business.", "description": "Understanding the effectiveness of various marketing channels is crucial to maximise the return on investment (ROI). However, the limitation of third-party cookies and an ever-growing focus on privacy make it difficult to rely on basic analytics. This talk discusses a pioneering project where a Bayesian model was employed to assess the marketing media mix effectiveness of WeRoad, the fastest-growing Italian tour operator.\r\n\r\nThe Bayesian approach allows for the incorporation of prior knowledge, seamlessly updating it with new data to provide robust, actionable insights. This project leveraged a Bayesian model to unravel the complex interactions between marketing channels such as online ads, social media, and promotions. We'll dive deep into how the Bayesian model was designed, discussing how we provided the AI system with expert knowledge, and presenting how delays and saturation were modelled. \r\n\r\nWe will also tackle aspects of the technical implementation, discussing how Python, PyMC, and Streamlit provided us with all the tools we needed to develop an effective, efficient, and user-friendly system.\r\n\r\nAttendees will walk away with:\r\n\r\n- A simple understanding of the Bayesian approach and why it matters.\r\n- Concrete examples of the transformative impact on WeRoad's marketing strategy.\r\n- A blueprint to harness predictive models in their business strategies.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "c296f2b6-fd5d-5c56-8de9-22854fda30ff", "id": 37966, "code": "9CGGBC", "public_name": "Emanuele Fabbiani", "avatar": "https://pretalx.com/media/avatars/9CGGBC_JGHNmeM.png", "biography": "Engineer by education, Data Scientist by choice, researcher and lecturer by passion. Emanuele earned his PhD in AI by researching time series forecasting in the energy field. 
He was a guest researcher at EPFL Lausanne, and he's now the Head of AI at xtream, where he solves business problems with AI. He published 8 papers in international journals, presented and organized tracks and workshops at international conferences, including AMLD, ODSC, WeAreDevelopers, PyCon, and ERUM, and lectured in Italy, Switzerland, and Poland.", "answers": []}], "links": [], "attachments": [], "answers": []}], "A1": [{"url": "https://pretalx.com/pyconde-pydata-2024/talk/ECCJAG/", "id": 42881, "guid": "72523322-92f8-5de1-b376-f9b2628abc99", "date": "2024-04-24T10:30:00+02:00", "start": "10:30", "logo": null, "duration": "00:30", "room": "A1", "slug": "pyconde-pydata-2024-42881-flixbus-citysnap-how-we-use-genai-and-not-only-to-collect-captivating-images-for-cities-and-confirm-their-locations", "title": "FlixBus CitySnap: How we use GenAI and not only to collect captivating images for cities and confirm their locations", "subtitle": "", "track": "PyData: Natural Language Processing & Computer Vision", "type": "Talk", "language": "en", "abstract": "Have you ever wondered how travel e-commerce companies gather photos of cities? While I can't speak for everyone, I will demonstrate the innovative approach we are using at Flix.\r\n\r\nIn recent years, text-to-text models like ChatGPT and text-to-image models such as DALL-E 3 have become increasingly integrated into various industries. The main aim of these initiatives is typically to generate text or images. In our presentation, we propose a slightly different approach to leveraging these models commercially. Our objective is to gather images for thousands of cities that inspire travel. We utilize ChatGPT to tailor prompts for our business requirements, enabling efficient image retrieval through API queries from free stock image services. Then we apply image-to-text models to confirm the images' locations. 
Finally, we need to adjust the resolution of images for display across various platforms, such as social media campaigns on Instagram, email marketing, and on our website. To achieve this, we have used an automated cropping service to get images in the required aspect ratios, followed by Lanczos sampling for downscaling the images. This integration of cutting-edge models has resulted in an automated, highly flexible process that aligns with varied business needs. Our approach is cost-efficient; processing several hundred cities amounts to only a few euros, and we have utilized commonly available services, making replication easy for everyone.", "description": "Flix's buses serve over 5,000 cities, and to elevate our customers' experience, we aim to collect captivating photos for each city. Photo city collection task is not new, but previously, it was predominantly addressed with human resources. However, due to the extensive number and the growing scale of our bus network, manually gathering photos for each city is unfeasible and non scalable. In this talk, we will demonstrate how we built a fully automated end-to-end pipeline to achieve this goal. Our pipeline comprises three main steps.\r\n\r\nThe first step involves collecting city images from free image stock services like Pixabay and Pexels, via API. Simple queries by city names yielded poor results as not every image is enticing enough to inspire visits to the city. People often travel to see a city's landmarks, which is why we utilized ChatGPT to gather images of prominent landmarks for each city. \r\n\r\nThe second and most complicated step is to verify that the images accurately represent the targeted cities. Initially, we relied on metadata from the image stock services, such as tags from photographers. However, this information is often not sufficient to validate an image's location. To improve accuracy, we investigated various services. 
Models like DALLE from OpenAI can predict image locations but currently lack an API for full automation. We found two services from the Google Cloud Platform with APIs suitable for location validation: the Gemini multimodal and the landmark detection service.\r\n\r\nThe third and final step of our pipeline involves adjusting the images to various resolutions for display across different platforms, such as social media campaigns on Instagram, email marketing, and our website. This is achieved by cropping images to the desired aspect ratios using Google Cloud Vision API's smart cropping service, followed by Lanczos sampling for image downscaling, which is available in various open-source Python libraries. \r\n\r\nOur pipeline is a cost-efficient approach using widely available services, thereby facilitating easy replication. During this presentation, we will share our results across several countries, discuss the most challenging problems we encountered, and offer insights into how this pipeline could be improved with the release of upcoming cutting-edge models. 
We believe that our case shows how the industry can use Generative AI not only to create a new context, but also to find, analyze and filter publicly available information for different business needs.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "ae2f0c5f-9967-5fe3-b964-e649f3d19dbf", "id": 38771, "code": "TCFUAP", "public_name": "Andrei Chernov", "avatar": "https://pretalx.com/media/avatars/TCFUAP_p9kphwP.jpg", "biography": "Career:\r\nSince 2022, I have continued my career as a data scientist at FlixBus.\r\nFrom 2018 to 2022, I worked as a Data Scientist in banking.\r\n\r\nEducation:\r\nFrom 2021 to 2022, I received a micro master's degree in Finance.\r\nFrom 2019 to 2021, I received a master's degree in computer science.\r\nFrom 2015 to 2019, I received a bachelor's degree in applied math.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/DEKGYM/", "id": 43017, "guid": "4aa11a6e-9328-5801-95f7-dc98edcb64cd", "date": "2024-04-24T11:05:00+02:00", "start": "11:05", "logo": null, "duration": "00:30", "room": "A1", "slug": "pyconde-pydata-2024-43017-public-money-public-experiment-open-source-processes-in-the-public-administration", "title": "Public Money, Public Experiment - open source processes in the public administration", "subtitle": "", "track": "General: Others", "type": "Talk", "language": "en", "abstract": "Imagine a data lab in a federal ministry wants to publish python applications - how long could it possibly take? While open code is widely acknowledged as beneficial, the lack of thriving open code platforms from public institutions gets you wondering: a day, a week, months, or even years? \r\n\r\nWhen publishing code, a private person, a company or a public institution all face unique circumstances and take different considerations into account. 
While individuals or companies frequently publish their code and share their experiences, less is known about these processes in public institutions. In our talk we will cover how a data lab, located in a federal ministry would go about this topic. We will share insights into the publishing process, touching upon existing pioneers and the alignment of open source with administrative principles, as well as the hurdles, surprises, and regulatory considerations of our journey.\r\n\r\nSince we are a newly established unit with the word lab in our name, our talk delves into a unique real-world experiment: How much progress can our data lab make in publishing code within the three months leading up to PyCon DE & PyData Berlin 2024?", "description": "As one of many data labs in the public administration, sharing code and software increases the speed with which technical problems can be solved and reduces overall costs. In the previous months, we started collaborating with other public units to share a python prototype between labs. Now it's time for the next step: as we approach PyCon DE & PyData Berlin 2024, we aim to make code publicly available.\r\n\r\nThe presentation will address the following questions:\r\n\r\n1. How can the process of publishing code look like in a public administration and where can you get access to code already published? (Spoiler: Check out OpenCoDE)\r\n2. How does open source align with public administration principles? \r\n3. What legal and political and security requirements shape the process and possibly the code base?\r\n\r\nWhether we succeed or encounter challenges, this talk serves as an attempt to transparently share our journey and contribute to the broader discourse on the intersection of public administration and open source initiatives. 
Join us at PyCon DE & PyData Berlin 2024 and stay tuned for a glimpse into the evolving landscape of our code publication.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "6291c110-1ead-5204-ae63-ee12a6767457", "id": 22722, "code": "HQJATV", "public_name": "Lisa Reiber", "avatar": "https://pretalx.com/media/avatars/HQJATV_TzAQQsy.jpeg", "biography": "Lisa works as a data engineer at the in-house data lab of the German Federal Ministry for Family Affairs, Senior Citizens, Women and Youth.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/QCNXLW/", "id": 41814, "guid": "35327ba1-b253-51a7-af65-94b56b252556", "date": "2024-04-24T11:40:00+02:00", "start": "11:40", "logo": null, "duration": "00:30", "room": "A1", "slug": "pyconde-pydata-2024-41814-improve-llm-based-applications-with-fallback-mechanisms", "title": "Improve LLM-based Applications with Fallback Mechanisms", "subtitle": "", "track": "PyData: Generative AI", "type": "Talk", "language": "en", "abstract": "While RAG addresses the common LLM pitfalls, challenges like handling out-of-domain queries still persist. Learn the significance of fallback mechanisms to tackle these issues gracefully, incorporating strategies like web searches and alternative data sources to improve the user experience of your system. In this session, we\u2019ll discover various fallback techniques and practical implementation using Haystack, empowering you to develop resilient LLM-based systems for diverse scenarios without human intervention.", "description": "Large Language Model (LLM)-based systems have demonstrated remarkable advancements in various natural language processing (NLP) tasks, particularly through the Retrieval Augmented Generation (RAG) approach. This approach addresses some of the pitfalls associated with LLMs, such as hallucination or issues related to the recentness of its training data. 
However, RAG systems may encounter other challenges in real-world scenarios, including handling out-of-domain queries (e.g., requesting medical advice from a finance app), struggling to generate meaningful answers from retrieved data, or failing to provide any answer at all. To address these situations effectively, it is necessary to implement a fallback mechanism capable of gracefully handling such scenarios. \ud83e\uddd7\r\n\r\nThis fallback mechanism can incorporate alternative strategies, such as conducting a web search with the same query to retrieve more up-to-date information or utilizing alternative information sources (such as Slack, Notion, Google Drive, etc.) to gather more relevant data and generate a satisfactory or comprehensive response. However, the question arises: how can we determine if the response is inadequate? \ud83e\udd14\r\n\r\nDuring this session, we will explore various fallback mechanism techniques and ensure that our system can assess the adequacy of a response and improve it if necessary without human intervention. On the practical side, we will use the open source LLM framework Haystack to implement end-to-end RAG systems. By the end of this talk, you will have learned to select the appropriate fallback method for your use case, enabling you to develop more dependable and versatile LLM-based systems and implement them effectively using Haystack. \ud83d\udcaa", "recording_license": "", "do_not_record": false, "persons": [{"guid": "f8c5b23a-9de7-56c2-8983-a22c3cce5eb0", "id": 32412, "code": "PVSNUG", "public_name": "Bilge Y\u00fccel", "avatar": "https://pretalx.com/media/avatars/PVSNUG_e8eYgZh.jpg", "biography": "Bilge is a Developer Advocate at deepset, working with Haystack, an open source LLM framework. With over two years of experience as a Software Engineer, she developed a strong interest in NLP and pursued a master's degree in Artificial Intelligence at KU Leuven with a focus on NLP. 
Now, she enjoys working with Haystack and helping the community build LLM applications.  \u2728\u00a0\ud83e\udd51\r\nLet's connect if you'd like to talk about how fast the AI field is: [Twitter](https://twitter.com/bilgeycl), [Linkedin](https://www.linkedin.com/in/bilge-yucel/)", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/CWUQF3/", "id": 41817, "guid": "d2881bd7-8b81-5d8f-a26e-34af58b4dc52", "date": "2024-04-24T13:10:00+02:00", "start": "13:10", "logo": null, "duration": "00:30", "room": "A1", "slug": "pyconde-pydata-2024-41817-is-genai-all-you-need-to-classify-text-some-learnings-from-the-trenches", "title": "Is GenAI All You Need to Classify Text? Some Learnings from the Trenches", "subtitle": "", "track": "PyData: Natural Language Processing & Computer Vision", "type": "Talk", "language": "en", "abstract": "In recent times, GenAI has sparked fervent excitement, sometimes touted as the panacea for all natural language processing (NLP) tasks. This presentation explores a practical text classification scenario at Malt, highlighting the practical hurdles encountered when employing GenAI (latency, environmental impact, and budgetary constraints). To overcome these obstacles, a smaller, dedicated model emerged as a viable solution. We'll delve into the construction and optimization (quantization, graph optimization) of this multilingual model. Finally we\u2019ll see how GenAI's unparalleled zero-shot capabilities enables its continuous adaptation.", "description": "In recent times, GenAI has sparked fervent excitement, sometimes touted as the panacea for all natural language processing (NLP) tasks. This presentation explores a practical text classification scenario at Malt, highlighting first the practical hurdles encountered when employing GenAI (latency, environmental impact, and budgetary constraints). 
\r\n\r\nIn a second part, we\u2019ll cover how we overcame these obstacles by building a small dedicated model built from a pre-trained SentenceBERT [1], a model trained on semantic similarity. We'll explain how training a classification network on top of it preserves the original language alignment [2], enabling multilingual generalization.\r\n\r\nNext, we'll unveil the secret to unlocking even more efficiency: quantization and graph optimization techniques thanks to the ONNX ecosystem [3]. These optimizations while reducing even more the latency and resource consumption of this dedicated model enable it to be deployed with just a CPU.\r\n\r\nFinally, we\u2019ll see that GenAI still plays a relevant role in our text classification journey. Its unparalleled zero-shot capabilities allow us to continuously adapt our dedicated model, ensuring it remains relevant amidst an ever-changing product. \r\n\r\n[1] Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence embeddings using Siamese BERT-networks.\r\n[2] Reimers, N., & Gurevych, I. (2020). Making monolingual sentence embeddings multilingual using knowledge distillation.\r\n[3] https://onnx.ai/onnx/", "recording_license": "", "do_not_record": false, "persons": [{"guid": "26848e61-9970-58a2-a0eb-005c2cd82100", "id": 38368, "code": "JUMZBA", "public_name": "Marc Palyart", "avatar": "https://pretalx.com/media/avatars/JUMZBA_4yv0r60.jpg", "biography": "Marc Palyart is the Head of Data Science at Malt, the freelancer marketplace, where he leads the search and matching team. 
With over a decade of data-wizardry under his belt, he's ventured into the depths of academia and scaled the heights of industry where he's had the pleasure of collaborating with some truly remarkable people.", "answers": []}, {"guid": "eab4f7ea-31f2-5129-bfa2-5bbbea551790", "id": 38606, "code": "93VSJ9", "public_name": "Kateryna Budzyak", "avatar": "https://pretalx.com/media/avatars/93VSJ9_AMl8Nas.jpeg", "biography": "Kateryna is Data Scientist at Malt, the freelancer marketplace, where she works in the search and matching team. She has a background in bioinformatics and passionate about beautiful code.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/UXQJTF/", "id": 42752, "guid": "738c87be-837f-50e0-9fcc-266f6f84f177", "date": "2024-04-24T13:45:00+02:00", "start": "13:45", "logo": null, "duration": "00:30", "room": "A1", "slug": "pyconde-pydata-2024-42752-mostly-harmless-fixed-effects-regression-in-python-with-pyfixest", "title": "Mostly Harmless Fixed Effects Regression in Python with PyFixest", "subtitle": "", "track": "PyData: PyData & Scientific Libraries Stack", "type": "Talk", "language": "en", "abstract": "This session introduces PyFixest, an open source Python library inspired by the \"fixest\" R package. PyFixest implements fast routines for the estimation of regression models with high-dimensional fixed effects, including OLS, IV, and Poisson regression. The library also provides tools for robust inference, including heteroscedasticity-robust and cluster robust standard errors, as well as the wild cluster bootstrap. Additionally, PyFixest implements several routines for difference-in-differences estimation with staggered treatment adoption. \r\n\r\nPyFixest aims to faithfully replicate the core design principles of \"fixest\", offering post-estimation inference adjustments, user-friendly syntax for multiple estimations, and efficient post-processing capabilities. 
By making efficient use of jit-compilation, it is also one of the fastest solutions for regressions with high-dimensional fixed effects. \r\n\r\nThe presentation will cover PyFixest's functionality, design philosophy, and future development prospects.", "description": "When regression models contain very high-dimensional categorical features, estimation can become cumbersome: inverting a matrix with more than a few hundred rows is no simple task! Fortunately, the problem of estimating models with high-dimensional fixed effects has been effectively solved since at least the 1930s. A range of software packages now implement what is known as the Frisch-Waugh-Lovell Theorem (FWL) for efficient estimation of regression models with high-dimensional fixed effects. These packages are available in various programming languages, including Stata, R, Julia, and Python.\r\n\r\nAmong these, the R package fixest particularly stands out. It is not only blazing fast but also offers an innovative and user-friendly post-estimation functionality and syntax. \r\n\r\nWhen I started my journey with Python, fixest was the R package I missed the most. In fact, I missed it so much that I began working on PyFixest, a software package that aims to faithfully replicate all of fixest's innovations in Python.\r\n\r\nIn this talk, I will introduce the audience to both fixest and PyFixest and the FWL theorem that underpins these packages. 
We will explore how PyFixest can be used for analyzing AB Tests and for conducting event studies with staggered rollouts.\r\n\r\nFor more information:\r\n\r\n-    PyFixest GitHub repository: https://github.com/s3alfisc/pyfixest\r\n-    Introduction to PyFixest: https://aeturrell.github.io/coding-for-economists/econmt-regression.html#regression-basics\r\n-    PyFixest Documentation: https://s3alfisc.github.io/pyfixest/", "recording_license": "", "do_not_record": false, "persons": [{"guid": "95b3c7eb-3207-52e4-b2d6-97866f31d445", "id": 38800, "code": "KFGF9T", "public_name": "Alexander Fischer", "avatar": "https://pretalx.com/media/avatars/KFGF9T_UVIXeRX.JPG", "biography": "Economist and Data Scientist. I spend most of my week working on online auctions at Trivago and open source packages for regression modeling and inference in R and Python.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/BJUQ9E/", "id": 42950, "guid": "8651f41e-6e7f-5b46-8a68-474bc001baf0", "date": "2024-04-24T14:45:00+02:00", "start": "14:45", "logo": null, "duration": "00:30", "room": "A1", "slug": "pyconde-pydata-2024-42950-can-chatgpt-convince-you-to-get-a-covid19-vaccine-comparing-chatgpt-to-an-expert-system-which-one-is-more-convincing-", "title": "Can ChatGPT convince you to get a COVID19 vaccine? Comparing ChatGPT to an expert system - which one is more convincing?", "subtitle": "", "track": "PyData: Natural Language Processing & Computer Vision", "type": "Talk", "language": "en", "abstract": "This study explores the efficacy of chatbots as dialogical argumentation systems for behaviour change, focusing on vaccine hesitancy during the COVID-19 pandemic. A Python-based chatbot, developed in 2021, engaged in argumentative dialogues with users reluctant to get vaccinated, resulting in a 20% positive change in participants' stances. 
As natural language processing technologies, like ChatGPT, advance, it is crucial to compare them to traditional expert systems. Prior studies have shown ChatGPT's reliability in addressing vaccine hesitancy. This research compares our chatbot with ChatGPT, evaluating persuasiveness through crowdsourced participants. The findings inform resource allocation decisions, guiding the choice between domain-specific expert systems and enhancing versatile models like ChatGPT. Understanding comparative strengths aids in preventing the dissemination of misinformation in behaviour change contexts.", "description": "Chatbots have the potential of being used as dialogical argumentation systems for behaviour change applications. They thereby offer a cost-effective and scalable alternative to in-person consultations with health professionals that users could engage in from the comfort of their own home. During events like the global COVID-19 pandemic, it is even more important than usual that people are well informed and make conscious decisions that benefit themselves. Getting a COVID-19 vaccine is a prime example of a behaviour that benefits the individual, as well as society as a whole. In 2021, prior to the release of ChatGPT, we presented a chatbot (developed in Python using scikit-learn and Flask) that engaged in dialogues with users who did not want to get vaccinated, with the goal to persuade them to change their stance and get a vaccine. The chatbot was equipped with a small repository of arguments that it used to counter user arguments which were presented in free-text by the user on why they were reluctant to get a vaccine. We evaluated our chatbot in a study with participants and found that 20% of the participants had a positive change in stance (e.g. 
changing their stance from \"unlikely to get a vaccine\" to \"neutral\" or \"likely to get a vaccine\" after chatting with the chatbot).\r\n\r\nThe rapid advancements in natural language processing and the release of technologies such as ChatGPT raise the need to compare them to traditional expert systems in order to (1) identify potential problems in the new technologies and (2) assess whether they can replace traditional expert systems. Several studies have already used ChatGPT to address vaccine hesitancy and to tackle vaccine myths and concluded that ChatGPT is indeed a reliable source of non-technical information to the public. We were, therefore, interested to compare our system to ChatGPT and simulate the conversations participants had with our chatbot using ChatGPT and evaluate which conversations were considered more convincing by crowdsourced participants who are not domain experts.\r\n\r\nResearch like this helps us understand whether we need to continue investing resources into domain-specific expert systems or rather invest them into improving ChatGPT and make it more reliable and credible to avoid spreading misinformation.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "b568852f-df5a-5424-84b6-c3cc3c10b948", "id": 25402, "code": "TXMZT7", "public_name": "Dr. Lisa Andreevna Chalaguine", "avatar": "https://pretalx.com/media/avatars/TXMZT7_9jbCEAQ.jpeg", "biography": "Meet Lisa, a dynamic force at the intersection of computer science, education, and global exploration. Armed with a Ph.D. from University College London in computer science, Lisa's academic journey was a quest to imbue chatbots with the art of engaging in compelling, argumentative dialogues using authentic language.\r\n\r\n\r\nDuring her time at UCL, Lisa discovered her passion for teaching, dedicating herself to shaping the minds of future technologists. 
Her expertise and enthusiasm led her to positions as an associate lecturer at UCL and a guest teacher at the prestigious Oxford University.\r\n\r\n\r\nEager for new horizons, Lisa decided to take her skills beyond the academic realm. Breaking away from London and the confines of traditional employment, she embraced the life of a digital nomad. Lisa now crisscrosses the globe, sharing her knowledge on Python and machine learning with corporate and private clients. Her expertise doesn't stop there; Lisa also delves into data science projects in the intriguing realm of legal tech.\r\n\r\n\r\nIn the digital realm, Lisa is a prolific creator of online teaching content, fluent in English, German, and Russian. When she's not immersed in the world of technology, Lisa finds balance through her love for yoga. As a certified Ashtanga and Rocket Vinyasa Yoga teacher, she brings mindfulness and physical well-being to her nomadic lifestyle. Additionally, Lisa is a skilled Thai Massage Therapist, adding a therapeutic touch to her repertoire.\r\n\r\n\r\nIn the quieter moments of her globetrotting life, Lisa unwinds with a good book, illustrating that even in the fast-paced world of technology and travel, there's always time for the simple pleasure of a captivating story. 
Lisa's journey exemplifies the harmonious blend of intellect, wanderlust, and a passion for sharing knowledge that defines her unique and inspiring presence in both the tech and wellness spheres.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/DWGV7W/", "id": 42851, "guid": "ec1bfb6b-8aa4-56f3-9d32-239ae21ef49b", "date": "2024-04-24T15:20:00+02:00", "start": "15:20", "logo": null, "duration": "00:30", "room": "A1", "slug": "pyconde-pydata-2024-42851-the-struggles-we-skipped-data-engineering-for-the-tiktok-generation", "title": "The Struggles We Skipped: Data Engineering for the TikTok Generation", "subtitle": "", "track": "PyData: Data Handling & Engineering", "type": "Talk", "language": "en", "abstract": "In a world increasingly embracing Python, plug-and-play solutions and AI-generated code, our generation growing up with these advancements may not fully grasp the challenges faced by our predecessors. Meanwhile, data engineering, traditionally known for its complexity, can now transition into the plug-and-play realm too, thanks to Python libraries such as dlt. \r\n\r\nAimed to be both fun and insightful, this talk will educate the listener on the concepts of data engineering our generation finds most important and enable them to use high level abstractions to automate most of what used to be highly manual work. The juniors will gain an appreciation for the difficulties in data pipeline engineering, the seniors - a straightforward solution to expedite the creation of robust pipelines.\r\n\r\nFrom the perspective of junior data engineers such as us, the talk will walk through the challenges associated with constructing a data pipeline and demonstrate how these can be effectively addressed using Python libraries such as dlt that simplify the intricacies of data extraction, transformation, and loading.", "description": "A tale of two junior data engineers. 
\r\n\r\nOur generation of developers might have it \u201ceasy\u201d due to there being a plethora of tools available to automate and plug and play everything. However, this abundance poses challenges in breaking into a field. This talk explores the perspectives of two junior data engineers\u2014one entirely new to data and the other with a data science background\u2014both navigating the complexities of data engineering.\r\n\r\nThe first one, a data scientist navigating her tasks without the luxury of well-formatted data. This journey inadvertently led to a gradual familiarity with complex tools like Spark, and the necessity of understanding various connectors and writing detailed code for data extraction and normalization. With the introduction of dlt, a significant shift occurred. This technology automated many of the tedious processes, allowing analysts to focus more on analytics, and less on tedious data handling.\r\n\r\nThe second one, never having had to deal with the chaos of unstructured data, was directly introduced to dlt. Spared the typical struggles faced by traditional data engineers, she's set to find out what happens behind dlt\u2019s automation throughout the talk. After realizing that the two lines of Python code she wrote saved her from the manual tasks of data normalization, structuring, and loading, she will gain an appreciation for the tools at her disposal, especially dlt.\r\n\r\ndlt, or data load tool, is an open-source Python library for data teams of all sizes. It can extract a range of data formats from various sources, then normalizes that unstructured data into a relational structure and loads it into the destination of your choice. All of this is done within a few lines of Python code, as compared to the usage of different tools that were needed to get these tasks done. 
It is a valuable and cost effective addition to a company\u2019s data stack.\r\n\r\nThe talk will follow a step-by-step, linear narrative to outline the challenges of building a data pipeline and illustrate how dlt can resolve these issues, thereby automating the process. Beginning with schema inference and evolution, then progressing to dependency handling and data governance, each challenge will be portrayed as a quest on the journey to constructing a well-defined data pipeline. As junior data engineers, we would like to emphasize the paradigm shift in data engineering towards a greater level of abstraction. This shift, enabled by tools such as dlt's declarative incremental loading, empowers junior engineers to tackle tasks that traditionally would not be considered junior-level work.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "a63ce353-c8e9-5983-879f-70363654e615", "id": 38739, "code": "MNGHHW", "public_name": "Anuun", "avatar": "https://pretalx.com/media/avatars/MNGHHW_oQyClhG.jpeg", "biography": "Writer by choice and a data enthusiast at heart. Crafting compelling narratives with Open Source Software at dltHub. With a background in International Relations, I am currently pursuing Computer Science, focusing on Machine Learning, at TU Berlin.", "answers": []}, {"guid": "9c325acb-6e73-5dd6-be13-545408c0d1b1", "id": 38864, "code": "ZVBWJZ", "public_name": "Hiba Jamal", "avatar": "https://pretalx.com/media/avatars/ZVBWJZ_HyMoYSg.jpeg", "biography": "The data field has been my home for 3 years. I'm now a Data Science Working Student at dltHub in Berlin. Previously, I contributed as a researcher, data scientist and business analyst in startups and government-funded projects in Pakistan. 
Currently pursuing a master's degree in data analytics and AI for business management, I hold a prior degree in Computer Science with a touch of liberal arts.", "answers": []}], "links": [], "attachments": [], "answers": []}], "A03-A04": [{"url": "https://pretalx.com/pyconde-pydata-2024/talk/SYJE7B/", "id": 41746, "guid": "ca88e879-37f0-5662-ad08-4d20e5e3f567", "date": "2024-04-24T10:30:00+02:00", "start": "10:30", "logo": null, "duration": "01:30", "room": "A03-A04", "slug": "pyconde-pydata-2024-41746-lose-your-fear-of-equations-", "title": "Lose your fear of equations!", "subtitle": "", "track": "PyData: Data Handling & Engineering", "type": "Tutorial", "language": "en", "abstract": "The skill of quickly judging what a formula does and how changing a parameter will affect the result is crucial when dealing with real-life data science - but it's a skill not easily acquired if you don't come from a STEM background. In this tutorial we'll work on guesstimating what complex mathematical expressions do so that you, too, can lose your fear of math!", "description": "If you transitioned into data science from \"soft\" sciences, you've already had a steep learning curve. Coding, data engineering, statistics... There is a lot to catch up on. And while there are plenty of true black box models in machine learning, just as many can and should be described in mathematical terms.\r\n\r\nThis tutorial is for everyone who is scared by formulae. We will learn how to quickly recognize which part of an equation matters and how changing individual parameters will affect it. We will make differential equations less scary and get a \"feel\" for the logistic function that goes beyond running Logreg in sklearn.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "8145cb31-004b-5639-bcd7-df73cf15d469", "id": 1950, "code": "HKMYKS", "public_name": "Darina Goldin", "avatar": "https://pretalx.com/media/avatars/HKMYKS_PDbYUYp.jpeg", "biography": "Dr. 
Darina Goldin has a Ph.D. in Control Science, but secretly wishes she had pursued a degree in pure math instead. Either way, she'd probably still have ended up as a principal data scientist at Bayes Esports, where she has been modeling Esports Data for the past 8 years. When not at her desk, Darina can be found on the mats competing in Brazilian Jiu-Jitsu.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/LERYUY/", "id": 41838, "guid": "04ebf4da-2834-5d11-b397-b521de02ddd1", "date": "2024-04-24T13:00:00+02:00", "start": "13:00", "logo": null, "duration": "01:30", "room": "A03-A04", "slug": "pyconde-pydata-2024-41838-a-deep-dive-into-the-arrow-columnar-format-with-pyarrow-and-nanoarrow", "title": "A deep dive into the Arrow Columnar format with pyarrow and nanoarrow", "subtitle": "", "track": "PyData: PyData & Scientific Libraries Stack", "type": "Tutorial", "language": "en", "abstract": "Apache Arrow has become a de-facto standard for efficient in-memory columnar data representation. You might have heard about Arrow or using Arrow, but do you understand the format and why it\u2019s so useful? This tutorial will dive deep into the details of the Arrow columnar format, the different types and buffer layouts, and explore those details interactively using the pyarrow and nanoarrow libraries.", "description": "**You can find the material and setup instructions at https://github.com/voltrondata-labs/2024-arrow-format-tutorial/**\r\n\r\nAccording to the website, Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing. Nowadays, the Arrow project encompasses many things, including serialization, messaging and database specifications and a variety of language implementations. 
But at its core is the Columnar Format: a language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware like CPUs and GPUs. \r\nThis format is being used (fully or partially) by many libraries that you might know, such as pandas, polars, datafusion, duckdb, cudf, influxdb, and many more. \r\n\r\nThis tutorial will dive into the details of the Columnar format, explore the physical memory layout and the different data types. It will do so with interactive code examples using the pyarrow and nanoarrow libraries, learning how you can create and inspect Arrow data with those libraries. So at once you will also learn a bit about those two libraries, but the insights about the columnar format itself are general for any project using such data under the hood.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "7e876587-827f-57eb-8ec2-ba1bbb58a7f3", "id": 75, "code": "7VUXWM", "public_name": "Joris Van den Bossche", "avatar": "https://pretalx.com/media/avatars/7VUXWM_5SP7h9s.png", "biography": "I am a core contributor to pandas and Apache Arrow, and a maintainer of GeoPandas. I did a PhD at Ghent University and VITO in air quality research and worked at the Paris-Saclay Center for Data Science. Currently, I work at Voltron Data, contributing to Apache Arrow, and am a freelance teacher of python (pandas) at Ghent University.", "answers": []}, {"guid": "0bb1b110-34cf-5d7e-a569-4dbdafdc8fb2", "id": 37505, "code": "ZDP7FM", "public_name": "Alenka Frim", "avatar": "https://pretalx.com/media/avatars/ZDP7FM_Zz7LwcT.jpg", "biography": "My software development journey started with open source and Apache Arrow project. More specifically, I started with contributing to the Arrow R package in 2021. 
After that I have contributed to other open source projects connected to the Python dataframe API standard while at Quansight and became an Apache Arrow committer in 2022 after being a regular contributor to Apache Arrow (Python) since 2021. I am currently working at Voltron Data as a Software Engineer.", "answers": []}, {"guid": "889683ec-286e-5e7e-af85-c909a8c29302", "id": 37502, "code": "EMFTT7", "public_name": "Ra\u00fal Cumplido", "avatar": "https://pretalx.com/media/avatars/EMFTT7_eBvZyAY.jpg", "biography": "I started working with Python in 2008 with Python 2.5 and since then it became my language of choice. I have been involved in the Spanish Python community being one of the co-founders of the Python Spanish Association. I have been involved in the organisation of EuroPython in Bilbao, several PyCon ES (Spain) and the Barcelona meetup.\r\nA couple of years ago I started working in Apache Arrow and since then I have become a committer and a PMC member and I want to share with the rest of the world what we have done and what we are doing.", "answers": []}], "links": [], "attachments": [], "answers": []}], "A05-A06": [{"url": "https://pretalx.com/pyconde-pydata-2024/talk/A8HJHV/", "id": 41462, "guid": "abfcedaf-c48d-50c3-8f08-9937c8b84b98", "date": "2024-04-24T10:30:00+02:00", "start": "10:30", "logo": null, "duration": "01:30", "room": "A05-A06", "slug": "pyconde-pydata-2024-41462-securing-python-race-condition-vulnerabilities", "title": "Securing Python: Race Condition Vulnerabilities", "subtitle": "", "track": "PyCon: Security", "type": "Tutorial", "language": "en", "abstract": "This workshop addresses the critical and often underestimated topic of race conditions in Python, with a focus on their security implications. We begin with an overview of race conditions, explaining their nature and the security risks they pose. Participants will engage with small Python applications designed to demonstrate these vulnerabilities. 
Through hands-on analysis, we identify where and why these race conditions occur. The session progresses to simulate attacks exploiting these weaknesses, highlighting their potential for exploitation. Finally, we explore effective mitigation strategies, emphasizing thread synchronization and safe programming practices. The workshop aims to equip attendees with a deep understanding of race conditions in Python and practical skills to enhance the security and robustness of their code.", "description": "We will begin by exploring the fundamentals of race conditions, and understanding how concurrent processes can lead to unpredictable and hazardous outcomes. This segment focuses on the theoretical underpinnings and real-world implications of these conditions in Python applications.\r\n\r\nNext, the workshop transitions into a more hands-on approach. Participants will be presented with small, intentionally vulnerable Python applications. These applications are designed to showcase various forms of race conditions, providing a practical context for understanding their impact. We will analyze the source code of these applications, identifying the critical sections where race conditions occur and discussing why these vulnerabilities are often overlooked during development.\r\n\r\nFollowing the analysis, the workshop shifts to the offensive aspect. We will simulate attacks exploiting these race conditions. This exercise aims to demonstrate the ease with which malicious entities can take advantage of these vulnerabilities, underscoring the importance of addressing them in the development phase.\r\n\r\nThe final segment of the workshop is dedicated to resolution strategies. We will explore various techniques and best practices to mitigate race conditions in Python. This includes implementing thread synchronization mechanisms, such as locks, semaphores, and queues, and adopting safe programming practices that minimize the risk of concurrent execution issues. 
We'll also discuss how to incorporate these strategies into the software development lifecycle to enhance code quality and maintainability.\r\n\r\nThroughout the workshop, emphasis will be placed on clean, maintainable, and secure code architecture, aligning with contemporary best practices in Python development. By the end of the session, participants will not only have a thorough understanding of race conditions and their security implications but also possess the knowledge and tools to identify, exploit, and mitigate these vulnerabilities in their Python projects.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "c7d21853-c477-51b7-a000-5e7a51d1ce0d", "id": 25815, "code": "C9GDRS", "public_name": "Shahriyar Rzayev", "avatar": "https://pretalx.com/media/avatars/C9GDRS_Cb18BDy.jpeg", "biography": "Senior Software Engineer @NordVPN at Nord Security.\r\nInterested in Security, Architecture, and Clean Code.\r\nLeading Azerbaijan Python User Community.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/pyconde-pydata-2024/talk/AT9HCG/", "id": 41719, "guid": "238f48cf-40f9-5f57-8425-92d7fb47c43b", "date": "2024-04-24T13:00:00+02:00", "start": "13:00", "logo": null, "duration": "01:30", "room": "A05-A06", "slug": "pyconde-pydata-2024-41719-django-loves-strawberries", "title": "Django loves strawberries", "subtitle": "", "track": "PyCon: Django & Web", "type": "Tutorial", "language": "en", "abstract": "Explore the dynamic duo of GraphQL Strawberry and Django in an immersive workshop! Discover the seamless integration of Strawberry with Django, mastering type definitions, queries and  mutations. 
Harness the power of Starlette for efficient API development, empowering your projects with this potent blend of cutting-edge technologies.", "description": "<strong>Update<br />\r\nPlease prepare the Workshop as described [here](https://github.com/Speedy1991/strawberry-workshop)</strong><br />\r\n\r\n---------------------------------------\r\n\r\n\r\nDelve into the world of GraphQL Strawberry and Django in this comprehensive workshop designed to unravel the intricacies of these technologies. Throughout the sessions, participants will navigate the synergy between Strawberry, a GraphQL library for Python, and Django, a robust web framework. The workshop kicks off with an exploration of type definitions, offering insights into creating robust schemas and defining custom types to suit project requirements.\r\n\r\nMoving beyond the fundamentals, attendees dive into the realm of queries and mutations, mastering the art of fetching data and manipulating it through GraphQL. With Django's ORM seamlessly integrated into Strawberry, participants discover how to effortlessly execute complex queries and mutations.\r\n\r\nFurthermore, the workshop explores the integration of Starlette, a lightweight ASGI framework, into the mix. Uncover how Starlette complements Django and Strawberry, enhancing API development with its performance and flexibility.\r\n\r\nThe hands-on approach of this workshop ensures participants grasp each concept thoroughly. 
Through guided exercises and practical examples, attendees gain confidence in implementing GraphQL APIs using Strawberry and Django, unlocking the potential to build robust and scalable applications.\r\n\r\nBy the workshop's conclusion, participants will have a comprehensive understanding of:\r\n\r\n- Creating GraphQL schemas using Strawberry and Django\r\n- Executing queries and mutations seamlessly within Django applications\r\n- Leveraging Starlette for efficient API development alongside Django\r\n\r\nWhether you're a seasoned developer or new to these technologies, this workshop promises to equip you with the skills needed to harness the combined power of GraphQL Strawberry and Django for your projects' success.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "b478e7fb-a88f-59fa-853d-0dfff9c73e78", "id": 38339, "code": "N3QP3G", "public_name": "Arthur Bayr", "avatar": "https://pretalx.com/media/avatars/N3QP3G_UOXbUKg.png", "biography": "- Working for [sdox](https://www.sdox.io/) for 4 years as a fullstack engineer with focus on API architecture/performance and security on BE and FE side\r\n- Django for 7+ years\r\n- Vue/React/Apollo for 5+ years\r\n- DevOps and Server Admin for 5+ years\r\n- Prompt Engineer (raising 3 kids \u2192 giving some input \u2192 get random response)", "answers": []}], "links": [], "attachments": [], "answers": []}]}}]}}}