PyData Boston 2025

Aayush Gauba

Aayush Gauba is a researcher and developer working at the intersection of machine learning, quantum-inspired models, and AI security. He has created open-source projects such as AIWAF, an adaptive web application firewall, and has published research on quantum-inspired neural architectures and robust learning methods. His work focuses on building practical tools that are both scientifically innovative and accessible to the wider Python community. Outside of research, Aayush is passionate about sharing knowledge through talks, tutorials, and collaborations that bridge theory with real-world application.

  • Embracing Noise: How Data Corruption Can Make Models Smarter
Allen Downey

Allen Downey is a professor emeritus at Olin College and Principal Data Scientist at PyMC Labs. He is the author of several books -- including Think Python, Think Bayes, and Probably Overthinking It -- and a blog about programming and data science. He is a consultant and instructor specializing in Bayesian statistics. He received a Ph.D. in computer science from the University of California, Berkeley, and Bachelor's and Master's degrees from MIT.

  • The SAT math gap: gender difference or selection bias?
Brandon (Anbang) Wu

Brandon (Anbang) Wu is a Senior Machine Learning Engineer at Quizlet, where he drives search relevance for tens of millions of learners worldwide. Previously, at Shopify, he built large-scale recommendation systems that powered product discovery for hundreds of thousands of merchants. Earlier at NBCUniversal’s Fandango, he led machine learning initiatives developing content recommendation algorithms for both theatrical and streaming platforms. Brandon holds master’s degrees in Computer Science from Georgia Tech and Analytics from UCLA.

  • Unlocking Smarter Typeahead Search: A Hybrid Framework for Large-Scale Query Suggestions
Caitlin Lewis
  • fastplotlib: driving scientific discovery through data visualization
Daina Bouquin

Daina brings technical depth and community-building expertise to her role as Sr. Developer Relations Engineer at Anaconda. With over 12 years bridging data science, library science, and open source advocacy, she's spent her career making complex technology more accessible to researchers and practitioners. Her work has included pioneering software citation and preservation initiatives at the Harvard-Smithsonian Center for Astrophysics and developing AI evaluation frameworks for federal agencies. This experience has given her insight into both the technical challenges developers face and the human side of adopting new tools. At Anaconda, she works to strengthen connections between Anaconda's engineering teams and the broader developer community, creating resources and fostering relationships that help people solve important problems with open source tools.

  • Is Your LLM Evaluation Missing the Point?
Dr. Rebecca Bilbro

Dr. Rebecca Bilbro, co-founder and CTO of Rotational Labs, is a trailblazer in applied AI and machine learning engineering. She co-created Yellowbrick, a Python library that enhances model diagnostics by integrating scikit-learn and matplotlib APIs, facilitating more intuitive model steering.

At Rotational Labs, Dr. Bilbro leads initiatives that empower companies to harness their domain expertise and data, resulting in the successful deployment of large language models and data-driven products. Her efforts bridge the gap between data science and engineering, driving AI solutions that are grounded in real-world business needs, informed by research, rigorously prototyped, and built with deployment and data governance in mind.

She is the co-author of Applied Text Analysis with Python (2018, O’Reilly) and Apache Hudi: The Definitive Guide (2025, O’Reilly). Dr. Bilbro earned her Ph.D. from the University of Illinois, Urbana-Champaign, focusing her research on domain-specific languages within engineering.

  • Where Have All the Metrics Gone?
Eric Ma

As Senior Principal Data Scientist at Moderna, Eric leads the Data Science and Artificial Intelligence (Research) team to accelerate science to the speed of thought. Prior to Moderna, he was at the Novartis Institutes for Biomedical Research conducting biomedical data science research with a focus on using Bayesian statistical methods in the service of discovering medicines for patients. Prior to Novartis, he was an Insight Health Data Fellow in the summer of 2017 and defended his doctoral thesis in the Department of Biological Engineering at MIT in the spring of 2017.

Eric is also an open-source software developer and has led the development of pyjanitor, a clean API for cleaning data in Python, and nxviz, a visualization package for NetworkX. He is also on the core developer team of NetworkX and PyMC. In addition, he gives back to the community through code contributions, blogging, teaching, and writing.

His personal life motto is found in the Gospel of Luke 12:48.

  • Building LLM Agents Made Simple
Gilberto Hernandez

Gilberto has spent over a decade shaping technical developer education worldwide. To date, he's made complex concepts accessible to over 100,000 students and engineers through both online learning platforms and in-person experiences.

At Codecademy, he authored and launched several of their foundational courses. Since then, he's worn multiple hats as both product manager and technical content creator at industry leading companies, including MongoDB, Domino Data Lab, Plaid, and Snowflake.

Gilberto is passionate about crafting exceptional developer experiences and educational resources. He frequently writes about data engineering, AI, and application development.

Connect with him on LinkedIn: https://www.linkedin.com/in/gilberto-hernandez/

  • From Notebook to Pipeline: Hands-On Data Engineering with Python
Ian Stokes-Rees

Ian is a Computational Scientist and Software Engineer. His current role is "Partner in AI Engineering" with BCG, a global management consulting firm. He works with BCG's clients around the world to identify opportunities to combine data, technology, and analytics to create step change capabilities in their organization. What does that translate to? On a day-to-day basis it means leading crack teams of BCG engineers and data scientists in the development of AI-driven and (typically) Python-based bespoke solutions which leverage the best tools, technology, and techniques available.

Prior to BCG, Ian was a Product Manager at Anaconda, and has been in the Python community for over 20 years. Ian has a PhD from Oxford where he worked on the CERN LHCb experiment and developed the Python-based distributed computing middleware that managed 10 million queued physics jobs to schedule across a quarter million servers in a globally federated compute environment. He also spent several years at Harvard Medical School collaborating with biophysicists on novel techniques for protein structure discovery.

Ian is a member of the Python Software Foundation and the Open Source Initiative. In his free time he enjoys sailing, cycling, xc skiing, and motorbiking. On rainy days he'll pull out a board game to play with his wife & kids: Ark Nova, Ticket To Ride, and Takenoko are current favorites.

  • "Save your API Keys for someone else" -- Using the HuggingFace and Ollama ecosystems to run good-enough LLMs on your laptop
Isaac Godfried
  • Going multi-modal: How to leverage the latest multi-modal LLMs and deep learning models on real world applications
Itamar Turner-Trauring

Itamar Turner-Trauring is a consultant, and writes about Python performance at https://pythonspeed.com/. He helps companies maintain open source software and speed up their data processing code.

In his spare time he is a volunteer with Cambridge Bicycle Safety, and writes about Cambridge local politics at Let's Change Cambridge.

  • Processing large JSON files without running out of memory
Jacob Tomlinson

Jacob Tomlinson is a senior Python software engineer at NVIDIA with a focus on deployment tooling for distributed systems. His work involves maintaining open source projects including RAPIDS and Dask. RAPIDS is a suite of GPU-accelerated open source Python tools which mimic APIs from the PyData stack, including those of NumPy, pandas, and scikit-learn. Dask provides advanced parallelism for analytics with out-of-core computation, lazy evaluation, and distributed execution of the PyData stack. He also tinkers with the open source Kubernetes Python framework kr8s in his spare time. Jacob volunteers with the local tech community group Tech Exeter and lives in Exeter, UK.

  • Accelerating Geospatial Analysis with GPUs
Jake Lorocco
  • Generative Programming with Mellea: from Agentic Soup to Robust Software
Jaya Venkatesh

Jaya Venkatesh is a software engineer at NVIDIA, working on the RAPIDS ecosystem with a focus on simplifying deployment in the cloud and distributed systems. Previously, Jaya worked as a machine learning engineer at Pixxel Space, where he developed large scale, real-time inferencing models for Earth Observation. He holds a Master’s degree in Computer Science from Arizona State University, where his research project centered on snow melt monitoring in the Arizona region through satellite imagery analysis.

  • Accelerating Geospatial Analysis with GPUs
Jules Walzer-Goldfeld

Jules was a Mathematics and Computer Science major at Williams College, with an interest in data and data visualization. He is excited about interactivity with data, whether that be tables, emails, dashboards, or fully-fledged websites. He is now working on open source tools for data: namely Great Tables and email in Python for Posit.

  • Wrappers and Extenders: Companion Packages for Python Projects
Kushal Kolar

PhD Candidate at NYU. 10+ years of experience using Python for data analysis and machine learning with neuroscience datasets. Core developer of fastplotlib and maintainer of several Python libraries in neuroscience with significant user bases, and a contributor to other libraries such as tslearn.

  • fastplotlib: driving scientific discovery through data visualization
Luca Fiaschi

Luca Fiaschi is an accomplished tech executive and AI/ML expert with over 15 years of leadership experience in AI, data science, and analytics teams at hypergrowth technology companies. He currently serves as a partner at PyMC Labs, where he drives Gen AI solutions and Bayesian consultancy for enterprise clients. Previously, he led as Chief Data & AI Officer at Mistplay, and held executive roles at HelloFresh, Stitch Fix, Rocket Internet, and Redmart/Alibaba, delivering scalable AI-driven products and revenue growth through advanced personalization and experimentation platforms.

  • MMM Open-Source Showdown: A Practitioner's Benchmark of PyMC-Marketing vs. Google Meridian
Mingxuan Zhao

Ming Zhao is an open source developer and Developer Advocate at IBM, where he helps IBM leverage open technologies while building impactful tools and growing vibrant open-source communities. He’s passionate about making open tech accessible to all and ensuring developers have the tools they need to succeed in the rapidly developing AI space. Ming now leads community efforts around Docling, IBM’s fastest-growing open source project, recently welcomed into the LF AI & Data Foundation.

  • Learn to Unlock Document Intelligence with Open-Source AI
Nathan Fulton

Nathan Fulton is a manager at IBM Research. He is an expert in large language models, formal verification, and reinforcement learning. Nathan earned a bachelor's degree from Carthage College in Computer Science and Mathematics, and a Ph.D. from Carnegie Mellon University's Computer Science Department. During his PhD studies, Nathan was a member of André Platzer's Logical Systems Lab and a core developer of the KeYmaera X theorem prover for hybrid systems. Nathan has previously worked as a Senior Applied Scientist at Amazon Web Services and as a Research Scientist at the MIT-IBM AI Lab.

  • Generative Programming with Mellea: from Agentic Soup to Robust Software
Naty Clementi

Naty Clementi is a senior software engineer at NVIDIA. She is a former academic with a Master's in Physics and a PhD in Mechanical and Aerospace Engineering to her name. Her work involves contributing to RAPIDS, and in the past she has also contributed to and maintained other open source projects such as Ibis and Dask. She is an active member of PyLadies and an active volunteer and organizer of Women and Gender Expansive Coders DC meetups.

  • Accelerating Geospatial Analysis with GPUs
Paddy Mullen

Paddy Mullen is a full‑stack engineer and data‑tooling builder. An early employee at Anaconda, he contributed to the Bokeh visualization library. He has built data tools and led teams at hedge funds and startups. Since 2023 he has been developing Buckaroo, an interactive dataframe viewer for notebook environments.

  • The Column's the limit: interactive exploration of larger than memory data sets in a notebook with Polars and Buckaroo
Sarthak Pattnaik

I have a passion for leveraging data to drive transformative outcomes. My journey spans across diverse roles, including that of a data analyst, data engineer, and currently an AI engineer.

As a Graduate Research Assistant at Boston University, I was involved in a wide array of projects that allowed me to creatively combine the technological aspects of data science and machine learning with sophisticated concepts from finance, advertising, and energy to analyze interesting use cases. I presented my research at prominent conferences like Computer Science and Education in Computer Science (CSECS), ITISE, and NEDSI.

My professional experience encapsulates working with a wide array of AI and Cloud tools like OpenAI and Gemini models, RAGs, Agents using crew.ai, MCPs with Claude, advanced prompting, Snowflake, SnapLogic, and Power BI.

I am on a relentless pursuit of knowledge and excellence, committed to harnessing the power of data for informed decision-making and driving meaningful impact. Let's connect and explore how my versatile skill set can contribute to your data-centric endeavors.

  • Tracking Policy Evolution Through Clustering: A New Approach to Temporal Pattern Analysis in Multi-Dimensional Data
Serhii Sokolenko

Serhii Sokolenko is a co-founder of Tower.dev. Tower orchestrates Python-native workflows and offers management tools for data lakehouses. Prior to founding Tower, Serhii worked at Databricks, Snowflake and Google on data processing and databases.

  • Surviving the Agentic Hype with Small Language Models
Susan Shu Chang

Susan Shu Chang is a Principal Data Scientist at Elastic (Elasticsearch). She has spoken at 6 PyCons around the world, and is the author of Machine Learning Interviews (O'Reilly).

  • Evaluating AI Agents in production with Python
Yunxin Gao

Yunxin holds a Bachelor’s degree in Applied Statistics from the University of Wisconsin–Madison and a Master’s degree in Applied Statistics from New York University, with a focus on data science and big data. Since completing graduate school, Yunxin has worked under Model Risk in the finance industry for the past 2.5 years, where they specialize in evaluating, validating, and interpreting complex quantitative models. Their experience spans statistical modeling, machine learning, and model risk management, with a strong emphasis on translating analytical insights into actionable business decisions.

  • Rethinking Feature Importance: Evaluating SHAP and TreeSHAP for Tree-Based Machine Learning Models