SciPy 2026

Grammars of Data: lessons from ~20 years of the tidyverse
2026-07-16, Thomas Swain Room

The tidyverse is a collection of R packages designed to facilitate data science. My team and I have been working on it for nearly 20 years, and in this talk, I’ll share some of what we’ve learned about software development and open source community building in that time.

It’s very clear that AI is having a profound impact on how we develop software and do data science, so I’ll also offer a look into the (near) future, discussing how we’re updating our thinking about how people will do data science, and speculating on what work is likely to have the biggest impact.


My team and I have spent almost 20 years building a collection of R packages known as the tidyverse. The tidyverse includes packages like ggplot2 (for visualisation) and dplyr and tidyr (for data manipulation), and is designed to make data science easier to learn by embracing a consistent design across packages. The overall aim of the tidyverse is to make data science faster, more effective, more fun, and more accessible to more people.

The tidyverse was named and created in 2016, but the core ideas started development in 2006 with ggplot and reshape, predecessors of the core ggplot2 and tidyr packages. We’ve learned a lot about software development and open source community building over those 20 years, and I’d love to share some of what we’ve learned with the SciPy community.

I’ll also talk about how we’re thinking about coding data science today: it’s clear that AI is having, and will continue to have, a profound impact on the practice of data science. What are the implications for open source tool builders? What does it mean for our identities as programmers and data scientists? It’s hard to speculate too much, but I will discuss the changes we’re seeing (and making!) and offer some very near-term predictions.

Hadley is Chief Scientist at Posit PBC, winner of the 2019 COPSS award, and a member of the R Foundation. He builds tools (both computational and cognitive) to make data science easier, faster, and more fun. His work includes packages for data science (like the tidyverse, which includes ggplot2, dplyr, and tidyr) and principled software development (e.g. roxygen2, testthat, and pkgdown). He is also a writer, educator, and speaker promoting the use of R for data science. Learn more on his website, http://hadley.nz.