PyCon DE & PyData 2025

Topological data analysis: How to quantify "holes" in your data and why?
2025-04-25, Zeiss Plenary (Spectrum)

Do you need to compare sets of points in a plane? Identify a potential cyclic event in high-dimensional time series data? Find the second or third highest peak of a noisily sampled function? Topological data analysis (TDA) is not a universal hammer, but it might just be the 16 mm wrench for your 16 mm hex head bolt. There is no shortage of Python libraries implementing TDA methods for various settings, but navigating the options can be challenging without prior familiarity with the topic. In my talk, I will demonstrate the utility of the tool with several simple examples, list various libraries used by the TDA community, and dive a bit deeper into the methods to explain what the libraries implement and how to interpret and work with their outputs.


For specific tasks, topological data analysis can be a more rigorous, straightforward, and interpretable alternative to complicated machine learning pipelines. However, it is not widely known and can be intimidating to get into when starting from zero. The goal of this talk is to introduce persistent homology, the main tool of topological data analysis, show concrete examples of how to apply it using available Python libraries, and reveal what is going on "under the hood", which is important for using the methods correctly. I will start with several examples showcasing possible uses of persistent homology and how to set up an analysis pipeline in Python. Then I will discuss the variants available within such a pipeline, such as the choice of filtered complex or vectorization, along with their advantages and disadvantages.
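To give a flavor of the "second or third highest peak" question from the summary, here is a minimal pure-Python sketch of 0-dimensional persistent homology on the superlevel sets of a sampled 1D function. It is an illustration under stated assumptions, not the speaker's material and not the API of any TDA library: the function name `peak_persistence` and the sample values are hypothetical. Each local maximum starts a component as the threshold is lowered, and when two components meet, the one with the lower peak "dies"; the gap between a peak's height and the level at which it dies is its persistence.

```python
def peak_persistence(values):
    """Rank local maxima of a 1D sequence by persistence (hypothetical helper).

    Returns a list of (persistence, peak_index) pairs, most persistent first.
    The global maximum never merges into a higher peak, so it gets infinity.
    """
    n = len(values)
    # Sweep the threshold downwards: visit samples from highest to lowest.
    order = sorted(range(n), key=lambda i: values[i], reverse=True)
    parent = [None] * n   # union-find forest; None means "not yet added"
    peak = {}             # component root -> index of its highest sample

    def find(i):
        # Find the root of i's component, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    for i in order:
        roots = [find(j) for j in (i - 1, i + 1)
                 if 0 <= j < n and parent[j] is not None]
        if not roots:
            # No neighbor is present yet: a new component is born at a peak.
            parent[i] = i
            peak[i] = i
        elif len(roots) == 1:
            # Exactly one neighbor is present: sample i joins its component.
            parent[i] = roots[0]
        else:
            # Sample i joins two components; the one with the lower peak dies.
            a, b = roots
            if values[peak[a]] < values[peak[b]]:
                a, b = b, a
            pairs.append((values[peak[b]] - values[i], peak[b]))
            parent[b] = a
            parent[i] = a

    pairs.append((float("inf"), order[0]))  # the global maximum
    pairs.sort(key=lambda p: p[0], reverse=True)
    return pairs


# Made-up samples: a small noise bump at index 3 sits next to the peak at index 1.
values = [0.0, 4.0, 3.8, 3.9, 1.0, 6.0, 0.5, 2.0, 0.0]
ranked = peak_persistence(values)
for persistence, index in ranked:
    print(index, values[index], persistence)
```

In this toy input, the bump at index 3 is almost as high as the peak at index 1, yet it receives a persistence of about 0.1 and ends up last in the ranking, whereas a ranking by raw height would place it third. Library implementations handle far more general settings (point clouds, higher-dimensional holes), so this sketch only conveys the idea.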


Expected audience expertise: Domain:

None

Expected audience expertise: Python:

Intermediate

I am a researcher in topological data analysis, working on both its theoretical mathematical aspects and its applications. I completed my PhD at ISTA in Austria and then moved to INRIA in France to apply TDA methods to brain cancer data.