Developing an efficient large-scale machine learning pipeline to classify the millions of NASA TESS light curves in search of variable stars
The NASA Transiting Exoplanet Survey Satellite (TESS) observes millions of stars each month. The vast number of light curves generated from these photometric observations contains a wealth of information for asteroseismology, binarity, and rotation studies. However, before these light curves can be used to study stellar structure and evolution, the relevant stars must first be identified in this massive data set. The TESS Data for Asteroseismology (T’DA) working group has therefore created an automated, open-source machine learning pipeline that classifies the millions of light curves delivered by TESS according to their stellar variability type. The pipeline is highly parallelized and optimized for large-scale computing infrastructures. It has also been developed in a modular way, so that new state-of-the-art classifiers targeting other variability types can easily be added. In this contribution, we present the pipeline and the structure of its machine learning classifiers, and explore how the pipeline can be used for other space missions and large ground-based observatories.
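To illustrate what a modular, parallel classification design can look like, the following is a minimal sketch and not the T’DA pipeline itself. Every name in it (`CLASSIFIERS`, `register`, `scatter_threshold`, `classify_all`) is hypothetical, the scatter rule is a toy stand-in for a trained classifier, and a thread pool stands in for whatever parallel backend a large-scale deployment would actually use.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, pstdev

# Hypothetical registry: classifier name -> function scoring one light curve.
CLASSIFIERS = {}


def register(name):
    """Decorator that adds a classifier to the registry under `name`."""
    def decorator(func):
        CLASSIFIERS[name] = func
        return func
    return decorator


@register("scatter_threshold")
def scatter_threshold(flux):
    """Toy rule (an assumption, not a real classifier): flag a light curve
    as variable when its relative scatter exceeds 1 percent."""
    rel_scatter = pstdev(flux) / abs(mean(flux))
    p_variable = 1.0 if rel_scatter > 0.01 else 0.0
    return {"variable": p_variable, "constant": 1.0 - p_variable}


def classify_one(flux):
    """Run every registered classifier on a single light curve."""
    return {name: clf(flux) for name, clf in CLASSIFIERS.items()}


def classify_all(light_curves, max_workers=4):
    """Classify light curves concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(classify_one, light_curves))


# Two synthetic flux series: one flat, one with visible variability.
curves = [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.2, 0.8, 1.0],
]
results = classify_all(curves)
```

In this sketch, adding a classifier for a new variability type only requires decorating another function with `register`; the code that distributes the light curves does not change.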