PyCon DE & PyData 2025

Henrik Sebastian Steude

Henrik is an ML researcher at Helmut Schmidt University, specializing in the application of ML in cyber-physical systems. In his current project, he is developing an anomaly detection and diagnostic AI system for use with data from the International Space Station. Before returning to academia, Henrik spent five years as a data scientist in various consulting roles, where he had the opportunity to delve into a range of exciting datasets. During this time, Henrik became a Python and Kubeflow enthusiast.


LinkedIn

www.linkedin.com/in/henrik-sebastian-steude

GitHub

https://github.com/hsteude


Session

04-24
10:15
30min
Scaling Python: An End-to-End ML Pipeline for ISS Anomaly Detection with Kubeflow
Christian Geier, Henrik Sebastian Steude

Building and deploying scalable, reproducible machine learning pipelines can be challenging, especially when working with orchestration tools like Slurm or Kubernetes. In this talk, we demonstrate how to create an end-to-end ML pipeline for anomaly detection in International Space Station (ISS) telemetry data using only Python code.

We show how Kubeflow Pipelines, MLflow, and other open-source tools enable the orchestration of the critical steps: distributed preprocessing with Dask, hyperparameter optimization with Katib, distributed training with the PyTorch Operator, experiment tracking and monitoring with MLflow, and scalable model serving with KServe. All of these steps are integrated into a single Kubeflow pipeline.

By using Kubeflow's Python SDK, we hide much of the complexity of Kubernetes configuration while keeping pipelines scalable, maintainable, and reproducible. This session covers practical insights, real-world challenges, and best practices, and shows how Python-first workflows let data scientists focus on machine learning development rather than infrastructure.

PyCon: MLOps & DevOps
Hassium