2025-04-24 – Hassium
Building and deploying scalable, reproducible machine learning pipelines can be challenging, especially when working with orchestration tools like Slurm or Kubernetes. In this talk, we demonstrate how to create an end-to-end ML pipeline for anomaly detection in International Space Station (ISS) telemetry data using only Python code.
We show how Kubeflow Pipelines, MLFlow, and other open-source tools enable the seamless orchestration of critical steps: distributed preprocessing with Dask, hyperparameter optimization with Katib, distributed training with PyTorch Operator, experiment tracking and monitoring with MLFlow, and scalable model serving with KServe. All these steps are integrated into a holistic Kubeflow pipeline.
By leveraging Kubeflow's Python SDK, we simplify the complexities of Kubernetes configurations while achieving scalable, maintainable, and reproducible pipelines. This session provides practical insights, real-world challenges, and best practices, demonstrating how Python-first workflows empower data scientists to focus on machine learning development rather than infrastructure.
Among popular open-source MLOps tools, Kubeflow stands out as a Kubernetes-native platform designed to support the entire ML lifecycle, from data preprocessing to model training, deployment, and retraining. Its modular structure enables the integration of a wide range of tools, making it a highly versatile framework for building scalable and reproducible ML workflows. Despite this, most existing resources focus on individual components rather than demonstrating how these can be orchestrated into a seamless, end-to-end pipeline.
In this talk, we present a practical case study that highlights the potential of Kubeflow in a real-world application. Specifically, we showcase how an automated ML pipeline for anomaly detection in International Space Station (ISS) telemetry data can be built and deployed using Kubeflow and other open-source MLOps tools. The dataset, originating from the Columbus module of the ISS, introduces unique challenges due to its complexity and high-dimensional nature, providing an excellent testbed for MLOps workflows.
What makes this approach unique?
Our workflow is built entirely in Python, leveraging Kubeflow’s Python SDK to orchestrate every stage of the pipeline. This eliminates the need for manual interaction with Kubernetes or container configurations, making the process accessible to ML engineers and data scientists without extensive DevOps expertise.
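To make this concrete, here is a minimal, hypothetical sketch (not the actual ISS pipeline presented in the talk) of how a pipeline can be defined and compiled entirely in Python with the Kubeflow Pipelines SDK (kfp v2). The component bodies, image names, URIs, and parameter values are placeholders for illustration only:

```python
from kfp import compiler, dsl

@dsl.component(base_image="python:3.11")
def preprocess(raw_uri: str) -> str:
    """Stand-in for the Dask-based distributed preprocessing step."""
    # In the real pipeline, this step would submit work to a Dask cluster.
    return raw_uri + "/preprocessed"

@dsl.component(base_image="python:3.11")
def train(data_uri: str, learning_rate: float) -> str:
    """Stand-in for distributed training launched via the PyTorch Operator."""
    # Hyperparameters such as learning_rate would typically come from a Katib search.
    return "s3://models/anomaly-detector"

@dsl.pipeline(name="iss-telemetry-anomaly-detection")
def anomaly_pipeline(raw_uri: str = "s3://telemetry/columbus",
                     learning_rate: float = 1e-3):
    prep = preprocess(raw_uri=raw_uri)
    train(data_uri=prep.output, learning_rate=learning_rate)

if __name__ == "__main__":
    # Compile the pipeline to a YAML definition that can be submitted to Kubeflow.
    compiler.Compiler().compile(anomaly_pipeline, package_path="pipeline.yaml")
```

The point of the sketch is that each stage is an ordinary Python function: no YAML manifests or container specs are written by hand, and the compiled pipeline can be submitted to any Kubeflow installation.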
Key takeaways for attendees:
- Tool integration: Learn how to combine Dask for distributed preprocessing, Katib for hyperparameter optimization, the PyTorch Operator for distributed training, MLFlow for experiment tracking and monitoring, and KServe for scalable model serving, all orchestrated into a unified pipeline using Kubeflow Pipelines (a sketch of the experiment-tracking piece follows this list).
- Overcoming challenges: Gain insights into the technical hurdles faced during the implementation of this pipeline and discover the strategies and best practices that made it possible.
- Real-world impact: Understand how to apply MLOps principles to complex, real-world datasets and how these principles translate into scalable, maintainable, and reproducible workflows.
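As one illustration of the experiment-tracking piece mentioned above, the snippet below is a hedged sketch of how a training step might log parameters, metrics, and a model to MLFlow. The tracking URI, experiment name, model, and metric values are placeholders, not the project's actual configuration:

```python
import mlflow
import mlflow.pytorch
import torch

# Placeholder endpoint: point MLflow at the tracking server running in the cluster.
mlflow.set_tracking_uri("http://mlflow.mlflow.svc.cluster.local:5000")
mlflow.set_experiment("iss-anomaly-detection")

# Toy stand-in for the anomaly-detection model (e.g. an autoencoder).
model = torch.nn.Sequential(torch.nn.Linear(64, 16), torch.nn.Linear(16, 64))

with mlflow.start_run(run_name="demo-run"):
    # Log the hyperparameters chosen (e.g. by a Katib trial) for this run.
    mlflow.log_param("learning_rate", 1e-3)
    mlflow.log_param("batch_size", 256)
    # Log validation metrics as training progresses.
    for epoch in range(3):
        mlflow.log_metric("val_reconstruction_error", 0.1 / (epoch + 1), step=epoch)
    # Store the trained model so a serving step (e.g. KServe) can pick it up later.
    mlflow.pytorch.log_model(model, artifact_path="model")
```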
To ensure reproducibility and accessibility, the entire pipeline, including configurations and code, is publicly available in our GitHub repository. Attendees will be able to replicate the workflow, adapt it to their own use cases, or extend it with additional features.
Who should attend?
This session is designed for data scientists, ML engineers, and Python enthusiasts who want to simplify the development of scalable ML pipelines. Whether you're new to Kubernetes or looking to streamline your MLOps workflows, this talk will provide actionable insights and tools to help you succeed.
Expected audience expertise (domain): Novice
Expected audience expertise (Python): Intermediate
Public link to supporting material, e.g. videos, Github, etc.:
Christian has 12+ years of experience in the scientific application of Python in academic and industry settings. He is one of the founders of prokube.ai, where he builds an MLOps platform built around Kubeflow, MLFlow, Kubernetes, and a host of other open-source tools. He also holds a PhD in physics, during which he gained experience maintaining distributed compute clusters. Christian is a maintainer of several OSS projects.
Henrik is an ML researcher at Helmut Schmidt University, specializing in the application of ML in cyber-physical systems. In his current project, he is developing an anomaly detection and diagnostic AI system for use with data from the International Space Station. Before returning to academia, Henrik spent five years as a data scientist in various consulting roles, where he had the opportunity to delve into a range of exciting datasets. During this time, Henrik became a Python and Kubeflow enthusiast.