2025-04-24 – Hassium
Building and deploying scalable, reproducible machine learning pipelines can be challenging, especially when working with orchestration tools like Slurm or Kubernetes. In this talk, we demonstrate how to create an end-to-end ML pipeline for anomaly detection in International Space Station (ISS) telemetry data using only Python code.
We show how Kubeflow Pipelines, MLFlow, and other open-source tools enable the seamless orchestration of critical steps: distributed preprocessing with Dask, hyperparameter optimization with Katib, distributed training with PyTorch Operator, experiment tracking and monitoring with MLFlow, and scalable model serving with KServe. All these steps are integrated into a holistic Kubeflow pipeline.
By leveraging Kubeflow's Python SDK, we simplify the complexities of Kubernetes configurations while achieving scalable, maintainable, and reproducible pipelines. This session provides practical insights, real-world challenges, and best practices, demonstrating how Python-first workflows empower data scientists to focus on machine learning development rather than infrastructure.
Among popular open-source MLOps tools, Kubeflow stands out as a Kubernetes-native platform designed to support the entire ML lifecycle, from data preprocessing to model training, deployment, and retraining. Its modular structure enables the integration of a wide range of tools, making it a highly versatile framework for building scalable and reproducible ML workflows. Despite this, most existing resources focus on individual components rather than demonstrating how these can be orchestrated into a seamless, end-to-end pipeline.
In this talk, we present a practical case study that highlights the potential of Kubeflow in a real-world application. Specifically, we showcase how an automated ML pipeline for anomaly detection in International Space Station (ISS) telemetry data can be built and deployed using Kubeflow and other open-source MLOps tools. The dataset, originating from the Columbus module of the ISS, introduces unique challenges due to its complexity and high-dimensional nature, providing an excellent testbed for MLOps workflows.
What makes this approach unique?
Our workflow is built entirely in Python, leveraging Kubeflow’s Python SDK to orchestrate every stage of the pipeline. This eliminates the need for manual interaction with Kubernetes or container configurations, making the process accessible to ML engineers and data scientists without extensive DevOps expertise.
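To make this concrete, here is a minimal, hypothetical sketch (not the actual ISS pipeline presented in the talk) of how a pipeline can be defined and compiled entirely in Python with the Kubeflow Pipelines SDK (kfp v2). The component bodies, image names, URIs, and parameter values are placeholders for illustration only:

```python
from kfp import compiler, dsl

@dsl.component(base_image="python:3.11")
def preprocess(raw_uri: str) -> str:
    """Stand-in for the Dask-based distributed preprocessing step."""
    # In the real pipeline, this step would submit work to a Dask cluster.
    return raw_uri + "/preprocessed"

@dsl.component(base_image="python:3.11")
def train(data_uri: str, learning_rate: float) -> str:
    """Stand-in for distributed training launched via the PyTorch Operator."""
    # Hyperparameters such as learning_rate would typically come from a Katib search.
    return "s3://models/anomaly-detector"

@dsl.pipeline(name="iss-telemetry-anomaly-detection")
def anomaly_pipeline(raw_uri: str = "s3://telemetry/columbus",
                     learning_rate: float = 1e-3):
    prep = preprocess(raw_uri=raw_uri)
    train(data_uri=prep.output, learning_rate=learning_rate)

if __name__ == "__main__":
    # Compile the pipeline to a YAML definition that can be submitted to Kubeflow.
    compiler.Compiler().compile(anomaly_pipeline, package_path="pipeline.yaml")
```

The point of the sketch is that each stage is an ordinary Python function: no YAML manifests or container specs are written by hand, and the compiled pipeline can be submitted to any Kubeflow installation.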
Key takeaways for attendees:
- Tool integration: Learn how to combine Dask for distributed preprocessing, Katib for hyperparameter optimization, the PyTorch Operator for distributed training, MLFlow for experiment tracking and monitoring, and KServe for scalable model serving, all orchestrated into a unified pipeline using Kubeflow Pipelines (a sketch of the experiment-tracking piece follows this list).
- Overcoming challenges: Gain insights into the technical hurdles faced during the implementation of this pipeline and discover the strategies and best practices that made it possible.
- Real-world impact: Understand how to apply MLOps principles to complex, real-world datasets and how these principles translate into scalable, maintainable, and reproducible workflows.
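As one illustration of the experiment-tracking piece mentioned above, the snippet below is a hedged sketch of how a training step might log parameters, metrics, and a model to MLFlow. The tracking URI, experiment name, model, and metric values are placeholders, not the project's actual configuration:

```python
import mlflow
import mlflow.pytorch
import torch

# Placeholder endpoint: point MLflow at the tracking server running in the cluster.
mlflow.set_tracking_uri("http://mlflow.mlflow.svc.cluster.local:5000")
mlflow.set_experiment("iss-anomaly-detection")

# Toy stand-in for the anomaly-detection model (e.g. an autoencoder).
model = torch.nn.Sequential(torch.nn.Linear(64, 16), torch.nn.Linear(16, 64))

with mlflow.start_run(run_name="demo-run"):
    # Log the hyperparameters chosen (e.g. by a Katib trial) for this run.
    mlflow.log_param("learning_rate", 1e-3)
    mlflow.log_param("batch_size", 256)
    # Log validation metrics as training progresses.
    for epoch in range(3):
        mlflow.log_metric("val_reconstruction_error", 0.1 / (epoch + 1), step=epoch)
    # Store the trained model so a serving step (e.g. KServe) can pick it up later.
    mlflow.pytorch.log_model(model, artifact_path="model")
```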
To ensure reproducibility and accessibility, the entire pipeline, including configurations and code, is publicly available in our GitHub repository. Attendees will be able to replicate the workflow, adapt it to their own use cases, or extend it with additional features.
Who should attend?
This session is designed for data scientists, ML engineers, and Python enthusiasts who want to simplify the development of scalable ML pipelines. Whether you're new to Kubernetes or looking to streamline your MLOps workflows, this talk will provide actionable insights and tools to help you succeed.
Expected audience expertise (domain): Novice
Expected audience expertise (Python): Intermediate
Public link to supporting material, e.g. videos, Github, etc.:
Christian has 12+ years of experience in the scientific application of Python in academic and industry settings. He is one of the founders of prokube.ai, where he builds an MLOps platform built around Kubeflow, MLFlow, Kubernetes, and a host of other open-source tools. He also holds a PhD in physics, during which he gained experience maintaining distributed compute clusters. Christian is a maintainer of several OSS projects.
Henrik is an ML researcher at Helmut Schmidt University, specializing in the application of ML in cyber-physical systems. In his current project, he is developing an anomaly detection and diagnostic AI system for use with data from the International Space Station. Before returning to academia, Henrik spent five years as a data scientist in various consulting roles, where he had the opportunity to delve into a range of exciting datasets. During this time, Henrik became a Python and Kubeflow enthusiast.