2025-04-23, Platinum3
NOTE: This talk focuses on Explainable AI, and building an interactive application with Shiny
We demonstrate a pure Python solution for exploring and understanding datasets using state-of-the-art machine learning and explainable AI techniques. Our application features a reactive dashboard built with Shiny, specifically designed for the daily work of data scientists.
The tool provides insights into data rapidly and effortlessly through an interactive dashboard. It facilitates data preprocessing, interactive exploratory data analysis, on-demand model training, evaluation, and interpretation. It also renders dynamic, annotated, and interactive visualizations. This allows users to pinpoint the critical elements and relations that act as root causes in a haystack of features, compressing a full day's work into under an hour.
By combining Plotly for dynamic visualizations, Scikit-learn and CatBoost for modeling, SHAP values for explanation, and MLflow for experiment tracking with a reactive Shiny dashboard, we facilitate quick and easy data preprocessing and exploration, model training and evaluation, and explainable AI.
Problem Statement
Data scientists' daily work is characterized by a repetitive and time-consuming cycle of exploratory data analysis, preprocessing, model training, and feature identification. As a result, key insights into the data are often missed, and time spent on repetitive tasks detracts from critical work. We enable data scientists to focus on what matters.
Solution
We streamline the data analysis process so that users can explore datasets efficiently and uncover critical insights without spending time on coding. Users can seamlessly conduct data preprocessing, interactive exploratory analysis, on-demand model training, evaluation, and interpretation, reducing the time to understand a dataset to under an hour.
Demonstrator
Our pure Python application features a reactive dashboard. It allows users to engage with data: uploading and manipulating it, creating interactive visualizations, and performing on-demand model training and interpretation, all while tracking results in MLflow. We demonstrate how to quickly deliver insights and identify root causes.
Architecture/Technical Implementation
Our application is built entirely in Python, utilizing the Shiny framework for a reactive dashboard. The backend uses Plotly, Scikit-learn, CatBoost, SHAP values, and MLflow. We highlight the core functionalities and development choices, emphasizing data preprocessing, model training, evaluation, and explainable AI features.
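To make the explainable AI part concrete, the following is a minimal, standard-library-only sketch of what a Shapley-value attribution computes for a single prediction. It is not the application's code and does not use the SHAP library; the function `shapley_values`, the `toy_model`, and the all-zero baseline are hypothetical names and values chosen purely for illustration. The exact computation is exponential in the number of features, which is why libraries such as SHAP rely on model-specific or approximate algorithms in practice.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one row `x` relative to a `baseline` row.

    Features outside a coalition are replaced by their baseline value.
    The cost grows exponentially with the number of features, so this
    is only suitable for tiny toy examples.
    """
    n = len(x)

    def value(coalition):
        # Coalition features come from x, all others from the baseline.
        row = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(row)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i to this coalition.
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(row):
    # Hypothetical model: linear terms plus one interaction (features 0 and 1).
    return 2.0 * row[0] + 1.0 * row[1] + 0.5 * row[0] * row[1] - 3.0 * row[2]


x = [1.0, 2.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

In this toy case the attributions add up to the difference between the prediction for `x` and the prediction for the baseline, and the interaction term is split evenly between the two features involved. This additivity is what lets per-feature contributions be read as a decomposition of a single prediction.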
Novice
Expected audience expertise: Python: Novice
Trained as a mathematician, I quickly delved into the world of machine learning and computational statistics to learn more about cancer dynamics in molecular biology and patient data.
I currently work as a Machine Learning Engineer in the domains of Med-Tech, optics, and semiconductors at Carl Zeiss AG.