BHAD: Explainable unsupervised anomaly detection using Bayesian histograms PyCon DE & PyData Berlin 2023


2023-04-17 16:20–16:50, B05-B06

The detection of outliers or anomalous data patterns is one of the most prominent machine learning use cases in industrial applications. I present a Bayesian histogram anomaly detector (BHAD), in which the number of bins is treated as an additional unknown model parameter with an assigned prior distribution. BHAD scales linearly with the sample size and enables a straightforward explanation of individual scores, which makes it well suited to industrial applications where model interpretability is crucial. I compare the predictive performance of BHAD with various state-of-the-art (SoA) anomaly detection approaches, using both simulated data and popular benchmark datasets for outlier detection. The results indicate that BHAD achieves very competitive predictive accuracy compared to more complex and computationally more expensive algorithms, while being explainable and fast.
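The idea of treating the number of bins as an unknown parameter can be illustrated with a minimal sketch for a single numeric feature. It assumes equal-width bins, a symmetric Dirichlet(alpha) prior on the bin probabilities (integrated out analytically) and a uniform prior over the candidate bin counts; with alpha = 0.5 this reduces to Knuth's optimal-binning rule. The names `log_marginal_likelihood` and `map_n_bins` are hypothetical, this is not the API of the `bhad` package, and the prior BHAD actually places on the number of bins may differ.

```python
import numpy as np
from math import lgamma

def log_marginal_likelihood(x, n_bins, alpha=0.5):
    """Log p(x | n_bins) for 1-D data under an equal-width histogram density,
    with a symmetric Dirichlet(alpha) prior on the bin probabilities that is
    integrated out analytically. Assumes x is not constant."""
    x = np.asarray(x, dtype=float)
    n = x.size
    lo, hi = x.min(), x.max()
    counts, _ = np.histogram(x, bins=n_bins, range=(lo, hi))
    # Density in bin k is n_bins * pi_k / (hi - lo); integrating over pi gives
    # the Dirichlet-multinomial normalising constants below.
    return (n * np.log(n_bins / (hi - lo))
            + lgamma(n_bins * alpha) - n_bins * lgamma(alpha)
            + sum(lgamma(c + alpha) for c in counts)
            - lgamma(n + n_bins * alpha))

def map_n_bins(x, max_bins=50, alpha=0.5, log_prior=None):
    """Most probable number of bins among 1..max_bins (hypothetical helper)."""
    candidates = np.arange(1, max_bins + 1)
    if log_prior is None:
        log_prior = np.zeros(candidates.size)  # uniform prior over bin counts
    log_post = log_prior + np.array(
        [log_marginal_likelihood(x, m, alpha) for m in candidates])
    return int(candidates[np.argmax(log_post)])

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
k = map_n_bins(x)  # posterior mode of the number of bins for this sample
```

Picking the single most probable bin count is a simplification; a fully Bayesian variant could instead average over bin counts, weighting each by its posterior probability.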

I present an unsupervised and explainable Bayesian anomaly detection algorithm. For this I consider the posterior predictive distribution of a Categorical-Dirichlet model and use it to construct a Bayesian histogram-based anomaly detector (BHAD).
BHAD scales linearly with the size of the data and allows a direct explanation of individual anomaly scores due to its simple linear functional form, which makes it well suited to practical applications where model interpretability is crucial. Using simulated data as well as popular benchmark datasets for outlier detection, I analyze the predictive performance of the candidate models and also compare them with outlier ensemble approaches. The results suggest that BHAD performs very competitively against more complex models such as variational autoencoders; in fact, it is among the best-performing candidates while offering both individual and global model explainability.
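Under a Dirichlet(alpha, ..., alpha) prior and training counts n_1, ..., n_K, the posterior predictive probability that a new value falls into bin k of a feature is (n_k + alpha) / (N + K * alpha). A minimal sketch of how such probabilities could yield an additive, and hence directly explainable, anomaly score follows. It assumes, as in HBOS-style detectors, that features are binned independently and that the score is the sum of per-feature negative log posterior predictive probabilities; the fixed `n_bins=10` and `alpha=1.0`, the clipping of out-of-range values into the edge bins, and all function names are illustrative assumptions, not taken from the talk or the `bhad` package.

```python
import numpy as np

def fit_feature(x, n_bins=10, alpha=1.0):
    """Equal-width bins for one feature plus Categorical-Dirichlet posterior
    predictive bin probabilities (n_k + alpha) / (N + K * alpha)."""
    edges = np.histogram_bin_edges(x, bins=n_bins)
    counts, _ = np.histogram(x, bins=edges)
    probs = (counts + alpha) / (counts.sum() + n_bins * alpha)
    return edges, probs

def fit(X, n_bins=10, alpha=1.0):
    # Assumption: every column is binned independently of the others.
    return [fit_feature(X[:, j], n_bins, alpha) for j in range(X.shape[1])]

def explain(model, X):
    """Per-feature contributions -log p(bin | training data), shape (n, d)."""
    contrib = np.empty(X.shape, dtype=float)
    for j, (edges, probs) in enumerate(model):
        # Searching the interior edges maps each value to a bin index 0..K-1;
        # values outside the training range fall into the first or last bin.
        idx = np.searchsorted(edges[1:-1], X[:, j], side="right")
        contrib[:, j] = -np.log(probs[idx])
    return contrib

def score(model, X):
    # Additive in the per-feature terms: a higher score means more anomalous.
    return explain(model, X).sum(axis=1)

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 3))
model = fit(X_train)
X_new = np.array([[0.0, 0.0, 0.0],
                  [0.0, 0.0, 8.0]])  # second row is extreme in feature 2
scores = score(model, X_new)
contributions = explain(model, X_new)
```

Because the score is a plain sum, each row of `contributions` is an exact decomposition of that row's score into feature terms (an individual explanation); averaging the contributions over a dataset gives one simple form of global explanation.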

Expected audience expertise: Domain:

Advanced

Expected audience expertise: Python:

Intermediate

Abstract as a tweet:

We present a Bayesian histogram anomaly detector (BHAD). BHAD scales linearly with the size of the data and allows a direct explanation of individual anomaly scores due to its simple linear form.

Public link to supporting material:

https://pypi.org/project/bhad/

Alexander Vosseler

Alexander Vosseler works as a Principal Data Scientist in the Advanced Analytics Claims team of the Chief Data Office of Allianz Germany in Munich. He has many years of industry experience as a data scientist and holds a PhD in Statistics with majors in Bayesian and computational statistics. During his career he has worked for companies such as Siemens AG, Allianz Global Corporate & Specialty SE and Allianz Germany.

His current methodological interests lie in probabilistic machine learning and uncertainty quantification, with applications in time series analysis, anomaly detection and NLP. In his spare time he likes to go jogging and play the drums.
