JuliaCon 2023

An introduction to UnsupervisedClustering.jl package
07-26, 15:10–15:20 (US/Eastern), 32-124

We introduce UnsupervisedClustering.jl, a package that implements traditional unsupervised clustering algorithms and proposes advanced global optimization algorithms that can escape local optima.


In this talk, we will examine the limitations of the traditional k-means algorithm, which often struggles to fit data that deviates from spherical distributions. In comparison, general Gaussian Mixture Models (GMMs) can fit richer structures, but representing their covariance matrices requires estimating a number of parameters per cluster that grows quadratically with the data dimension. Our research addresses these issues by proposing advanced global optimization algorithms that combine effectively with regularization strategies, leading to superior performance in cluster recovery compared to classical GMMs or k-means algorithms. Through a wide range of experiments on synthetic data, we demonstrate the effectiveness of the proposed methods. We have made available two Julia packages, UnsupervisedClustering.jl and RegularizedCovarianceMatrices.jl, that implement the proposed techniques for easy use and further research.
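
To make the parameter-count argument concrete, here is a minimal Python sketch (an illustration only, not code from either package). It counts the free covariance parameters of a single cluster in d dimensions under three common covariance structures. The function name gmm_covariance_parameters and the structure labels "full", "diagonal", and "spherical" are hypothetical names chosen for this example. The spherical case is the single-variance model that k-means implicitly resembles.

```python
def gmm_covariance_parameters(d: int, covariance: str = "full") -> int:
    """Count free covariance parameters for ONE cluster in d dimensions.

    Hypothetical helper for illustration; the labels below are assumptions,
    not options taken from UnsupervisedClustering.jl.
    """
    if d < 1:
        raise ValueError("dimension d must be a positive integer")
    if covariance == "full":
        # A symmetric d x d matrix has d diagonal entries plus
        # d * (d - 1) / 2 distinct off-diagonal entries.
        return d * (d + 1) // 2
    if covariance == "diagonal":
        # One variance per feature, no correlations.
        return d
    if covariance == "spherical":
        # A single shared variance for all features.
        return 1
    raise ValueError(f"unknown covariance structure: {covariance!r}")


if __name__ == "__main__":
    for d in (2, 10, 100):
        full = gmm_covariance_parameters(d, "full")
        diag = gmm_covariance_parameters(d, "diagonal")
        print(f"d={d}: full={full}, diagonal={diag}, spherical=1")
```

With d = 100, a full covariance matrix already needs 5050 parameters per cluster, compared with 100 for a diagonal one, which suggests why regularizing the covariance estimate matters when data is limited.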

Raphael Sampaio graduated in Computer Engineering at PUC-Rio in 2015. During his undergraduate studies, he took classes through the academic exchange program at the University of Illinois at Urbana-Champaign, USA. In 2018, he received an MSc degree in Informatics with an emphasis on Optimization and Machine Learning, also at PUC-Rio. He joined PSR in 2016 and currently works on the software development of optimization models for hydrothermal dispatch under uncertainty with transmission constraints (SDDP).