JuliaCon 2023

Analyzing Large Graphs with QuasiStableColors.jl
07-27, 10:30–11:00 (US/Eastern), 32-123

Graphs (aka networks) are a key part of the data science pipeline at many organizations. However, scalability is the most frequently reported limitation by graph analysts. I introduce QuasiStableColors.jl, a Julia library for approximate graph analysis. On tasks such as ranking node importance (centrality) it enables an over 10x speedup while introducing less than 5% error. In this talk, I will demonstrate how to use this novel graph compression for your own workloads.

In this talk I will give an overview of the new Julia package QuasiStableColors.jl. This package allows for accelerating graph analytics by compressing the underlying data. This results in big gains for supported tasks, like computing centrality or maximum-flow. Our approximation allows a speedup of >10x on a variety of real-life benchmark graphs while introducing a negligible relative error, often less than 5%.

The compression scheme is a novel one recently introduced in the database research community. It will be presented at the Very Large Databases 2023 conference under the title "Quasi-stable Coloring for Graph Compression: Approximating Max-Flow, Linear Programs, and Centrality" by Moe Kayali and Dan Suciu. paper link This package is the reference implementation.

I will focus on the practical usage of this library for data scientists rather than the theoretical aspects of the research work. By the end of the talk the audience will have understood the basics of this compression method and have the knowledge to start using it for their own graph analysis applications.

This package follows software development best practices, with thorough documentation, a tutorial, tests, and CI/CD. I intend to maintain this package into the future. I will also briefly discuss how using Julia and following these practices helped in achieving my research goals.

This material is based upon work supported by the National Science Foundation under Grant No. NSF-BSF 2109922 and NSF IIS 1907997.

I'm a PhD student in the database group at the University of Washington, Seattle. I work on discovering new techniques to accelerate data management and make its results more trustworthy.