Cleaning messy data with Julia and Gen
07-23, 15:45–16:15 (US/Eastern), Elm A

Julia is home to a growing ecosystem of probabilistic programming languages—but how can we put them to use for practical, everyday tasks? In this talk, we'll discuss our ongoing effort to automate common-sense data cleaning by building a declarative modeling language for messy datasets on top of Gen.


Users of this dataset description language can encode domain knowledge about their dataset, and about the ways in which it might be unclean, in short declarative probabilistic scripts. These scripts are compiled to Gen programs that infer the locations of probable errors, impute missing values, and propose likely corrections in tabular data.

Alex is a first-year PhD student at MIT's Probabilistic Computing Project. He's interested in building tools that automate the tedious calculations associated with approximate Bayesian inference, and in making probabilistic inference algorithms accessible to software engineers solving practical, everyday problems.