JuliaCon 2024

Kezdi.jl: A data analysis package for economists
07-12, 15:00–15:30 (Europe/Amsterdam), REPL (2, main stage)

This talk introduces Kezdi.jl, a Julia package designed to ease the transition for Stata users into Julia's data analysis ecosystem. Recognizing Stata's limitations with big data and its cost, Kezdi.jl offers Stata-like syntax for data wrangling, exploratory analysis, and regression in Julia, leveraging the strengths of DataFrames.jl and Tidier.jl. The presentation will discuss the challenges Stata users face in adopting open-source alternatives and demonstrate Kezdi.jl's capabilities.


Stata® is widely used by economists for data wrangling, exploratory analysis, and regressions. Its scripting language is easy to learn and provides a user-friendly syntax for these tasks. There is a large and active community of Stata users who discuss problems and develop custom packages.

The cost of Stata and its performance limitations with big data encourage analysts to seek open-source alternatives. Switching to popular R, Python, or Julia solutions, however, requires analysts to learn a very different syntax and, more importantly, to adopt different mental models.
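
To make the syntax gap concrete, here is a minimal sketch of three common Stata steps written with plain DataFrames.jl. The toy data and column names are hypothetical, and the Stata commands appear only as comments for comparison. This is not Kezdi.jl syntax, only an illustration of the kind of translation an analyst has to make today.

```julia
using DataFrames, Statistics

# Hypothetical toy data, loosely modeled on a car price dataset.
df = DataFrame(
    make    = ["A", "B", "C", "D"],
    price   = [4000.0, 6000.0, 9000.0, 12000.0],
    foreign = [false, false, true, true],
)

# Stata: keep if price > 5000
expensive = subset(df, :price => ByRow(>(5000)))

# Stata: generate log_price = log(price)
expensive = transform(expensive, :price => ByRow(log) => :log_price)

# Stata: collapse (mean) mean_price = price, by(foreign)
by_origin = combine(groupby(expensive, :foreign), :price => mean => :mean_price)
```

Each one-line Stata command becomes a function call built from column selectors, `ByRow` wrappers, and `source => function => target` pairs, which is a different way of thinking about the same operation.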

The aim of our package, Kezdi.jl, is to help Stata users adopt Julia as their main language for data analysis. It implements the key data wrangling, exploratory analysis, and regression operations in Julia, using Stata-like syntax. Inspired by and building on DataFrames.jl and Tidier.jl, it provides a user interface familiar to Stata users while retaining the flexibility and high performance of Julia.

The talk will illustrate the problem and the gap between existing solutions and user needs. We will showcase the Kezdi.jl implementation of the most widely used data analysis commands and discuss further development plans.

Miklós teaches reproducible coding practices to economists to help them maximize their scientific impact. He believes in the command line, plain text, and that every problem can be solved with the right combination of Stata, Python, Julia, git, and make. He is Professor of Economics at Central European University and the Data Editor of the Review of Economic Studies, a leading scientific journal.