JuliaCon 2022 (Times are UTC)

Random utility models with DiscreteChoiceModels.jl
07-29, 17:50–18:00 (UTC), Purple

Random utility models are widely used in social science. While most statistical software, including packages for Julia, has some facilities for estimating multinomial logit models, more advanced models, such as mixed logit models and models with different utility functions for different outcomes, generally require dedicated choice modeling software. This presentation describes a new package, DiscreteChoiceModels.jl, which provides flexible, high-performance multinomial logit estimation, with mixed logit estimation forthcoming.


Random utility models are ubiquitous in fields including economics, transportation, and marketing [1]. While estimation of simple multinomial logit models is available in many statistical packages, including in Julia via Econometrics.jl [2], more advanced choice models are generally fit with choice-model-specific packages (e.g., [3], [4]). These packages allow more flexible utility specifications by letting utility function definitions vary over outcomes, and they support additional forms of random utility model, such as the mixed logit model, which allows random parameter variation [5].

DiscreteChoiceModels.jl provides such a package for Julia. It has an intuitive syntax for specifying discrete-choice models, allowing users to directly write out utility functions. For instance, the code below specifies the Swissmetro example mode-choice model distributed with Biogeme [3]:

multinomial_logit(
    @utility(begin
        1 ~ αtrain + βtravel_time * TRAIN_TT / 100 + βcost * (TRAIN_CO * (GA == 0)) / 100
        2 ~ αswissmetro + βtravel_time * SM_TT / 100 + βcost * SM_CO * (GA == 0) / 100
        3 ~ αcar + βtravel_time * CAR_TT / 100 + βcost * CAR_CO / 100
    end),
    :CHOICE,
    data,
    availability=[
        1 => :avtr,
        2 => :avsm,
        3 => :avcar,
    ]
)

Within the utility function specification (@utility), the three lines specify the utility functions for each of the three modes coded in the CHOICE variable: train (1), the hypothetical Swissmetro (2), and car (3). Any variable starting with α or β is treated as a coefficient to be estimated, while other variables are assumed to be data columns. The remainder of the model specification gives the variable that records the choice (CHOICE), the data to use, and, optionally, the columns that indicate availability for each alternative.
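To make the role of the utility functions and availability columns concrete, here is a minimal sketch of how multinomial logit choice probabilities are formed for a single observation. It illustrates the standard formula only and is not the package's internal code; the function name mnl_probabilities and the utility values are hypothetical.

```julia
# Multinomial logit probabilities for one observation (illustrative sketch,
# not part of DiscreteChoiceModels.jl). `utilities` holds the systematic
# utility of each alternative; `available` flags which alternatives are
# in the choice set.
function mnl_probabilities(utilities::AbstractVector{<:Real},
                           available::AbstractVector{Bool})
    length(utilities) == length(available) ||
        throw(ArgumentError("utilities and available must have the same length"))
    any(available) ||
        throw(ArgumentError("at least one alternative must be available"))
    # Subtract the largest available utility before exponentiating,
    # which avoids overflow without changing the probabilities.
    vmax = maximum(utilities[available])
    expu = [a ? exp(v - vmax) : 0.0 for (v, a) in zip(utilities, available)]
    return expu ./ sum(expu)
end

# Hypothetical utilities for train, Swissmetro, and car; car is unavailable.
probs = mnl_probabilities([-1.2, -0.4, -0.9], [true, true, false])
```

Unavailable alternatives receive probability zero, and the remaining probabilities sum to one over the available choice set.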

Mixed logit models

Support for mixed logit models is under development. Mixed logit models will specify random coefficients as distributions from Distributions.jl [6]. For instance, to specify that αtrain should be normally distributed with mean 0 and standard deviation 1 as starting values, you would add

αtrain = Normal(0, exp(0))

with the call to exp indicating that the estimated value will be exponentiated, ensuring that the standard deviation is always positive.
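The effect of the exponential transform can be sketched as follows. Since mixed logit support is still under development, this is not the package's API: the names positive_sd and simulated_probability are hypothetical, the example uses a two-alternative model for brevity, and it draws from randn in the standard library rather than from Distributions.jl so that it has no dependencies.

```julia
using Random

# Map an unconstrained real parameter to a strictly positive standard deviation.
positive_sd(logσ::Real) = exp(logσ)

# Simulated probability of choosing alternative 1 of 2 when its
# alternative-specific constant is distributed Normal(μ, exp(logσ)).
# v1 and v2 are the remaining (non-random) parts of each utility.
function simulated_probability(μ, logσ, v1, v2;
                               ndraws=1_000, rng=MersenneTwister(42))
    σ = positive_sd(logσ)
    total = 0.0
    for _ in 1:ndraws
        α = μ + σ * randn(rng)                  # one draw of the random coefficient
        total += 1 / (1 + exp(v2 - (α + v1)))   # binary logit probability given α
    end
    return total / ndraws                       # average over draws
end

p = simulated_probability(0.0, 0.0, 0.5, 0.5)
```

Because the optimizer works with logσ rather than σ, any real value it tries still yields a valid standard deviation.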

Performance

Julia is designed for high-performance computing, so a major goal of DiscreteChoiceModels.jl is to estimate models more quickly than other modeling packages. To that end, two multinomial logit models were developed and benchmarked using three packages (DiscreteChoiceModels.jl, Biogeme [3], and Apollo [4]), with default settings for all three. The first model is the Swissmetro example from Biogeme, with 6,768 observations, 3 alternatives, and 4 free parameters. The second is a vehicle ownership model using the 2017 US National Household Travel Survey, with 129,696 observations, 5 alternatives, and 35 free parameters. All runtimes are the median of 10 runs, executed serially on a quad-core Intel i7 with 16GB of RAM running Debian 11.1. DiscreteChoiceModels.jl outperforms the other packages when used with a DataFrame. Using Dagger introduces distributed computing overhead on a single machine, which makes it the slowest option on the small Swissmetro model, although it remains faster than Biogeme and Apollo on the larger vehicle ownership model.

| Model | DiscreteChoiceModels.jl: DataFrame | DiscreteChoiceModels.jl: Dagger | Biogeme | Apollo |
|---|---|---|---|---|
| Swissmetro | 188 ms | 2047 ms | 252 ms | 824 ms |
| Vehicle ownership | 35.1 s | 46.9 s | 163.4 s | 227.2 s |
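
A "median of 10 runs" timing like those reported above could be collected with a helper along these lines. This is a sketch only, not the benchmarking script used for the table: the function median_runtime and the stand-in workload are hypothetical, and the warm-up call is an assumption, since the abstract does not say whether compilation time was excluded.

```julia
using Statistics

# Run `f` `nruns` times and return the median wall-clock time in seconds.
function median_runtime(f; nruns=10)
    # Warm-up call so that JIT compilation is not timed (an assumption;
    # the abstract does not state how compilation time was handled).
    f()
    times = [@elapsed f() for _ in 1:nruns]
    return median(times)
end

# Hypothetical workload standing in for a model fit.
t = median_runtime(() -> sum(sqrt, 1:1_000_000))
```

Reporting the median rather than the mean keeps a single slow run, for example one interrupted by other system activity, from skewing the result.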

References

[1] M. Ben-Akiva and S. R. Lerman, Discrete choice analysis: Theory and application to travel demand. MIT Press, 1985.

[2] J. B. S. Calderón, “Econometrics.jl,” Proc JuliaCon Conf, doi: 10.21105/jcon.00038.

[3] M. Bierlaire, “A short introduction to PandasBiogeme,” Ecole Polytechnique Fédérale de Lausanne, Lausanne, TRANSP-OR 200605, Jun. 2020. Available: https://transp-or.epfl.ch/documents/technicalReports/Bier20.pdf

[4] S. Hess and D. Palma, “Apollo: A flexible, powerful and customisable freeware package for choice model estimation and application,” J Choice Model, doi: 10.1016/j.jocm.2019.100170.

[5] K. Train, Discrete Choice Methods with Simulation. Cambridge, UK: Cambridge University Press, 2009.

[6] M. Besançon et al., “Distributions.jl: Definition and Modeling of Probability Distributions in the JuliaStats Ecosystem,” J Stat Soft, doi: 10.18637/jss.v098.i16.

Matthew Bhagat-Conway is an Assistant Professor in the Department of City and Regional Planning. His research interests are in travel behavior, urban transportation, and statistical methods for transportation data analysis. He is also jointly appointed in the Odum Institute for Research in the Social Sciences, where he is available to assist researchers with statistics and data analysis.

Dr. Bhagat-Conway has a PhD and MA in Geography from Arizona State University, and a BA in Geography from the University of California, Santa Barbara. Prior to graduate school, he was a software developer and project manager for Conveyal, a public transport planning consulting firm, and a fellow in the Data Science for Social Good fellowship at the University of Chicago.