2019-07-24, 11:50–12:00, Elm A
Julia is increasingly recognized as one of the big three data science programming languages, alongside R and Python. However, Julia’s data ecosystem has had less time to mature than R’s or Python’s. It is therefore not surprising that some data operations in Julia, such as group-by, are slower than their counterparts in R and Python.
This talk discusses how under-utilized fast sorting methods, such as radix sort, can be used to speed up group-by operations in Julia so that they match (or even surpass) the speed of the optimized C-based group-by implementations in R and Python.
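
To make the idea concrete, here is a minimal, language-agnostic sketch of a sort-based group-by, written in Python for readability. It is not the implementation presented in the talk. It assumes non-negative integer keys that fit in 32 bits, and the names `radix_sort_pairs` and `groupby_sum` are hypothetical. The keys are ordered with a least-significant-digit radix sort, after which each group is a contiguous run that can be reduced in a single pass.

```python
from typing import List, Tuple


def radix_sort_pairs(
    keys: List[int],
    vals: List[float],
    key_bits: int = 32,   # assumption: keys are non-negative and fit in 32 bits
    radix_bits: int = 8,  # process one byte of the key per pass
) -> Tuple[List[int], List[float]]:
    """Stable LSD radix sort of (key, value) pairs by integer key."""
    mask = (1 << radix_bits) - 1
    for shift in range(0, key_bits, radix_bits):
        # Count how many keys fall into each bucket for the current digit.
        counts = [0] * (mask + 1)
        for k in keys:
            counts[(k >> shift) & mask] += 1

        # Exclusive prefix sum: counts[d] becomes the first output slot of bucket d.
        total = 0
        for d in range(mask + 1):
            counts[d], total = total, total + counts[d]

        # Scatter pairs into their buckets, preserving input order within a bucket.
        out_keys = [0] * len(keys)
        out_vals = [0.0] * len(vals)
        for k, v in zip(keys, vals):
            d = (k >> shift) & mask
            pos = counts[d]
            out_keys[pos] = k
            out_vals[pos] = v
            counts[d] += 1
        keys, vals = out_keys, out_vals
    return keys, vals


def groupby_sum(keys: List[int], vals: List[float]) -> List[Tuple[int, float]]:
    """Sum vals per key: sort by key, then reduce each contiguous run of equal keys."""
    sorted_keys, sorted_vals = radix_sort_pairs(keys, vals)
    result: List[Tuple[int, float]] = []
    for k, v in zip(sorted_keys, sorted_vals):
        if result and result[-1][0] == k:
            result[-1] = (k, result[-1][1] + v)  # same group: accumulate
        else:
            result.append((k, v))                # new group starts here
    return result
```

For example, `groupby_sum([3, 1, 3, 2, 1], [1.0, 2.0, 3.0, 4.0, 5.0])` groups the values under keys 1, 2 and 3. Each radix pass is linear in the number of rows, so the whole sort takes a fixed number of linear passes for fixed-width keys, which is where the potential advantage over comparison-based sorting comes from.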