TimerOutputs.jl - a cheap and cheerful instrumenting profiler
2019-07-25, 17:15–17:25, Elm B

TimerOutputs.jl is a tool that lets you annotate sections in your code so that after execution, a nicely formatted table with information about how much time and allocations were spent in each section can be shown.

Julia has for a long time come with a built-in profiler, which is available as the Profile standard library. This profiler is a sampling profiler which means that it collects snapshots of the stacktrace at regular intervals. After execution of the profiled code, the number of traces that were collected in each function can be displayed which gives a good estimate where the code is spending its time. While sampling profiling is a very useful and cheap way of profiling code, there are a few drawbacks:
Since only samples of the execution is collected, exact information of e.g. the time spent in functions or the number of calls to a function is not available.
Not being able to annotate what parts of the code you care about mean that the output from the profiler is sometimes noisy and hard to interpret for non-experts.
Allocations can not be attributed to certain sections of the code.

TimerOutputs is a Julia package that provides an instrumenting profiler which requires you to annotate your code with labeled sections. While the code is running, TimerOutputs records timing and allocation data in these sections and can then print a summary back to the user. This can sometimes make it easier to get a high level overview of the performance characteristics of the code.

An example of the output from TimerOutputs is shown below.

                                        Time                   Allocations
                                ──────────────────────   ───────────────────────
        Tot / % measured:            6.89s / 97.8%           5.20GiB / 85.0%

 Section                ncalls     time   %tot     avg     alloc   %tot      avg
 assemble                    6    3.27s  48.6%   545ms   3.65GiB  82.7%   624MiB
   inner assemble         240k    1.92s  28.4%  7.98μs   3.14GiB  71.1%  13.7KiB
 linear solve                5    2.73s  40.5%   546ms    108MiB  2.39%  21.6MiB
 create sparse matrix        6    658ms  9.77%   110ms    662MiB  14.6%   110MiB
 export                      1   78.4ms  1.16%  78.4ms   13.1MiB  0.29%  13.1MiB

The timing and allocations for each section is presented and the call graph (e.g. that assemble calls inner assemble) is shown by indentation.

This lightning talk gives a short summary of the implementation, syntax and use cases of TimerOutputs.