JuliaCon 2024

Dude, where's my code?
07-11, 14:00–14:30 (Europe/Amsterdam), REPL (2, main stage)

Julia's potential is held back by significant usability problems in error messages, debugger operation and performance, and the need for a package, Revise, for live code updates. These diverse issues can be resolved by improving Julia's ability to relate source code written by the programmer to the various internal representations in Julia's compiler. In this talk we will explain this common thread and discuss ongoing work to improve compiler "provenance."


Julia has enormous strengths and its widespread adoption is well justified. However, there remain important usability barriers in areas such as interactive debugging and live code updates. We believe that resolving these barriers will be an important step towards extending Julia's reach.

In this presentation, we will explain how a single common thread underlies several of the most significant usability challenges. We will also discuss ongoing work to resolve these challenges.

Challenges

  • Efficient Julia development requires the use of a package, Revise.jl, to update already-compiled code in interactive sessions. There are several things that make Revise complex, but the most important is that to track line numbers in actively-edited files, Revise must selectively re-evaluate the package's source code. The Julia compiler doesn't track all the information required for this rather complex task, and as a result it is the main source of Revise bugs and of regressions with each new Julia version.
  • Julia's interactive debugger has poor usability:
      • it often struggles to relate program state back to the source code as written by the programmer. This happens because the debugger/interpreter is targeted not at raw source code, but at an early-to-intermediate stage of the compilation pipeline. This "lowered code" is difficult to relate back to the original source code.
      • it exhibits poor performance, interpreting code at speeds that are often four orders of magnitude slower than compiled code. One way to speed it up --- likely by one to two orders of magnitude --- would be to target the interpreter at an even deeper layer of the compilation pipeline, after many of the most costly operations (method lookup, inlining, etc.) have been finished. But at these deeper layers, the challenge of mapping state back to the original source code is currently intractable.
  • Compiler error messages have low resolution; they refer to line numbers but do not include the position within a line of the source code. Sometimes they refer to the wrong line of source code.
  • Debug information in compiled code has low resolution and stack traces may be ambiguous as to which function call was entered.

These issues arise because of a mismatch between the user's view of the code as a text file and the compiler's view of the code as a heavily processed intermediate representation (IR). In particular, the IR only knows about the source in an incomplete, low-resolution way: in terms of line numbers that are sometimes inaccurate.
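
As a small illustration of this mismatch, the sketch below parses one line of source with the standard `Meta.parse` and collects the location records stored in the resulting `Expr`. The helper `linenodes` is our own name, written for this example only. The point is that each record holds a line and a file name, so the two calls on the same line cannot be told apart by location.

```julia
# Collect every LineNumberNode found anywhere inside an expression tree.
# `linenodes` is a helper written for this illustration, not a Base function.
function linenodes(ex, acc=LineNumberNode[])
    if ex isa LineNumberNode
        push!(acc, ex)
    elseif ex isa Expr
        foreach(arg -> linenodes(arg, acc), ex.args)
    end
    return acc
end

# Two calls, sqrt(x) and g(x), share a single source line.
ex = Meta.parse("f(x) = sqrt(x) + g(x)")
nodes = linenodes(ex)

# The only location data available is a line and a file: no column, no byte range.
println(fieldnames(LineNumberNode))
println(nodes)
```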

Steps towards resolution

To resolve these issues we need a high precision map from the compiler's various IRs back to the user's source code --- we need to track the provenance of every construct in the IR. With this map we can always talk to the user in terms of their source code, solving many usability problems with error reporting and source-level debuggers. In addition, tools like Revise.jl can use this information to know which range of bytes in the source file defines which method signature.
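
To make the idea of such a map concrete, here is a minimal sketch in which every IR statement carries the byte range of the source text it came from. The types `SourceRange` and `IRStatement`, the helper `snippet`, and the file name `demo.jl` are all hypothetical and chosen for illustration; they are not the data structures used by the compiler or by JuliaSyntax.

```julia
# Hypothetical types for illustration only; not the compiler's actual representation.
struct SourceRange
    file::String
    first_byte::Int   # 1-based, inclusive
    last_byte::Int    # inclusive
end

struct IRStatement
    code::String          # stand-in for a real IR instruction
    origin::SourceRange   # where in the user's source this statement came from
end

source = "f(x) = sqrt(x) + g(x)"

# Both calls sit on line 1, but their byte ranges are distinct.
ir = [
    IRStatement("%1 = sqrt(x)", SourceRange("demo.jl", 8, 14)),
    IRStatement("%2 = g(x)",    SourceRange("demo.jl", 18, 21)),
]

# Recover the exact source text behind an IR statement (ASCII source assumed).
snippet(src::String, r::SourceRange) = src[r.first_byte:r.last_byte]

for stmt in ir
    println(stmt.code, "   <-   ", snippet(source, stmt.origin))
end
```

With byte ranges in hand, an error message or debugger could point at `g(x)` specifically rather than at "line 1".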

For our purposes compilation may be broken into these stages:

  • Parsing
  • Macro expansion
  • Symbolic simplification (code lowering)
  • Type inference and high level optimization
  • Low level optimization and machine code generation
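
Most of these stages can be inspected from a running session with existing reflection tools. The sketch below is only a rough tour: the function `f` and the expressions passed in are made up for the example, and the exact printed form of each result varies between Julia versions.

```julia
using InteractiveUtils   # provides code_typed; loaded automatically in the REPL

f(x) = x + 1             # example function, defined only for this demonstration

# Parsing: source text -> Expr
parsed = Meta.parse("@assert f(1) == 2")

# Macro expansion: Expr containing a macro call -> Expr without it
expanded = macroexpand(@__MODULE__, parsed)

# Lowering: Expr -> Expr(:thunk, CodeInfo) holding "lowered code"
lowered = Meta.lower(@__MODULE__, :(f(1) + 1))

# Type inference and high level optimization: a pair of CodeInfo => return type
typed = first(code_typed(f, (Int,)))

# The final stage can be printed with InteractiveUtils.code_llvm(f, (Int,))
# and InteractiveUtils.code_native(f, (Int,)).
println(parsed.head, " -> ", expanded.head, " -> ", lowered.head)
```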

Provenance tracking during parsing has been improved in prior work as part of the JuliaSyntax package. To improve error message generation during lowering, improve the debuggers, and simplify Revise, we additionally need to track provenance through the next two stages, macro expansion and lowering; this talk will focus on those areas. Solving problems such as backtrace ambiguities will require changes all the way through to machine code generation; we leave that to future work.

Macro expansion presents a particular challenge because user-written macros accept Julia's Expr data structure that can't support our goals of high precision provenance. To solve this problem we need new data structures and APIs for macro writers and these need to be opt-in to preserve compatibility. A working proof of concept is available at JuliaLang/JuliaSyntax.jl#329. Designing a completely new public API for macros also allows us to tackle some long-standing macro usability issues, including:

  • "automatic hygiene" (no more esc())
  • issues with expansion of nested macros
  • precise reporting of errors in the syntax supported by a macro

We'll briefly touch on these points.
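
For readers unfamiliar with the hygiene issue, the following sketch shows why macro authors currently need `esc()`. It uses only today's macro system, not the proposed API; the macro and function names are invented for the example.

```julia
# Without esc(), hygiene renames the assigned variable to a fresh local,
# so the caller's variable is left untouched.
macro set_to_one_unescaped(var)
    return :( $var = 1 )
end

# With esc(), the assignment targets the variable in the caller's scope.
macro set_to_one(var)
    return :( $(esc(var)) = 1 )
end

function demo()
    x = 0
    @set_to_one_unescaped x
    before = x            # x is still 0 here
    @set_to_one x
    return (before, x)    # x is now 1
end

println(demo())
```

Forgetting `esc()`, or applying it in the wrong place, is a common source of macro bugs, which is the motivation for making hygiene automatic.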

Reworking code lowering has fewer compatibility constraints, but lowering is a fairly large body of code. We'll discuss progress toward rewriting lowering with a particular emphasis on the data structures used. Some of the code analyses implemented in lowering are useful outside the compiler and we'll talk about how to make them accessible for tasks like tool-assisted refactoring.

See also: GitHub

Tim Holy is the Alan A and Edith L Wolff Professor of Neuroscience at Washington University. Claire Foster is the creator of JuliaSyntax.

Claire is a long-time enthusiastic user of Julia and enjoys contributing to various
packages across the open source ecosystem, Julia standard libraries and
compiler. She loves hearing about people's fascinating technical computing
adventures of all types! Find her at https://github.com/c42f