JuliaCon 2025

Monitor & Modify Values in realtime, Debug 1000 Programs At Once
2025-07-25 , Main Room 1 (Main stage)

Monitor.jl is a tool that lets you monitor and modify Julia values of your choice in real time, even across thousands of jobs. Combined with ad hoc UIs and dashboards, this significantly speeds up debugging, especially in complex scenarios like cloud-based simulations.

The talk will discuss and demo:

  • Monitor.jl’s capabilities
  • how it works
  • connecting to single or multiple Julia sessions (map/reduce, etc.)
  • diagnostic exploratory programming
  • monitoring and rollups
  • front ends

Monitor.jl is a simple monitoring, evaluation, and communication system with pluggable transports, including using REDIS streams for pubsub.

https://github.com/Leisure-tools/Monitor.jl

Imagine many jobs simulating Formula 1 engines, some of which overheat unexpectedly. The developer can see this in the log messages but has no clear idea why it happens. So far, the developer has stopped, changed, and restarted the jobs three times in a row, each time waiting for log messages with more information. Each cycle takes around 15 minutes, so the initial run plus 3 restarts has cost about an hour of work.

Then the developer decides to use Monitor.jl with these steps:

  1. Modify the job's start script to use a different main program that
    a. Loads Monitor.jl,
    b. Connects to a REDIS server for communication,
    c. Calls the original main program within a handler that waits upon crashing instead of terminating.
    Note: normally only the start script needs to change, not the job’s original main program, although certain modifications might be needed to make data accessible for monitoring.
  2. Rerun the jobs and start the REDIS server.
  3. Connect a notebook to REDIS.
  4. Use the notebook to:
    a. Publish code to all the jobs that defines a function which checks for overheating and, when a job is overheating, changes the monitor (below) to send updates to a stream named “overheating”.
    b. Publish a monitor to all the jobs that uses the new function.
    c. Watch the “overheating” stream for incoming monitor updates from problematic jobs.
    d. Use monitor UI views in the notebook to explore data in problematic jobs, creating additional monitors as needed.
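Steps 1c and 4a can be pictured with a small sketch in plain Julia. It deliberately does not use the Monitor.jl API (see the repository readme for that); `run_and_hold`, `is_overheating`, the `hold` keyword, and the 120 °C limit are all hypothetical names and values chosen for illustration. The idea is only that the wrapper keeps the process and its state alive after a crash, and that the overheating check is an ordinary function that could be published to the jobs.

```julia
# Hypothetical wrapper for step 1c: run the job's real entry point and, if it
# throws, keep the process alive instead of letting it terminate.
# `hold` is an assumed hook; by default it blocks the current task until notified.
function run_and_hold(main::Function; hold::Function = () -> wait(Condition()))
    try
        return main()
    catch err
        # Log the failure with its backtrace, then wait so the job's state
        # remains available for inspection.
        @error "job crashed; holding process for inspection" exception = (err, catch_backtrace())
        hold()
        return err
    end
end

# Hypothetical check for step 4a: true if any temperature sample (in °C)
# exceeds the limit. The default limit is an illustrative value only.
is_overheating(temps_c; limit_c = 120.0) = any(>(limit_c), temps_c)
```

With these definitions, a start script might call something like `run_and_hold(original_main)`, where `original_main` stands in for the job's existing entry point.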

After the developer fixes the problem, they decide to add another monitor block to each job that reaches into the simulation data and produces dynamically updating trend information. They then use the notebook to display a roll-up for all the jobs. Without restarting jobs or changing code, the developer is able to add real-time monitoring and gain more insight into the activities of their jobs.
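As a rough illustration of per-job trend information and a roll-up across jobs, here is a minimal sketch in plain Julia. It does not use Monitor.jl; `trend`, `rollup`, the job ids, and the shape of the summary are assumptions made for the example, and the "slope" is just the average change per sample over the window.

```julia
using Statistics

# Hypothetical per-job summary over a non-empty window of recent temperature samples.
function trend(samples::AbstractVector{<:Real})
    n = length(samples)
    # Average change per sample across the window; zero when there is one sample.
    slope = n < 2 ? 0.0 : (samples[end] - samples[1]) / (n - 1)
    return (latest = float(samples[end]), mean = float(mean(samples)), slope = float(slope))
end

# Hypothetical roll-up: combine the per-job summaries (keyed by job id) into one view.
function rollup(trends::AbstractDict)
    ids = collect(keys(trends))
    hottest = ids[argmax([trends[id].latest for id in ids])]
    rising = sort([id for id in ids if trends[id].slope > 0])
    return (jobs = length(ids), hottest = hottest, rising = rising)
end

# Example with two made-up jobs.
trends = Dict(
    "job-1" => trend([80.0, 85.0, 90.0]),
    "job-2" => trend([100.0, 95.0]),
)
summary = rollup(trends)
```

In the scenario described above, each job would produce its own summary and the notebook would perform the roll-up; here both happen in one process purely to keep the example self-contained.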

More technical information is available in the repository readme.