2024-07-10, Function (4.1)
A case study of using Julia for data analysis in nuclear physics will be
presented. The binary data, from a study of nuclear fusion-fission reactions,
comprised signals from about 200 detectors and amounted to 20-30 TB. Julia's
ability to call C libraries seamlessly, its multi-processing schemes, and its
efficient data processing allowed us to analyze this dataset on an ordinary
off-the-shelf computer in a reasonable time, without the need to employ
HPC.
In physics, it is high-energy particle physics that is most commonly
associated with top-tier high-performance computing; CERN and the LHC are the
hallmarks of this branch. But high performance is needed in other areas as
well and is not limited to supercomputers. The older brother of particle
physics, nuclear physics, takes advantage of recent developments as well.
Today's laptops have computational performance similar to that of the
state-of-the-art computers of the mid-1990s. What needed a whole cluster of
computers 30 years ago can now be achieved with a single cheap machine. This,
along with the
development of detectors, detector setups and acquisition systems, allows
larger experiments to be planned while the size of scientific groups stays
similar, ranging from a few people up to tens. But there is a catch: HPC
machines are, and always have been, programmed by whole dedicated teams. In
nuclear physics, the whole data analysis is often done by a single
scientist or a very small group. Nevertheless, the needs are similar. First of
all, a programming language capable of fast data processing is required to
analyze relatively large volumes of data. In addition, another set of tools is
needed for the data analysis itself, covering common operations like plotting,
histogramming, fitting, or data manipulation. This is very often done in an
exploratory fashion, where the final method or the expected results are not
clearly defined in advance. The typical way to achieve this is to use two
languages: one for efficient data processing and preliminary analysis (C/C++),
and a second for convenient, dynamic analysis and visualization (Python or
specialized programs).
Julia fits this scheme ideally, combining high performance with dynamism and
simplicity, effectively removing the need to master two languages. In a
one-person analysis scenario, this is especially welcome. An important feature,
needed to unlock the full potential of modern multi-core machines and to
achieve efficient data processing, is Julia's easily accessible
multi-processing capabilities. The whole analysis code can be written
consistently in one language, from the raw file interpretation to the
publication figures, which makes it easy to maintain and reuse the code and to
train new members of the group.
I'm a professor at the University of Warsaw, Poland, working in the field of
experimental nuclear physics. Since 2020, Julia has been my main programming
tool; I use it for all kinds of work, from simple calculations, through the
preparation of figures, communication with laboratory equipment and detectors,
to analysis of relatively large datasets.