JuliaCon Local Paris 2025

FFTs on non-equispaced points with NonuniformFFTs.jl
2025-10-02 , Jean-Baptiste Say Amphitheater
Language: English

Nonuniform FFT (NUFFT) algorithms enable the efficient evaluation of Fourier series from non-equispaced points. They have a wide range of applications, from medical imaging to the simulation of physical systems. The NonuniformFFTs.jl package provides a highly optimised NUFFT implementation, compatible with CPUs and with various GPU platforms. I will discuss its unique features and the strategies that make it one of the fastest available GPU implementations for large-scale problems.


The NonuniformFFTs.jl package provides a fast and generic NUFFT implementation in Julia. It can be parallelised across multiple CPU cores using threads, and it also runs on various GPU platforms thanks to the KernelAbstractions.jl package. In addition, it implements the AbstractNFFTs.jl interface, which makes it easy to switch between this and other NUFFT packages available in Julia.

This package was initially motivated by the need to compute Fourier series from real-valued data on non-equispaced points. Most existing NUFFT packages only support complex-valued data, leading to unnecessary computations and extra memory usage when the data is purely real. Real-valued data arises in various applications, including molecular dynamics and fluid dynamics simulations. A second motivation was to minimise the cost of successive transforms on different sets of non-uniform points, as these points typically change over time in the simulation of physical systems.
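
The saving for real-valued data can be illustrated with a minimal sketch in plain Python (not the package's Julia API). It evaluates a type-1 nonuniform DFT by direct summation and checks that, for real input, the negative-wavenumber coefficients are the complex conjugates of the positive ones, so only about half of the modes need to be computed and stored. The function name `nudft_type1`, the sign convention exp(-i k x) and the problem sizes are assumptions made for this illustration only.

```python
import cmath
import math
import random


def nudft_type1(points, values, modes):
    """Direct O(N*K) type-1 nonuniform DFT: u_k = sum_j v_j * exp(-i k x_j).

    Hypothetical helper for illustration; a NUFFT computes the same sums
    approximately but much faster.
    """
    return [
        sum(v * cmath.exp(-1j * k * x) for x, v in zip(points, values))
        for k in modes
    ]


random.seed(0)
n_points, n_modes = 50, 8  # hypothetical sizes

# Non-equispaced points in [0, 2*pi) carrying purely real values.
points = [random.uniform(0.0, 2.0 * math.pi) for _ in range(n_points)]
values = [random.gauss(0.0, 1.0) for _ in range(n_points)]

all_modes = list(range(-n_modes // 2, n_modes // 2))  # k = -4, ..., 3
half_modes = list(range(0, n_modes // 2 + 1))         # k = 0, ..., 4

u_full = dict(zip(all_modes, nudft_type1(points, values, all_modes)))
u_half = dict(zip(half_modes, nudft_type1(points, values, half_modes)))

# Hermitian symmetry for real data: u_{-k} == conj(u_k).
max_err = max(
    abs(u_full[-k] - u_half[k].conjugate())
    for k in range(1, n_modes // 2 + 1)
)
print(f"modes stored: {len(half_modes)} instead of {len(all_modes)}")
print(f"max deviation from Hermitian symmetry: {max_err:.2e}")
```

The non-negative wavenumbers carry all the information here, which is the same idea that real-to-complex FFTs exploit on uniform grids.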

NonuniformFFTs.jl has since evolved into one of the fastest NUFFT implementations available across different languages. On the CPU, its performance is comparable to that of the popular FINUFFT library written in C++ (and wrapped in Julia by the FINUFFT.jl package). On GPUs, NonuniformFFTs.jl can currently be considerably (up to 4x) faster than the CUDA-only CuFINUFFT implementation for highly dense three-dimensional problems. This is achieved by taking advantage of GPU shared memory for fast memory accesses while avoiding atomic operations on this special kind of memory, which requires rethinking how the different GPU threads concurrently compute and write to memory.

In this talk, I will quickly go through the mathematical definition of the NUFFT and its algorithm. I will then present the main technical difficulties in its practical implementation. Finally, I will discuss the techniques that are used in NonuniformFFTs.jl to accelerate the GPU implementation.

I am a research scientist at CNRS and at the LEGI laboratory in Grenoble, France. I work on modelling and simulation of various fluid dynamical systems, with a special interest in turbulent flows and in the use of fast and accurate numerical algorithms. In this context I also develop different tools in Julia. Over the years, I have contributed a small number of open-source Julia packages motivated by my research work. These include WriteVTK.jl, PencilArrays.jl, PencilFFTs.jl, BSplineKit.jl, NonuniformFFTs.jl and VortexPasta.jl.