Lessons learned from comparing Numba-CUDA and C-CUDA
09-04, 16:45–17:00 (UTC), Track 1 (Mitxelena)

We compared the performance of GPU applications written in C-CUDA and Numba-CUDA. Analyzing the GPU assembly code showed us where the differences come from, which helped us optimize both our Numba-CUDA codes and Numba itself.

Numba allows GPU code to be developed in Python style. When a Python script using Numba is executed, the code is compiled just-in-time (JIT) using the LLVM framework. Using Python for GPU programming can considerably simplify the development of parallel applications compared to C and C-CUDA.

Python, however, has a reputation for low performance, especially in High-Performance Computing (HPC).
We wanted to find out whether this reputation is deserved and where the performance differences come from. We therefore first analyzed the performance of typical microbenchmarks used in HPC. By analyzing the assembly code, we learned a lot about the differences between the code generated for C-CUDA and for Numba-CUDA. Some of these insights helped us improve the performance of our application, and also of Numba-CUDA itself. With a few tricks, our Numba codes achieve very good performance, coming very close to the C-CUDA versions and sometimes even outperforming them.

Project Homepage / Git

Abstract as a tweet

Numba-CUDA: How to write efficient CUDA code in Python

Python Skill Level


Domain Expertise



Big Data, Parallel computing / HPC

Lena Oden recently became a Junior Professor for Computer Architecture at the FernUniversität Hagen. Before that, she worked as a postdoctoral researcher at the Forschungszentrum Jülich and at Argonne National Laboratory in the USA. She received her PhD in Computer Science from the Ruprecht-Karls-Universität Heidelberg and a Diploma in Electrical Engineering from RWTH Aachen. During her PhD, she worked at the Fraunhofer Institute for Industrial Mathematics. Her main research areas are Computer Architectures and Runtime Systems for HPC.
Her interest in Python started when she worked with people from other scientific areas. She likes the simplicity of Python, and started to use it as her main programming language for teaching parallel programming.
She is interested in improving the performance of Python to make it more usable in HPC.