PyCon JP 2024

Crafting Your Own Numpy: Do More in C++ and Make It Python
2024-09-27 , 4F Track3

Numpy is a powerful tool for scientific computing, but large-scale simulations need more. High-performance computing calls for a custom array library that can be customized in C++ and still interoperate with Python. A great way to build one is to use Pybind11. Just like Numpy, we allow arrays to hold various data types while exposing a single array type in Python. In C++, we use template-based generic programming to write array code for specific data types. This talk shares our approaches to overcoming the "dtype" challenge, so that we can leverage C++ type information at compile time for both high performance and seamless Python integration.


While most people typically use Numpy directly, as it generally meets their needs well, there are specific scenarios where a custom array library is necessary. In high-performance computing, real-time data processing, and large-scale simulations, users may require a library with a Numpy-like interface that offers enhanced performance and greater flexibility.

We created an array library similar to Numpy that provides a comparable interface, high performance, and a high degree of customization. Pybind11 makes it easy to bind a C++ class to a Python object, so exposing a C++ array to Python is simple in itself. The challenge lies in the "dtype" of the array.

With Numpy, we can create arrays using syntax like np.array([...], dtype='float64') to specify the underlying data type as "float64". Regardless of the specified "dtype", every Numpy array has the same Python type, "numpy.ndarray".

In designing a Numpy-like array, we encountered an issue with this "dtype". A C++ array is naturally written as a class template such as Array<T>, and each data type produces a distinct instantiation, like Array<int> or Array<double>, which are unrelated types to the compiler. However, we want to maintain a single array type on the Python side, so we cannot simply bind each Array<T> to its own Python object.
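To make the problem concrete, here is a minimal sketch of one possible way to hide several template instantiations behind a single class. It is not necessarily the approach presented in the talk. The names TypedArray, AnyArray, and DTypeName are hypothetical, and the use of std::variant with std::visit (C++17) is an assumption chosen for illustration. Only two dtypes are supported to keep the example short.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

// Hypothetical typed array: every element type T yields a distinct C++ type.
template <typename T>
class TypedArray
{
public:
    using value_type = T;

    explicit TypedArray(std::size_t size) : m_data(size, T{}) {}

    std::size_t size() const { return m_data.size(); }

    void fill(T value)
    {
        for (auto & v : m_data) { v = value; }
    }

    T sum() const
    {
        T total{};
        for (auto const & v : m_data) { total += v; }
        return total;
    }

private:
    std::vector<T> m_data;
};

// Compile-time mapping from element type to a dtype name (hypothetical helper).
template <typename T> struct DTypeName;
template <> struct DTypeName<std::int32_t> { static constexpr char const * value = "int32"; };
template <> struct DTypeName<double> { static constexpr char const * value = "float64"; };

// Hypothetical single wrapper type: the dtype is chosen at run time, but each
// operation is forwarded to fully typed code that the compiler generated.
class AnyArray
{
public:
    AnyArray(std::size_t size, std::string const & dtype) : m_impl(make(size, dtype)) {}

    std::string dtype() const
    {
        return std::visit(
            [](auto const & arr) -> std::string
            {
                using T = typename std::decay_t<decltype(arr)>::value_type;
                return DTypeName<T>::value;
            },
            m_impl);
    }

    std::size_t size() const
    {
        return std::visit([](auto const & arr) { return arr.size(); }, m_impl);
    }

    // The value is converted to the element type of the underlying array.
    void fill(double value)
    {
        std::visit(
            [value](auto & arr)
            {
                using T = typename std::decay_t<decltype(arr)>::value_type;
                arr.fill(static_cast<T>(value));
            },
            m_impl);
    }

    double sum() const
    {
        return std::visit(
            [](auto const & arr) { return static_cast<double>(arr.sum()); }, m_impl);
    }

private:
    using Storage = std::variant<TypedArray<std::int32_t>, TypedArray<double>>;

    // Run-time dispatch from the dtype string to a concrete instantiation.
    static Storage make(std::size_t size, std::string const & dtype)
    {
        if (dtype == "int32") { return Storage{TypedArray<std::int32_t>(size)}; }
        if (dtype == "float64") { return Storage{TypedArray<double>(size)}; }
        throw std::invalid_argument("unsupported dtype: " + dtype);
    }

    Storage m_impl;
};
```

With this layout, AnyArray(4, "float64") and AnyArray(3, "int32") are the same C++ type, so a binding layer such as Pybind11 would only need to expose one class, while the loops inside TypedArray<T> are still compiled for a concrete element type. The cost is one run-time dispatch per call, which is usually small when each call operates on a whole array.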

This talk will explain our approach to addressing the "dtype" challenge in building our own Numpy-like library by leveraging C++'s compile-time type knowledge.


Why did you choose this topic?:

Everyone knows Numpy, but few consider the principles behind it. In certain scenarios, a general tool may not be the best fit, and we may need to create our own solution. We aim to demonstrate how it is possible to build a Numpy-like array library ourselves, and to share the joy and elegance of creating something from scratch.

Knowledge and know-how the audience can get from your talk:

People who use Numpy, people who are interested in Python bindings, and people who use customized arrays at work

Prior knowledge speakers assume the audience to have:

Knowledge of computer architecture, Numpy experience, C++ experience

Audience experience:

Intermediate

Language of presentation:

English

Language of presentation material:

English

See also: Slides

Liu is a software engineer working in Tokyo. He uses the ID @tigercosmos in open-source communities. He likes photography, snowboarding, and traveling. His website: https://tigercosmos.xyz