PyCon JP 2024

Crafting Your Own Numpy: Do More in C++ and Make It Python
2024-09-27 , 4F Track3

Numpy is a powerful tool for scientific computing, but large-scale simulations need more. High-performance computing calls for a custom array library that can be customized in C++ and used from Python. A good way to build one is with Pybind11. Just like Numpy, we allow arrays to hold various data types while exposing a single array type in Python. In C++, we use templates (generic programming) to write array code for specific data types. This talk shares our approaches to overcoming the "dtype" challenge, so that we can leverage C++ type information at compile time for high performance and seamless Python integration.


Most people use Numpy directly because it meets their needs well, but some scenarios call for a custom array library. In high-performance computing, real-time data processing, and large-scale simulations, users may need a library with a Numpy-like interface that offers better performance and greater flexibility.

We created an array library similar to Numpy that provides a comparable interface, high performance, and a high degree of customization. Pybind11 makes it easy to bind C++ classes to Python, so exposing a C++ array as a Python object is straightforward. The challenge lies in the "dtype" of the array.

With Numpy, we can create arrays using syntax like np.array([...], dtype='float64') to specify the underlying data type as "float64". Regardless of the specified "dtype", every Numpy array has the same Python type, "numpy.ndarray".
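
The behaviour described above can be checked directly. This is a small sketch that assumes NumPy is installed; the array values and the choice of "int32" as the second dtype are arbitrary.

```python
import numpy as np

# Two arrays built from the same values but with different element types.
a = np.array([1, 2, 3], dtype="float64")
b = np.array([1, 2, 3], dtype="int32")

# The dtype is a runtime attribute that differs between the two arrays...
print(a.dtype, b.dtype)

# ...while both objects share a single Python type.
print(type(a) is type(b) is np.ndarray)
```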

In designing a Numpy-like array, we ran into an issue with this "dtype". Consider a C++ array template such as Array<T>: each data type produces a distinct instantiation, such as Array<int> or Array<double>, and each instantiation is a separate C++ type. However, we want to maintain a single array type on the Python side, so we cannot simply bind each Array<T> to its own Python object.
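
One possible way to picture the goal is a single wrapper class that stores a dtype tag and forwards to typed storage chosen at construction time. The sketch below is a Python-only analogy and not the talk's implementation: SimpleArray and _TYPECODES are hypothetical names, and the standard-library array.array stands in for the C++ instantiations Array<int> and Array<double>.

```python
from array import array

# Hypothetical mapping from dtype names to array typecodes.
# Each typecode plays the role of one C++ instantiation of Array<T>.
# Assumption: "i" (C int) is 32 bits wide, which holds on most platforms.
_TYPECODES = {"int32": "i", "float64": "d"}


class SimpleArray:
    """Single Python-facing type; the element type is chosen at run time."""

    def __init__(self, values, dtype="float64"):
        if dtype not in _TYPECODES:
            raise TypeError(f"unsupported dtype: {dtype}")
        self.dtype = dtype
        # Dispatch once, at construction: pick the typed storage.
        self._data = array(_TYPECODES[dtype], values)

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        return self._data[index]

    @property
    def itemsize(self):
        # Element size in bytes, as reported by the typed storage.
        return self._data.itemsize


ints = SimpleArray([1, 2, 3], dtype="int32")
floats = SimpleArray([1, 2, 3], dtype="float64")

# Both objects have the same Python type but different dtypes.
print(type(ints) is type(floats))
print(ints.dtype, floats.dtype)
```

In a C++ library, the dispatch step would instead select among compiled template instantiations, so the per-element code still benefits from compile-time type knowledge.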

This talk will explain our approach to addressing the "dtype" challenge in building our own Numpy-like library by leveraging C++'s compile-time type knowledge.


Why did you choose this topic?

Everyone knows Numpy, but few consider the principles behind it. In certain scenarios, a general tool may not be the best fit, and we may need to create our own solution. We aim to demonstrate how it is possible to build a Numpy-like array library ourselves, and to share the joy and elegance of creating something from scratch.

Knowledge and know-how the audience can gain from your talk

People who use Numpy, people who are interested in Python bindings, and people who use customized arrays at work

Prior knowledge the speakers assume the audience to have

Knowledge of computer architecture, Numpy experience, and C++ experience

Audience experience level

Intermediate

Language of presentation

English

Language of presentation material

English

See also: Slides

Liu is a software engineer working in Tokyo. He goes by the ID @tigercosmos in open-source communities. He likes photography, snowboarding, and traveling. His website: https://tigercosmos.xyz