2024-09-27, 4F Track3
Numpy is a powerful tool for scientific computing, but large-scale simulations need more. High-performance computing calls for a custom array library that can be customized in C++ and used from Python, and Pybind11 is a great way to build one. Just like Numpy, we allow arrays to hold various data types while providing a single array type in Python. In C++, we use templates (generic programming) to generate array code for each specific data type. This talk shares our approaches to overcoming the "dtype" challenge, so that we can leverage C++ type information at compile time for high performance while keeping Python integration seamless.
Most people use Numpy directly because it generally meets their needs well, but in some scenarios a custom array library is necessary. In high-performance computing, real-time data processing, and large-scale simulations, users may need a library with a Numpy-like interface that offers better performance and greater flexibility.
We created an array library similar to Numpy, with a comparable interface, high performance, and a high degree of customization. Pybind11 makes it easy to bind C++ classes to Python, so binding a single C++ array class to a Python object would be simple; the challenge lies in the array's "dtype".
With Numpy, we can create an array using syntax like np.array([...], dtype='float64') to specify the underlying data type as "float64". Regardless of the dtype, every Numpy array has the same Python type, "numpy.ndarray".
In designing a Numpy-like array, we ran into an issue with this "dtype". A C++ array template such as Array<T> yields a different type for each data type, like Array<int> or Array<double>. However, we want to keep a single array type on the Python side, so we cannot simply bind each Array<T> instantiation as its own Python class.
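To make the problem concrete, here is a minimal sketch using a hypothetical Array<T> (the library's real class is not shown in this abstract, so its members here are assumptions). It only demonstrates that each instantiation is a separate, unrelated C++ type, which the compiler can confirm.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical minimal array template, standing in for the talk's Array<T>.
template <typename T>
class Array {
public:
    using value_type = T;

    // Create n elements, all set to `fill`.
    explicit Array(std::size_t n, T fill = T{}) : data_(n, fill) {}

    std::size_t size() const { return data_.size(); }
    const T& operator[](std::size_t i) const { return data_[i]; }

private:
    std::vector<T> data_;
};

// Every instantiation is a distinct type; this is checked at compile time.
static_assert(!std::is_same_v<Array<int>, Array<double>>,
              "Array<int> and Array<double> are different types");
```

Binding these directly with pybind11 would mean one py::class_ registration per instantiation, and therefore a separate Python class per element type, which is exactly what a single ndarray-like type is meant to avoid.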
This talk will explain our approach to addressing the "dtype" challenge in building our own Numpy-like library by leveraging C++'s compile-time type knowledge.
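One common technique for putting a single Python-facing type in front of many Array<T> instantiations is type erasure. The sketch below is an assumption and not necessarily the approach presented in the talk: a hypothetical non-template AnyArray holds one of several Array<T> in a std::variant, reports its dtype as a runtime string derived from a compile-time trait (the hypothetical DtypeName), and uses std::visit so that each operation runs as fully typed template code. The supported dtypes (int32, int64, float64) and all names are illustrative.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical minimal typed array, standing in for the talk's Array<T>.
template <typename T>
struct Array {
    using value_type = T;
    std::vector<T> data;
};

// Compile-time mapping from C++ element type to a Numpy-style dtype name.
template <typename T> struct DtypeName;  // left undefined for unsupported types
template <> struct DtypeName<std::int32_t> { static constexpr const char* value = "int32"; };
template <> struct DtypeName<std::int64_t> { static constexpr const char* value = "int64"; };
template <> struct DtypeName<double>       { static constexpr const char* value = "float64"; };

// One non-template class: the only type that would be exposed to Python.
class AnyArray {
public:
    using Storage = std::variant<Array<std::int32_t>, Array<std::int64_t>, Array<double>>;

    template <typename T>
    explicit AnyArray(Array<T> a) : storage_(std::move(a)) {}

    // Runtime dtype string, derived from the compile-time element type.
    std::string dtype() const {
        return std::visit([](const auto& a) -> std::string {
            using T = typename std::decay_t<decltype(a)>::value_type;
            return DtypeName<T>::value;
        }, storage_);
    }

    // Dispatch once on the stored type; the loop is compiled per element type.
    double sum() const {
        return std::visit([](const auto& a) -> double {
            double total = 0.0;
            for (const auto& x : a.data) total += static_cast<double>(x);
            return total;
        }, storage_);
    }

private:
    Storage storage_;
};

// The reverse direction: a runtime dtype string selects a compile-time type.
AnyArray make_zeros(std::size_t n, const std::string& dtype) {
    if (dtype == "int32")   return AnyArray(Array<std::int32_t>{std::vector<std::int32_t>(n)});
    if (dtype == "int64")   return AnyArray(Array<std::int64_t>{std::vector<std::int64_t>(n)});
    if (dtype == "float64") return AnyArray(Array<double>{std::vector<double>(n)});
    throw std::invalid_argument("unsupported dtype: " + dtype);
}
```

In a pybind11 binding under this design, only AnyArray would be registered with py::class_, so Python sees one array type, while the code inside each std::visit lambda is still instantiated separately for every element type.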
Everyone knows Numpy, but few consider the principles behind it. In certain scenarios a general tool may not be the best fit, and we may need to create our own solution. We aim to show that it is possible to build a Numpy-like array library ourselves, and to share the joy and elegance of creating something from scratch.
Knowledge and know-how the audience can get from the talk: people who use Numpy, people who are interested in Python bindings, people who use custom arrays at work
Prior knowledge the speaker assumes the audience has: computer architecture, Numpy experience, C++ experience
Audience experience: Intermediate
Language of presentation: English
Language of presentation material: English
Liu is a software engineer working in Tokyo. He goes by @tigercosmos in open-source communities. He likes photography, snowboarding, and traveling. His website: https://tigercosmos.xyz