PyCon JP 2024


Crafting Your Own Numpy: Do More in C++ and Make It Python
2024/09/27, 4F Track3

Numpy is a powerful tool for scientific computing, but large-scale simulations need more. High-performance computing calls for a custom array library that can be customized in C++ and can interoperate with Python, and Pybind11 is a great way to build one. Just like Numpy, we allow arrays to hold various data types while exposing a single array type in Python. In C++, we use template-based generic programming to write array code for specific data types. This talk shares our approaches to overcoming the "dtype" challenge, so that we can leverage C++ type information at compile time for both high performance and seamless Python integration.


While most people typically use Numpy directly, as it generally meets their needs well, there are specific scenarios where a custom array library is necessary. In high-performance computing, real-time data processing, and large-scale simulations, users may require a library with a Numpy-like interface that offers enhanced performance and greater flexibility.

We created an array library similar to Numpy, providing a comparable interface, high performance, and a high degree of customization. Pybind11 makes it easy to bind a C++ class to a Python object, so binding a single C++ array class is straightforward. The challenge arises with the "dtype" of the array.

With Numpy, we can create an array with syntax like np.array([...], dtype='float64') to specify "float64" as the underlying data type. Regardless of the specified "dtype", every Numpy array has the same Python type, "numpy.ndarray".

In designing a Numpy-like array, we ran into an issue with this "dtype". In C++, an array class template such as Array<T> produces a distinct type for each data type, like Array<int> or Array<double>. However, we want to keep a single array type on the Python side, so we cannot simply bind each Array<T> instantiation to Python directly.
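
To make the mismatch concrete, here is a minimal C++17 sketch of one possible way to hide several Array<T> instantiations behind a single runtime type. It is not taken from the talk and is not necessarily the speakers' approach: the wrapper name AnyArray, the dtype strings "int32" and "float64", and the fill and sum methods are all hypothetical, and std::variant is just one of several type-erasure techniques that could be used here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

// Statically typed array: every T produces a distinct C++ type.
template <typename T>
class Array
{
public:
    using value_type = T;

    explicit Array(std::size_t size) : m_data(size, T{}) {}

    std::size_t size() const { return m_data.size(); }

    T & operator[](std::size_t i)
    {
        assert(i < m_data.size());
        return m_data[i];
    }

    T const & operator[](std::size_t i) const
    {
        assert(i < m_data.size());
        return m_data[i];
    }

private:
    std::vector<T> m_data;
};

// Hypothetical wrapper (an assumption for illustration): a single type that
// holds any supported Array<T> and remembers its dtype at runtime.
class AnyArray
{
public:
    using Storage = std::variant<Array<std::int32_t>, Array<double>>;

    AnyArray(std::size_t size, std::string const & dtype)
        : m_storage(make(size, dtype)), m_dtype(dtype)
    {
    }

    std::string const & dtype() const { return m_dtype; }

    std::size_t size() const
    {
        return std::visit([](auto const & arr) { return arr.size(); }, m_storage);
    }

    // Runtime dispatch happens once per call; the loop body is compiled
    // separately for each concrete T, so the element type is known statically.
    void fill(double value)
    {
        std::visit(
            [value](auto & arr)
            {
                using T = typename std::decay_t<decltype(arr)>::value_type;
                for (std::size_t i = 0; i < arr.size(); ++i)
                {
                    arr[i] = static_cast<T>(value);
                }
            },
            m_storage);
    }

    double sum() const
    {
        return std::visit(
            [](auto const & arr)
            {
                double total = 0.0;
                for (std::size_t i = 0; i < arr.size(); ++i)
                {
                    total += static_cast<double>(arr[i]);
                }
                return total;
            },
            m_storage);
    }

private:
    // Map the runtime dtype string to a compile-time instantiation.
    static Storage make(std::size_t size, std::string const & dtype)
    {
        if (dtype == "int32") { return Array<std::int32_t>(size); }
        if (dtype == "float64") { return Array<double>(size); }
        throw std::invalid_argument("unsupported dtype: " + dtype);
    }

    Storage m_storage;
    std::string m_dtype;
};
```

With a wrapper like this, only AnyArray would need to be exposed to Python (for example through pybind11::class_<AnyArray>), so Python sees one array type. Calling AnyArray(4, "float64") and AnyArray(4, "int32") yields objects of the same C++ type whose inner loops are still specialized per element type.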

This talk will explain our approach to addressing the "dtype" challenge in building our own Numpy-like library by leveraging C++'s compile-time type knowledge.


Reason or motivation for choosing this topic

Everyone knows Numpy, but few consider the principles behind it. In certain scenarios, a general tool may not be the best fit, and we may need to create our own solution. We aim to demonstrate how it is possible to build a Numpy-like array library ourselves, and to share the joy and elegance of creating something from scratch.

Concrete knowledge and know-how the audience can take away

People who use Numpy, people who are interested in Python bindings, and people who use customized arrays at work

Prerequisite knowledge expected of the audience

Knowledge of computer architecture, Numpy experience, and C++ experience

Audience experience level

Intermediate

Presentation language

English

Language of the presentation materials

English

See also: Slides

Liu is a software engineer working in Tokyo. He goes by the ID @tigercosmos in open-source communities. He likes photography, snowboarding, and traveling. His website: https://tigercosmos.xyz