EuroSciPy 2026

Lessons from Building a Large-Scale Engineering Simulation Data Processing Library
2026-07-21 , Room 1.19 (Ground Floor, Shannon)

Pre- and post-processing of large engineering simulation datasets in some scientific domains often demands performance that pure Python cannot yet deliver. Within the scientific Python ecosystem, the usual way to overcome this performance barrier is to combine Python with compiled languages.

In this talk, I will share practical insights, experiences, and lessons drawn from the development and maintenance of PyDPF-Core, an open-source library that interacts with a C/C++ backend via a client-server architecture for the processing of large engineering simulation datasets. We will start by discussing the challenges of developing and maintaining such a library and then examine how each of these challenges was solved for PyDPF-Core.

The focus of the talk is on transferable insights, and my goal is for attendees to leave the talk with:
- Knowledge of architectural blueprints that can be applied to their own high-performance Python projects, especially those involving the processing of large simulation datasets;
- Patterns for managing complexity in large scientific codebases; and
- An understanding of the trade-offs that might be encountered.


Large simulation datasets arise in many scientific domains, from engineering and physics to computational biology and geoscience. While Python is the preferred language for many researchers and scientists due to its simplicity, interactivity, and rich ecosystem, high-performance workflows for pre-processing and post-processing large simulation datasets often require the integration of Python with compiled languages.

In this talk, I will share practical insights, experiences, and lessons drawn from the development and maintenance of PyDPF-Core, an open-source library that interacts with a C/C++ backend via a client-server architecture for the processing of large engineering simulation datasets. Developing and maintaining such a library poses several interconnected challenges, especially when a client-server architecture is preferred between the backend and the Python interface.

These challenges manifest themselves in various ways, such as: making the underlying data model sufficiently abstract and self-describing to maximize applicability across multiple scientific domains; ensuring the data processing pipeline is composable and modular; decoupling the evolution of the server from that of the client; limiting data transfers between the client and the server; managing growing server APIs on the client side; visualizing the processed data; and keeping the library documentation up to date alongside development efforts, among other things.

This talk will touch on how each of these challenges has been solved by briefly examining the following:
- Modelling data that is "aware" of what it describes
- The hourglass interface pattern for ensuring that the Python client and C/C++ server can independently evolve
- Operator-based pipelines for composable data processing workflows
- Techniques for reducing client-server data transfers
- Templating and automated code generation techniques for API scalability
- Leveraging third-party Python libraries for data visualization
- Sustainable CI/CD practices for testing and library documentation generation.
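
To give a flavour of two of the ideas listed above (data that is "aware" of what it describes, and operator-based pipelines), here is a minimal, self-contained sketch in plain Python. It is not PyDPF-Core's API: the `Field` and `Operator` classes, the `connect` and `evaluate` methods, and the `scale` and `maximum` functions are all hypothetical names chosen for illustration, and everything runs in a single process with no server involved.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Field:
    """Hypothetical self-describing container: values plus what they mean."""
    name: str
    unit: str
    location: str  # e.g. "nodal" or "elemental"
    values: List[float]


class Operator:
    """Hypothetical operator: a named transformation with numbered input pins."""

    def __init__(self, name: str, func: Callable[..., Field]):
        self.name = name
        self._func = func
        self._inputs: Dict[int, object] = {}

    def connect(self, pin: int, source: object) -> "Operator":
        # A source is either a plain value or another Operator.
        # Nothing is computed here; connections only describe the workflow.
        self._inputs[pin] = source
        return self

    def evaluate(self) -> Field:
        # Resolve upstream operators first, then apply this operator's function.
        args = []
        for pin in sorted(self._inputs):
            src = self._inputs[pin]
            args.append(src.evaluate() if isinstance(src, Operator) else src)
        return self._func(*args)


def scale(f: Field, factor: float) -> Field:
    # Metadata travels with the data, so downstream steps know what they hold.
    return Field(f.name, f.unit, f.location, [v * factor for v in f.values])


def maximum(f: Field) -> Field:
    return Field(f"max({f.name})", f.unit, f.location, [max(f.values)])


displacement = Field("displacement", "m", "nodal", [0.5, 2.0, 1.5])

# Chain two operators; evaluation is deferred until the result is requested.
scale_op = Operator("scale", scale).connect(0, displacement).connect(1, 2.0)
max_op = Operator("max", maximum).connect(0, scale_op)

result = max_op.evaluate()
print(result.name, result.values, result.unit)
```

Because connecting operators only records the workflow, a design along these lines could in principle send the whole chain to a server and return only the final, small result, which is one way to think about limiting client-server data transfers.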

Pre-requisites:
- No deep C/C++ knowledge required; the talk is not intended to be technical
- Basic understanding of client-server concepts

References:
- PyDPF-Core repository: https://github.com/ansys/pydpf-core
- Documentation: https://dpf.docs.pyansys.com/
- Hourglass Interface pattern: CppCon talk on stable ABI boundaries


Expected audience expertise: Domain: none
Expected audience expertise: Python: some
Your relationship with the presented work/project: Active contributor, Maintainer of the presented library/project

I hold a Joint Master's Degree in Advanced Solid Mechanics from the National Technical University of Athens, Università della Calabria, École Centrale de Lille, and Université de Lille.

I currently work as an R&D Software Engineer for Synopsys and contribute to multiple projects within the PyAnsys OSS ecosystem.