Aimilios Tsouvelekakis
Aimilios works as a software engineer at Frontiers Media SA. He is passionate about solving technical challenges and sharing his knowledge across different areas of computer engineering. His work includes ETL pipelines and their optimization, improving in-house tooling, and contributing to architectural decisions in support of his team's objectives. Prior to joining Frontiers, he worked as a DevOps engineer at CERN, where he actively contributed to projects related to cloud computing, disaster recovery, automation, observability, and databases. He holds an MEng in Electrical and Computer Engineering from the National Technical University of Athens.
Session
Python UDFs often become the slowest part of PySpark pipelines because they run row-by-row and pay a high cost crossing the JVM↔Python boundary. Spark’s Arrow-backed execution changes that cost model by moving data in columnar batches, which can reduce overhead and enable efficient, vectorized processing in Python.
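As a rough illustration of the row-by-row versus batched cost model described above, here is a toy sketch in plain Python. It involves no Spark or Arrow at all: the helper names (`run_row_at_a_time`, `run_in_batches`) and the batch size are hypothetical, and the sketch only counts how often user code is invoked under each calling convention. It does not model serialization or the JVM↔Python transfer itself.

```python
# Toy model only: plain Python lists stand in for Spark rows and Arrow batches.
# All names here are hypothetical and chosen for illustration.
from typing import Callable, List, Tuple


def run_row_at_a_time(
    values: List[int], fn: Callable[[int], int]
) -> Tuple[List[int], int]:
    """Invoke fn once per value, as a row-by-row UDF would be invoked."""
    out: List[int] = []
    calls = 0
    for v in values:
        out.append(fn(v))
        calls += 1
    return out, calls


def run_in_batches(
    values: List[int],
    batch_fn: Callable[[List[int]], List[int]],
    batch_size: int,
) -> Tuple[List[int], int]:
    """Invoke batch_fn once per slice of values, as a batch-oriented UDF would be."""
    out: List[int] = []
    calls = 0
    for start in range(0, len(values), batch_size):
        out.extend(batch_fn(values[start:start + batch_size]))
        calls += 1
    return out, calls


def add_one(x: int) -> int:
    return x + 1


def add_one_batch(xs: List[int]) -> List[int]:
    return [x + 1 for x in xs]


values = list(range(10_000))
row_out, row_calls = run_row_at_a_time(values, add_one)
batch_out, batch_calls = run_in_batches(values, add_one_batch, batch_size=1_000)

print(row_calls, batch_calls)
```

Both variants produce the same result; what differs is how many times the user function is entered. In a real pipeline the batched function would typically receive a pandas Series or an Arrow array rather than a list, so the per-batch work can itself be vectorized.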
In this session, we’ll cover practical patterns for writing Arrow-friendly UDF logic and integrating it with fast Python execution engines that operate on Arrow data. We’ll compare common approaches (scalar UDFs, Pandas UDFs, Arrow-native UDFs, and table-shaped Arrow transforms), then translate the results into a decision guide you can apply to production pipelines. Attendees will leave knowing when Arrow helps, when it doesn’t, and how to design UDF-heavy transformations that scale.