BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//pretalx.com//pyconde-pydata-2026//speaker//CD8CLV
BEGIN:VTIMEZONE
TZID:CET
BEGIN:STANDARD
DTSTART:20001029T030000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZNAME:CET
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20000326T020000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=3
TZNAME:CEST
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-pyconde-pydata-2026-FT7V39@pretalx.com
DTSTART;TZID=CET:20260416T113500
DTEND;TZID=CET:20260416T122000
DESCRIPTION:Python UDFs often become the slowest part of PySpark pipelines 
 because they run row-by-row and pay a high cost crossing the JVM↔Python 
 boundary. Spark’s Arrow-backed execution changes that cost model by movi
 ng data in columnar batches\, which can reduce overhead and enable efficie
 nt\, vectorized processing in Python.\n\nIn this session\, we’ll cover p
 ractical patterns for writing Arrow-friendly UDF logic and integrating it 
 with fast Python execution engines that operate on Arrow data. We’ll com
 pare common approaches—scalar UDFs\, Pandas UDFs\, Arrow-native UDFs\, a
 nd table-shaped Arrow transforms—then translate the results into a decis
 ion guide you can apply to production pipelines. Attendees will leave know
 ing when Arrow helps\, when it doesn’t\, and how to design UDF-heavy tra
 nsformations that scale.
DTSTAMP:20260412T141623Z
LOCATION:Europium [3rd Floor]
SUMMARY:From Row-Wise to Columnar: Speeding Up PySpark UDFs with Arrow and 
 Polars - Aimilios Tsouvelekakis
URL:https://pretalx.com/pyconde-pydata-2026/talk/FT7V39/
END:VEVENT
END:VCALENDAR
