BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//pretalx.com//pyconde-pydata-2026//speaker//JNR9GB
BEGIN:VTIMEZONE
TZID:CET
BEGIN:STANDARD
DTSTART:20001029T030000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZNAME:CET
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20000326T020000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=3
TZNAME:CEST
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-pyconde-pydata-2026-TPNBRN@pretalx.com
DTSTART;TZID=CET:20260416T150500
DTEND;TZID=CET:20260416T153500
DESCRIPTION:"Zero-copy" data transfer promises free communication between S
 park's JVM and Python workers\, but at 6 billion rows daily\, the reality 
 is far more complex. This session explores the low-level mechanics of dist
 ributed inference\, focusing on the serialization bottlenecks that plague 
 large-scale Gradient Boosted Tree models.\n\nWe will conduct a forensic an
 alysis of execution plans generated by `pandas_udf`\, `mapInPandas`\, and 
 SynapseML. By profiling memory hierarchies and CPU cycles\, we visualize t
 he true cost of pickling\, Arrow record batching\, and JNI context switchi
 ng. Join this deep dive to understand the physics of distributed inference
  and learn how to tune `spark.sql.execution.arrow.maxRecordsPerBatch` to p
 revent OOMs without starving the CPU.
DTSTAMP:20260412T142015Z
LOCATION:Dynamicum [Ground Floor]
SUMMARY:Zero-Copy or Zero-Speed? The hidden overhead of PySpark\, Arrow & S
 ynapseML for inference - Petar Ilijevski
URL:https://pretalx.com/pyconde-pydata-2026/talk/TPNBRN/
END:VEVENT
END:VCALENDAR
