Allison Wang
Allison is a software engineer at Databricks, working on Spark SQL and PySpark. She holds a Bachelor’s degree in Computer Science from Carnegie Mellon University.
USA
Company / Organisation: Databricks
Session
PySpark is widely adopted for data analysis in distributed computing environments. Beyond the standard DataFrame API, it supports Python User-Defined Functions (UDFs), Python Data Sources, Python User-Defined Table Functions (UDTFs), and more. However, debugging and profiling applications in such distributed environments is often challenging: you can't simply set a breakpoint and inspect variables in your IDE.
In this presentation, I will demonstrate effective methods for debugging and profiling PySpark applications using existing tools. These include profiling tools built on cProfile, the profiler in Python's standard library, along with various tips and best practices for monitoring and debugging PySpark applications.
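To show what cProfile output looks like, here is a minimal sketch that calls cProfile and pstats from the standard library directly on a small local sample. It does not use PySpark's own profiler integration, and it assumes the UDF's logic can be run as plain Python outside Spark. The function `normalize` is hypothetical and stands in for a UDF body.

```python
import cProfile
import io
import pstats


def normalize(value: str) -> str:
    # Hypothetical UDF body: plain Python logic you might later wrap as a PySpark UDF.
    return value.strip().lower()


def run_locally(rows):
    # Apply the function to a small local sample instead of a distributed DataFrame.
    return [normalize(r) for r in rows]


profiler = cProfile.Profile()
profiler.enable()
result = run_locally(["  Spark ", "PySpark", " UDF "] * 1000)
profiler.disable()

# Write the top entries, sorted by cumulative time, into a string buffer.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
stats.print_stats(5)
report = buffer.getvalue()
print(report)
```

The report lists call counts and per-function timings (`ncalls`, `tottime`, `cumtime`). This is the same kind of data a cProfile-based profiler produces for Python code running on Spark's workers.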