2026-03-21, Yuchengco Hall 5th Flr. Y507 (Workshop Room 1)
Pandas is one of the most widely used tools in Python, yet many developers unintentionally write slow or memory-heavy DataFrame code. This talk covers practical performance techniques that can significantly speed up Pandas workflows: vectorization, avoiding apply, optimizing data types, reducing memory usage, minimizing DataFrame copies, improving joins and groupbys, and using chunked loading for large files. We also look at when to extend Pandas with Polars, Apache Arrow, or DuckDB for faster execution. If you work with data at any scale, this session gives you simple, actionable tricks to make your Pandas pipelines faster and more production-ready.
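As a rough illustration of the chunked loading mentioned above, the sketch below reads a CSV in fixed-size pieces and aggregates each piece before combining the results. The column names (user_id, amount), the chunk size, and the in-memory CSV are assumptions made for this example and are not taken from the talk.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a large file on disk.
rows = "\n".join(f"{i % 3},{i}" for i in range(10))
csv_data = io.StringIO("user_id,amount\n" + rows)

# Read 4 rows at a time and aggregate each chunk on its own, so only
# one chunk needs to be held in memory at any moment.
partial_sums = []
for chunk in pd.read_csv(csv_data, chunksize=4):
    partial_sums.append(chunk.groupby("user_id")["amount"].sum())

# Combine the per-chunk partial sums into final totals per user.
totals = pd.concat(partial_sums).groupby(level=0).sum()
print(totals)
```

This pattern works for aggregations that can be combined from partial results, such as sums and counts; a mean would need the sum and the count carried separately.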
Pandas is powerful, flexible, and easy to use, but it can also become painfully slow or memory-hungry when the data grows or the operations get complex. Most performance problems in Pandas come from a small set of common patterns: unnecessary loops, incorrect dtypes, inefficient joins, and operations that silently create large copies behind the scenes.
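To make the "unnecessary loops" pattern concrete, here is a minimal sketch that computes the same column twice: once with a row-by-row loop and once with a vectorized expression. The DataFrame, its columns (price, qty), and the function names are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical order data used only for this example.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})


def total_with_loop(frame: pd.DataFrame) -> pd.Series:
    """Slow pattern: iterate in Python and build a Series per row."""
    totals = []
    for _, row in frame.iterrows():
        totals.append(row["price"] * row["qty"])
    return pd.Series(totals, index=frame.index)


def total_vectorized(frame: pd.DataFrame) -> pd.Series:
    """Vectorized pattern: one column-wise multiplication."""
    return frame["price"] * frame["qty"]


loop_result = total_with_loop(df)
vec_result = total_vectorized(df)
print(vec_result.tolist())
```

Both functions return the same values; the vectorized version avoids the per-row Python overhead, which is typically where the time goes on larger frames.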
In this talk, we explore practical, real-world techniques to make Pandas fast without rewriting your entire pipeline or switching to heavier systems like Spark. You will learn how to diagnose slow DataFrame code, apply vectorization effectively, use categoricals to reduce memory, avoid hidden allocations, optimize I/O, and use modern tools like Polars or DuckDB when needed, while still keeping Pandas as the main tool in your workflow.
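As one possible illustration of using categoricals to reduce memory, the sketch below converts a low-cardinality string column to the category dtype and compares memory usage before and after. The column name and values are assumptions for this example; actual savings depend on the data and the pandas version.

```python
import pandas as pd

# Hypothetical column with many rows but only three distinct values.
df = pd.DataFrame({"country": ["PH", "IN", "US", "PH"] * 25_000})

# deep=True includes the memory held by the string values themselves.
before = df["country"].memory_usage(deep=True)

# Store each distinct value once and keep small integer codes per row.
df["country"] = df["country"].astype("category")
after = df["country"].memory_usage(deep=True)

print(f"before: {before} bytes, after: {after} bytes")
```

Categoricals tend to help most when the number of distinct values is small relative to the number of rows; for near-unique columns they can cost more than they save.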
The session includes before-and-after examples, benchmarks, and lessons from real production data pipelines. Whether you are a backend engineer, data engineer, or ML practitioner, you will leave with tools and tricks that make your Pandas code much faster and more predictable.
Sooraj is a Product Engineer at Strollby Inc, where he focuses on building scalable backend architectures and driving performance-focused optimizations.
He is an open-source contributor to frameworks like Agno AGI and Google ADK, and an active speaker at community events, including PyData Global 2025 and the Trivandrum Python Community, where he shares insights on backend architecture, AI systems, and developer productivity tools.
He was also a finalist in the Python Code Jam 2025, recognized for his out-of-the-box solutions.
I’m Allen, Associate Software Engineer at Red Hat, focused on building scalable, high-performance backend systems using Python, GraphQL, Kubernetes, and cloud-native tooling. I work across distributed systems, payments, event-driven pipelines, and async architectures, with experience in GCP, AWS, PostgreSQL, and MongoDB.
I’m an active speaker in the Python community, with talks delivered at:
PyCascades 2025 (Portland, USA): Unlocking Concurrency in Python with AsyncIO and ASGI
FOSSASIA Summit 2025 (Bangkok, Thailand): Kubernetes Kung Fu: Mastering Containers and Microservices
PyCon India 2025 (Bangalore, India): From Stress to Success: Load Testing Python Apps & Visualizing Performance
