Michel Semaan
Michel Semaan is the Analytics Lead for Transaction Banking at Allica Bank, previously a Senior Analytics Engineer at Amazon. Beyond his day job, Michel teaches as a DataCamp instructor with two published SQL courses and as a Python and data science mentor with Great Learning and Springboard.
Session
Large SQL codebases inevitably accumulate duplication, inconsistency, deep nesting, and subtle logic errors, making refactoring slow, risky, and often unrealistic to do by hand. This talk shows how Python metaprogramming can turn SQL itself into data that can be analyzed and transformed safely and automatically.
Instead of relying on fragile regex patterns or manual inspection, we parse queries into abstract syntax trees (ASTs), represented in Python as nested dictionaries and lists, using libraries such as sqloxide. Once a query is encoded as data, entirely new workflows become possible.
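As a rough illustration of what "SQL as data" can look like, the sketch below walks a small hand-written tree of nested dictionaries and lists. The tree shape, the key names (`Select`, `projection`, `from`, `Identifier`, `Table`) and the helper `walk` are assumptions made for this example only; they do not reproduce sqloxide's actual output, which is richer and would normally come from the parser rather than being typed by hand.

```python
from typing import Any, Iterator

# Hypothetical, simplified tree loosely imitating "SELECT a, b FROM t".
# Real parser output is more detailed and shaped differently.
toy_ast = {
    "Select": {
        "projection": [{"Identifier": "a"}, {"Identifier": "b"}],
        "from": [{"Table": "t"}],
    }
}


def walk(node: Any, path: tuple = ()) -> Iterator[tuple]:
    """Yield (path, node) for every dict, list and leaf, depth-first."""
    yield path, node
    if isinstance(node, dict):
        for key, child in node.items():
            yield from walk(child, path + (key,))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from walk(child, path + (index,))


# Collect every identifier without any string matching on the SQL text.
identifiers = [
    node["Identifier"]
    for _, node in walk(toy_ast)
    if isinstance(node, dict) and "Identifier" in node
]
print(identifiers)  # ['a', 'b']
```

Because the traversal only relies on dictionaries and lists, the same helper could in principle be pointed at any parser that returns plain nested containers.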
The session walks through practical examples of treating SQL programmatically via tree operations in Python: computing subquery depth for linting, wrapping all denominators in NULLIF() with a simple AST rewrite, auto‑aliasing aggregate expressions, and generating dependency graphs of temporary tables used across pipelines, among others. Each example highlights how metaprogramming enables precise, automatable refactors that would be error‑prone or impossible through text manipulation alone. This talk is designed for analytics and data engineers who work with large SQL codebases.
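Two of the examples above, subquery depth and NULLIF() wrapping, might be sketched as plain recursive functions over such a tree. This is a minimal illustration under stated assumptions, not the talk's implementation: the node names (`Subquery`, `BinaryOp`, `Divide`, `Function`, `Number`) and both helper functions are hypothetical and simplified, and a real AST would need its own key names mapped in.

```python
from typing import Any


def subquery_depth(node: Any) -> int:
    """Return the deepest nesting of hypothetical 'Subquery' nodes in the tree."""
    if isinstance(node, dict):
        deepest = 0
        for key, child in node.items():
            depth = subquery_depth(child)
            if key == "Subquery":
                depth += 1  # entering one more level of nesting
            deepest = max(deepest, depth)
        return deepest
    if isinstance(node, list):
        return max((subquery_depth(child) for child in node), default=0)
    return 0


def wrap_denominators(node: Any) -> Any:
    """Return a copy of the tree with each division's right operand wrapped in NULLIF(x, 0)."""
    if isinstance(node, list):
        return [wrap_denominators(child) for child in node]
    if not isinstance(node, dict):
        return node
    rebuilt = {key: wrap_denominators(child) for key, child in node.items()}
    op = rebuilt.get("BinaryOp")
    if isinstance(op, dict) and op.get("op") == "Divide":
        right = op["right"]
        already_wrapped = (
            isinstance(right, dict)
            and isinstance(right.get("Function"), dict)
            and right["Function"].get("name") == "NULLIF"
        )
        if not already_wrapped:  # keep the rewrite idempotent
            op["right"] = {
                "Function": {"name": "NULLIF", "args": [right, {"Number": "0"}]}
            }
    return rebuilt


# Hypothetical toy inputs: a doubly nested FROM clause and "revenue / orders".
nested = {"Select": {"from": [{"Subquery": {"Select": {"from": [
    {"Subquery": {"Select": {"from": [{"Table": "t"}]}}}
]}}}]}}
division = {"BinaryOp": {
    "left": {"Identifier": "revenue"},
    "op": "Divide",
    "right": {"Identifier": "orders"},
}}

print(subquery_depth(nested))  # 2
rewritten = wrap_denominators(division)
print(rewritten["BinaryOp"]["right"]["Function"]["name"])  # NULLIF
```

The rewrite builds a new tree rather than mutating its input, so the original query structure stays available for comparison, and running it twice does not double-wrap a denominator.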