PyData Boston 2025

Tracking Policy Evolution Through Clustering: A New Approach to Temporal Pattern Analysis in Multi-Dimensional Data
2025-12-10, Thomas Paul

Analyzing how patterns evolve over time in multi-dimensional datasets is challenging: traditional time-series methods often struggle with interpretability when comparing multiple entities across different scales. This talk introduces a clustering-based framework that transforms continuous data into categorical trajectories, enabling intuitive visualization and comparison of temporal patterns.

What & Why: The method combines quartile-based categorization with a modified Hamming distance to create interpretable "trajectory fingerprints" for entities over time. This approach is particularly valuable for policy analysis, economic comparisons, and any domain requiring longitudinal pattern recognition.

Who: Data scientists and analysts working with temporal datasets, policy researchers, and anyone interested in comparative analysis across entities with different scales or distributions.

Type: Technical presentation with practical implementation examples using Python (pandas, scikit-learn, matplotlib). Moderate mathematical content balanced with intuitive visualizations.

Takeaway: Attendees will learn a novel approach to temporal pattern analysis that bridges the gap between complex statistical methods and accessible, policy-relevant insights. You'll see practical implementations analyzing 60+ years of fiscal policy data across 8 countries, with code available for adaptation to your own datasets.
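The quartile-based categorization step described above can be sketched with pandas (a minimal illustration on invented data; `pd.qcut` does the binning, and the talk's actual implementation may differ):

```python
import numpy as np
import pandas as pd

# Hypothetical yearly values for one entity (e.g., spending as % of GDP)
rng = np.random.default_rng(0)
values = pd.Series(rng.normal(40, 5, size=20),
                   index=pd.RangeIndex(1960, 1980, name="year"))

# Quartile-based categorization: each year's value is mapped to Q1-Q4
# relative to the entity's own distribution, yielding a categorical
# trajectory that is comparable across entities with different scales.
trajectory = pd.qcut(values, q=4, labels=["Q1", "Q2", "Q3", "Q4"])
print(trajectory.head())
```

Because each entity is binned against its own distribution, the resulting trajectories are directly comparable even when raw magnitudes differ by orders of magnitude.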


Talk Outline (40 minutes total)
Minutes 0-5: Problem Setup & Motivation

Real-world scenario: Comparing fiscal policies across countries
Limitations of traditional approaches
Overview of categorical trajectory concept

Minutes 5-15: Methodology Deep-Dive

Quartile-based categorical transformation (with live Python demonstration)
Trajectory construction and visualization
Regular vs. modified Hamming distance (mathematical formulation + intuition)
Comprehensive trajectory metrics framework
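The distance step in the outline above can be sketched as follows. Regular Hamming distance is the fraction of mismatched positions; since this abstract does not spell out the modification, a hypothetical ordinal-weighted variant is shown as one plausible reading, not the talk's actual formula:

```python
def hamming(a, b):
    """Regular Hamming distance: fraction of positions that differ."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b)) / len(a)

# Quartile labels have a natural order, so a mismatch of Q1 vs Q4 can be
# weighted more heavily than Q1 vs Q2.  This weighting is illustrative only.
ORDER = {"Q1": 0, "Q2": 1, "Q3": 2, "Q4": 3}

def weighted_hamming(a, b):
    """Hypothetical modified variant: mismatches weighted by ordinal gap,
    normalized so the result stays in [0, 1]."""
    assert len(a) == len(b)
    return sum(abs(ORDER[x] - ORDER[y]) for x, y in zip(a, b)) / (3 * len(a))

t1 = ["Q1", "Q2", "Q2", "Q4"]
t2 = ["Q1", "Q3", "Q2", "Q1"]
print(hamming(t1, t2))           # 0.5  (2 of 4 positions differ)
print(weighted_hamming(t1, t2))  # (1 + 3) / 12 ≈ 0.333
```

The point of any such modification is that plain Hamming treats all mismatches equally, while quartile labels carry ordinal information worth preserving.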

Minutes 15-25: Implementation & Results

Dataset description (Comparative Political Data Set, CPDS, 1960-2022)
Code walkthrough using pandas and scikit-learn
Visualization techniques (matplotlib, seaborn)
Interpreting distance matrices and trajectory metrics
Key findings from fiscal policy analysis
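Building a pairwise distance matrix over entity trajectories might look like the following sketch (entity names and trajectories are invented; `scipy.spatial.distance.pdist` supports a Hamming metric directly, and the result feeds naturally into hierarchical clustering or a heatmap):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical categorical trajectories for three entities,
# with quartiles Q1-Q4 encoded as integers 0-3
trajs = {
    "A": [0, 1, 1, 3, 3],
    "B": [0, 2, 1, 0, 3],
    "C": [3, 3, 2, 1, 0],
}
names = list(trajs)
X = np.array([trajs[n] for n in names])

# Pairwise Hamming distances -> symmetric square matrix with zero diagonal;
# small entries mean entities moved through similar quartile sequences.
D = squareform(pdist(X, metric="hamming"))
print(D.round(2))
```

In practice the same matrix can be passed to `scipy.cluster.hierarchy.linkage` or plotted with seaborn's `heatmap` to group entities with similar trajectories.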

Minutes 25-30: Extensions & Applications

Alternative domains (financial markets, environmental monitoring, healthcare)
Integration with machine learning pipelines
Limitations and when NOT to use this approach
Comparison to alternative methods (DTW, hierarchical clustering)

Minutes 30-35: Q&A
Minutes 35-40: Wrap-up & Resources

GitHub repository with complete implementation
Adaptations for different data types
Best practices for boundary selection

Prerequisites
Prior Knowledge Expected:

Intermediate Python (pandas, numpy)
Basic statistics (quartiles, distributions)
Familiarity with time-series data concepts
No advanced mathematics required—formulas explained intuitively

Audience Will NOT Need:

Background in econometrics or policy analysis (examples are self-contained)
Deep learning or advanced ML experience
Specialized visualization libraries



I have a passion for leveraging data to drive transformative outcomes. My journey spans diverse roles: data analyst, data engineer, and currently AI engineer.

As a Graduate Research Assistant at Boston University, I worked on a wide array of projects that let me apply data science and machine learning techniques to sophisticated problems in finance, advertising, and energy, producing analyses of interesting use cases. I presented my research at conferences including Computer Science and Education in Computer Science (CSECS), ITISE, and NEDSI.

My professional experience spans a wide array of AI and cloud tools, including OpenAI and Gemini models, RAG systems, agents built with crew.ai, MCP with Claude, advanced prompting, Snowflake, SnapLogic, and Power BI.

I am in relentless pursuit of knowledge and excellence, committed to harnessing the power of data for informed decision-making and meaningful impact. Let's connect and explore how my versatile skill set can contribute to your data-centric endeavors.