2025-12-10, Thomas Paul
Analyzing how patterns evolve over time in multi-dimensional datasets is challenging: traditional time-series methods often struggle with interpretability when comparing multiple entities across different scales. This talk introduces a clustering-based framework that transforms continuous data into categorical trajectories, enabling intuitive visualization and comparison of temporal patterns.
What & Why: The method combines quartile-based categorization with modified Hamming distance to create interpretable "trajectory fingerprints" for entities over time. This approach is particularly valuable for policy analysis, economic comparisons, and any domain requiring longitudinal pattern recognition.
Who: Data scientists and analysts working with temporal datasets, policy researchers, and anyone interested in comparative analysis across entities with different scales or distributions.
Type: Technical presentation with practical implementation examples using Python (pandas, scikit-learn, matplotlib). Moderate mathematical content balanced with intuitive visualizations.
Takeaway: Attendees will learn a novel approach to temporal pattern analysis that bridges the gap between complex statistical methods and accessible, policy-relevant insights. You'll see practical implementations analyzing 60+ years of fiscal policy data across 8 countries, with code available for adaptation to your own datasets.
Talk Outline (40 minutes total)
Minutes 0-5: Problem Setup & Motivation
Real-world scenario: Comparing fiscal policies across countries
Limitations of traditional approaches
Overview of categorical trajectory concept
Minutes 5-15: Methodology Deep-Dive
Quartile-based categorical transformation (with live Python demonstration)
Trajectory construction and visualization
Regular vs. modified Hamming distance (mathematical formulation + intuition)
Comprehensive trajectory metrics framework
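As a rough illustration of the methodology items above, the sketch below bins each year's values into quartile categories and then compares two trajectories. The data, the function names, and the choice to compute quartiles across entities within each year are all assumptions made for illustration. In particular, the "modified" distance shown here, which weights each mismatch by how many quartiles apart the two categories are, is only one plausible reading; the formulation presented in the talk may differ.

```python
import numpy as np
import pandas as pd

# Toy panel: rows are years, columns are entities (all values are made up).
data = pd.DataFrame(
    {
        "A": [1.0, 2.0, 3.0, 4.0],
        "B": [2.0, 1.0, 4.0, 3.0],
        "C": [3.0, 4.0, 1.0, 2.0],
        "D": [4.0, 3.0, 2.0, 1.0],
    },
    index=[2000, 2001, 2002, 2003],
)


def to_quartile_trajectories(panel: pd.DataFrame) -> pd.DataFrame:
    """Replace each value with its quartile (1-4) among all entities in that year.

    Assumption: quartiles are taken per year across entities. Heavily tied
    values would need extra handling, which this sketch omits.
    """
    def quartiles(row: pd.Series) -> pd.Series:
        return pd.qcut(row, q=4, labels=False) + 1

    return panel.apply(quartiles, axis=1)


def hamming(a, b) -> int:
    """Regular Hamming distance: number of time steps where categories differ."""
    a, b = np.asarray(a), np.asarray(b)
    return int(np.sum(a != b))


def ordinal_hamming(a, b) -> int:
    """Hypothetical 'modified' variant: sum of absolute category gaps per step."""
    a, b = np.asarray(a), np.asarray(b)
    return int(np.sum(np.abs(a - b)))


traj = to_quartile_trajectories(data)
print(traj)
print("A vs B:", hamming(traj["A"], traj["B"]), ordinal_hamming(traj["A"], traj["B"]))
print("A vs D:", hamming(traj["A"], traj["D"]), ordinal_hamming(traj["A"], traj["D"]))
```

On this toy data the regular distance rates B and D as equally far from A (4 each), while the gap-weighted variant separates them (4 versus 8), which is the kind of distinction an ordinal-aware distance is meant to capture.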
Minutes 15-25: Implementation & Results
Dataset description (Comparative Political Data Set, CPDS, 1960-2022)
Code walkthrough using pandas and scikit-learn
Visualization techniques (matplotlib, seaborn)
Interpreting distance matrices and trajectory metrics
Key findings from fiscal policy analysis
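To make the "interpreting distance matrices" item concrete, here is a minimal sketch that builds a pairwise distance matrix from categorical trajectories and reads off each entity's nearest neighbour. The trajectories are invented, the helper name `hamming_matrix` is hypothetical, and the plain normalized Hamming distance stands in for whatever distance the talk actually uses.

```python
import numpy as np
import pandas as pd

# Hypothetical quartile trajectories (1 = lowest quartile, 4 = highest).
# Rows are entities, columns are years; all values are invented.
traj = pd.DataFrame(
    [
        [1, 1, 2, 2, 3],
        [1, 1, 2, 3, 3],
        [4, 4, 3, 3, 2],
        [4, 3, 3, 2, 2],
    ],
    index=["A", "B", "C", "D"],
    columns=[2000, 2001, 2002, 2003, 2004],
)


def hamming_matrix(trajectories: pd.DataFrame) -> pd.DataFrame:
    """Share of years in which each pair of entities sits in different categories."""
    x = trajectories.to_numpy()
    # Compare every row with every other row via broadcasting: shape (n, n, years).
    mismatches = x[:, None, :] != x[None, :, :]
    dist = mismatches.mean(axis=2)
    return pd.DataFrame(dist, index=trajectories.index, columns=trajectories.index)


dist = hamming_matrix(traj)

# Nearest neighbour of each entity, ignoring the zero diagonal.
off_diag = dist.mask(np.eye(len(dist), dtype=bool))
nearest = off_diag.idxmin(axis=1)

print(dist.round(2))
print(nearest)
```

In this toy matrix, A and B differ in one year out of five (distance 0.2) and C and D in two (0.4), so each entity's nearest neighbour is its partner in that pair. A value of 1.0 means two entities never share a category.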
Minutes 25-30: Extensions & Applications
Alternative domains (financial markets, environmental monitoring, healthcare)
Integration with machine learning pipelines
Limitations and when NOT to use this approach
Comparison to alternative methods: dynamic time warping (DTW) and hierarchical clustering
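For readers curious how trajectory distances might be paired with hierarchical clustering, the following sketch feeds a precomputed distance matrix to SciPy's agglomerative clustering routines. The matrix values and entity labels are invented, and average linkage with a two-cluster cut is an arbitrary choice for illustration, not a recommendation from the talk.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Hypothetical symmetric distance matrix between four entities (values invented).
labels = ["A", "B", "C", "D"]
dist = np.array(
    [
        [0.0, 0.2, 1.0, 0.8],
        [0.2, 0.0, 0.8, 1.0],
        [1.0, 0.8, 0.0, 0.4],
        [0.8, 1.0, 0.4, 0.0],
    ]
)

# linkage() expects the condensed (upper-triangle) form of the matrix.
condensed = squareform(dist)

# Average linkage merges the pair of clusters with the smallest mean distance.
Z = linkage(condensed, method="average")

# Cut the dendrogram so that at most two clusters remain.
clusters = fcluster(Z, t=2, criterion="maxclust")
groups = dict(zip(labels, clusters))
print(groups)
```

Here A and B end up in one cluster and C and D in the other. The cluster numbers themselves are arbitrary identifiers.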
Minutes 30-35: Q&A
Minutes 35-40: Wrap-up & Resources
GitHub repository with complete implementation
Adaptations for different data types
Best practices for boundary selection
Prerequisites
Prior Knowledge Expected:
Intermediate Python (pandas, numpy)
Basic statistics (quartiles, distributions)
Familiarity with time-series data concepts
No advanced mathematics required—formulas explained intuitively
Audience Will NOT Need:
Background in econometrics or policy analysis (examples are self-contained)
Deep learning or advanced ML experience
Specialized visualization libraries
I am passionate about using data to drive meaningful outcomes. My career has spanned roles as a data analyst, a data engineer, and, currently, an AI engineer.
As a Graduate Research Assistant at Boston University, I worked on a wide range of projects that applied data science and machine learning to problems in finance, advertising, and energy. I have presented my research at conferences including Computer Science and Education in Computer Science (CSECS), ITISE, and NEDSI.
My professional experience includes working with a range of AI and cloud tools, such as OpenAI and Gemini models, RAG, agents using crew.ai, MCP with Claude, advanced prompting, Snowflake, SnapLogic, and Power BI.
I am committed to continuous learning and to using data for informed decision-making and meaningful impact. Let's connect and explore how my skill set can contribute to your data-centric work.