Security BSides Las Vegas 2025

Advancing Network Threat Detection Through Standardized Feature Extraction and Dynamic Ensemble Learning
2025-08-05, Siena

This talk introduces a research-driven approach to improving network intrusion detection by combining standardized feature extraction techniques with dynamic ensemble machine learning. Traditional signature-based detection struggles to identify new or evolving attacks, and prior ML-based research often suffers from poor generalization due to narrow datasets and single-model reliance. This work addresses these shortcomings by proposing a standardized feature extraction framework focusing on metadata and flow-level statistics, training multiple diverse machine learning models, and developing a novel ensemble classifier to optimize detection based on class-specific model strengths. Experimental validation shows the ensemble maintains high detection accuracy (97.92%) across various traffic types while minimizing false positives, offering a promising foundation for building more adaptable and resilient network defenses.


This research comes from my undergraduate senior thesis for my degree in Applied Computer Science - Cybersecurity from the University of South Carolina, to be conferred in May 2025. Through my prior work in InfoSec and an extensive literature review, I found deficiencies in both traditional NIDS solutions and ML-driven detection experiments that rely too heavily on limited datasets and monolithic classifiers. Over the past 18 months, I developed a feature extraction framework that standardizes packet and flow statistics to enhance model generalization across multiple environments, including CTU-13, TON_IoT, USTC-TFC2016, and custom-collected benign traffic. Eight machine learning models were selected to represent varied classification strategies: Random Forest, Isolation Forest, Gaussian Mixture Models, Quadratic Discriminant Analysis, AdaBoost, XGBoost, CNN, and RNN.
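As a rough illustration of what standardized flow-level statistics can look like, the sketch below groups packet metadata by 5-tuple and computes a handful of per-flow features. It is not the framework from the thesis: the function name `flow_features`, the packet dictionary keys (`ts`, `src`, `dst`, `sport`, `dport`, `proto`, `length`), and the chosen statistics are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def flow_features(packets):
    """Group packets into unidirectional flows and compute simple statistics.

    `packets` is an iterable of dicts with hypothetical keys:
    ts (seconds), src, dst, sport, dport, proto, length (bytes).
    """
    flows = defaultdict(list)
    for p in packets:
        # The 5-tuple identifies a flow; each direction is a separate flow here.
        key = (p["src"], p["dst"], p["sport"], p["dport"], p["proto"])
        flows[key].append(p)

    features = {}
    for key, pkts in flows.items():
        pkts.sort(key=lambda p: p["ts"])
        sizes = [p["length"] for p in pkts]
        # Inter-arrival times between consecutive packets in the flow.
        gaps = [b["ts"] - a["ts"] for a, b in zip(pkts, pkts[1:])]
        features[key] = {
            "packet_count": len(pkts),
            "total_bytes": sum(sizes),
            "mean_size": mean(sizes),
            "std_size": pstdev(sizes),
            "duration": pkts[-1]["ts"] - pkts[0]["ts"],
            "mean_iat": mean(gaps) if gaps else 0.0,
        }
    return features
```

Because the features depend only on metadata (sizes, timing, counts) and not on payload, the same function could in principle be applied to captures from different datasets to produce a common feature schema.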

I then designed the Ford-CSWV ensemble algorithm, which applies dynamic class-specific weighting to model outputs during classification, improving robustness across traffic variations. Experimental results demonstrate that while the ensemble yields only minor gains in overall accuracy compared to top individual models, it significantly improves stability and adaptability, which are critical for real-world implementations.
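The general idea of class-specific weighting can be sketched as follows. This is a generic weighted-voting example, not the Ford-CSWV algorithm, whose details are not given in this description. It assumes each model's weight for a class is its validation precision on that class, and the function names `class_weights_from_validation` and `weighted_vote` are hypothetical.

```python
import numpy as np


def class_weights_from_validation(val_preds, val_labels, n_classes):
    """Return weights[m, c]: precision of model m on class c (assumed scheme).

    val_preds: list of 1-D integer arrays, one per model.
    val_labels: 1-D integer array of true labels.
    """
    weights = np.zeros((len(val_preds), n_classes))
    for m, preds in enumerate(val_preds):
        for c in range(n_classes):
            predicted_c = preds == c
            if predicted_c.any():
                # Fraction of this model's class-c predictions that were correct.
                weights[m, c] = np.mean(val_labels[predicted_c] == c)
    return weights


def weighted_vote(probas, weights):
    """Combine per-model class probabilities using per-class weights.

    probas: array of shape (n_models, n_samples, n_classes).
    weights: array of shape (n_models, n_classes).
    """
    # Scale each model's probability for class c by its weight for c, then sum.
    scores = np.einsum("msc,mc->sc", probas, weights)
    return scores.argmax(axis=1)
```

Under this scheme, a model that is reliable on one traffic class but poor on another contributes strongly only where it has earned trust, which is one way an ensemble can be more stable than any single classifier.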

The talk will include a detailed walkthrough of the difference between NIDS and NDR, feature selection rationale, model training approaches, the mechanics of the Ford-CSWV ensemble classifier, and the classification results of my experiment. Slides include dataset comparisons, classifier diagrams, and ensemble methodology visuals. I will not be conducting a live demo, but the session will be highly visual and practical, and designed for security practitioners, researchers, and students interested in applied ML for cybersecurity.

Jason is a Principal Research Engineer at Proofpoint. He is interested in building security tools and has experience writing Java, PowerShell, and Python. Jason has been working in a variety of roles in InfoSec for over 20 years, and has recently found his calling doing research on topics related to security and machine learning. When he's not tinkering with stuff in his home lab, you'll find him listening to EDM and enjoying the outdoors camping, running, hiking, and skiing.

LinkedIn: https://www.linkedin.com/in/jasonsford/
ResearchGate: https://www.researchgate.net/profile/Jason-Ford-6
GitHub: https://github.com/jasonsford