2026-07-16 –, Memorial Hall
Deploying deep learning models for time-series forecasting at retail scale presents a fundamental tension between prediction accuracy and computational cost. This talk presents a Python-based framework combining structured pruning, quantization-aware training, and knowledge distillation to compress LSTM networks for demand forecasting. Using NumPy, TensorFlow/Keras, and scikit-learn, we achieved a 47% accuracy improvement over baseline models while reducing model size by 73% and inference cost by 92%. We discuss practical implementation patterns, reproducibility considerations, and how these compression techniques generalize beyond retail to any domain requiring efficient sequential prediction at scale.
Background
Many teams that use LSTM networks for time-series forecasting hit the same wall: as models get more complex, they become too slow and costly to run in production. In retail, for example, you may need to forecast demand for thousands of products every day. The same challenge shows up in energy, healthcare, logistics, and other fields.
Model compression (making models smaller while keeping them useful) is well studied for image models (CNNs) but less explored for recurrent models such as the LSTMs used in time-series work. This talk fills that gap using tools from the Python ecosystem.
What We Built
We developed a three-step compression pipeline, all in Python:
Structured Pruning: We used TensorFlow/Keras and NumPy to find and remove the LSTM units that contribute the least. Unlike unstructured pruning, which zeroes out individual weights, this gives you a truly smaller model, not a sparse one that still takes up memory.
Quantization: We converted model weights from 32-bit floats to 8-bit integers using TensorFlow Lite, which cuts memory use and speeds up predictions with minimal loss in quality.
Knowledge Distillation: We trained a small "student" LSTM to learn from the larger "teacher" model. The student learns not just the final predictions but also the internal patterns the teacher uses, through custom Keras loss functions.
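As a rough illustration of the distillation objective described above, here is a minimal NumPy sketch of a combined loss for a regression-style forecaster. It is not the custom Keras loss used in the talk: the three-term form, the weights `alpha` and `beta`, and the linear `projection` that maps student hidden states to the teacher's width are all assumptions made for illustration.

```python
import numpy as np

def distillation_loss(y_true, student_pred, teacher_pred,
                      student_hidden, teacher_hidden, projection,
                      alpha=0.5, beta=0.1):
    """Weighted sum of three mean-squared-error terms.

    alpha and beta are hypothetical weights, not values from the talk.
    """
    # Hard-target term: student forecast vs. observed values.
    hard = np.mean((student_pred - y_true) ** 2)
    # Soft-target term: student forecast vs. teacher forecast.
    soft = np.mean((student_pred - teacher_pred) ** 2)
    # Hint term: student hidden states, linearly projected to the
    # teacher's width, vs. teacher hidden states.
    hint = np.mean((student_hidden @ projection - teacher_hidden) ** 2)
    return alpha * hard + (1.0 - alpha) * soft + beta * hint

# Synthetic stand-in data (hypothetical shapes, not the retail dataset).
rng = np.random.default_rng(0)
batch, student_units, teacher_units = 32, 16, 64
y_true = rng.normal(size=(batch, 1))
teacher_pred = y_true + 0.1 * rng.normal(size=(batch, 1))
student_pred = y_true + 0.3 * rng.normal(size=(batch, 1))
teacher_hidden = rng.normal(size=(batch, teacher_units))
student_hidden = rng.normal(size=(batch, student_units))
projection = 0.1 * rng.normal(size=(student_units, teacher_units))

loss = distillation_loss(y_true, student_pred, teacher_pred,
                         student_hidden, teacher_hidden, projection)
print(f"distillation loss: {loss:.4f}")
```

In a Keras training loop the same three terms would be written with tensor operations so that gradients flow to the student, and `projection` would typically be a trainable layer.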
Data processing used pandas and NumPy. We tracked experiments with scikit-learn pipelines and visualized results with Matplotlib.
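To make the pruning and quantization steps described above more concrete, here is a minimal NumPy sketch that operates on weight arrays laid out the way a Keras LSTM layer stores them (four gate blocks concatenated along the last axis). The importance score (squared weight magnitude per unit), the `keep_fraction` parameter, and the symmetric per-tensor int8 scheme are assumptions chosen for illustration; they are not necessarily the criteria used in the talk, and the int8 step only mimics the arithmetic rather than calling TensorFlow Lite.

```python
import numpy as np

def prune_lstm_units(kernel, recurrent_kernel, bias, keep_fraction=0.5):
    """Drop the lowest-scoring LSTM units and return physically smaller arrays."""
    input_dim = kernel.shape[0]
    units = recurrent_kernel.shape[0]
    # Split the concatenated gate blocks: last axis 4*units -> (4, units).
    k = kernel.reshape(input_dim, 4, units)
    r = recurrent_kernel.reshape(units, 4, units)
    b = bias.reshape(4, units)
    # Assumed importance score: squared magnitude of all weights feeding a unit.
    score = (k ** 2).sum(axis=(0, 1)) + (r ** 2).sum(axis=(0, 1)) + (b ** 2).sum(axis=0)
    n_keep = max(1, int(round(units * keep_fraction)))
    keep = np.sort(np.argsort(score)[-n_keep:])
    # Remove the unit's columns in every gate, and its row in the recurrent kernel.
    k_p = k[:, :, keep].reshape(input_dim, 4 * n_keep)
    r_p = r[keep][:, :, keep].reshape(n_keep, 4 * n_keep)
    b_p = b[:, keep].reshape(4 * n_keep)
    return k_p, r_p, b_p, keep

def quantize_int8(w):
    """Symmetric per-tensor quantization of float weights to int8."""
    scale = float(np.max(np.abs(w))) / 127.0
    if scale == 0.0:
        scale = 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 values back to approximate float32 weights."""
    return q.astype(np.float32) * scale

# Random stand-in weights: 5 input features, 8 LSTM units (hypothetical sizes).
rng = np.random.default_rng(0)
kernel = rng.normal(size=(5, 32)).astype(np.float32)
recurrent_kernel = rng.normal(size=(8, 32)).astype(np.float32)
bias = rng.normal(size=(32,)).astype(np.float32)

k_p, r_p, b_p, keep = prune_lstm_units(kernel, recurrent_kernel, bias, keep_fraction=0.5)
q, scale = quantize_int8(k_p)

bytes_before = kernel.nbytes + recurrent_kernel.nbytes + bias.nbytes
bytes_after = k_p.size + r_p.size + b_p.size  # one byte per weight at int8
print("kept units:", keep)
print("bytes before:", bytes_before, "bytes after:", bytes_after)
```

When pruning a real network, the layer that consumes the LSTM output must also have its input weights sliced with the same `keep` indices, and the pruned model is usually fine-tuned afterwards to recover accuracy.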
Results
The compressed model delivered strong improvements:
47% better accuracy (lower RMSE) than the uncompressed model
73% smaller model size
92% lower inference cost (wall-clock time)
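For readers who want to compute comparable figures on their own models, the sketch below shows one plausible way to express "percent better" as a relative reduction in RMSE. The definition is our assumption about how such a percentage is typically derived, and the arrays are toy numbers, not the talk's data or results.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-squared error between two equal-length sequences."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def percent_reduction(baseline, compressed):
    """Relative reduction of a lower-is-better metric, in percent."""
    return 100.0 * (baseline - compressed) / baseline

# Toy values chosen so the arithmetic is easy to follow.
y_true = np.array([10.0, 12.0, 9.0, 11.0])
baseline_pred = y_true + np.array([2.0, -2.0, 2.0, -2.0])    # RMSE = 2.0
compressed_pred = y_true + np.array([1.0, -1.0, 1.0, -1.0])  # RMSE = 1.0

improvement = percent_reduction(rmse(y_true, baseline_pred),
                                rmse(y_true, compressed_pred))
print(improvement)  # 50.0
```

The same `percent_reduction` helper applies to model size in bytes or wall-clock inference time.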
An interesting finding: moderate compression acted like a regularizer, helping the model generalize better. This is consistent with the lottery ticket hypothesis, which holds that a large network often contains a much smaller subnetwork that can match its accuracy.
Who Should Attend
This talk is for data scientists, ML engineers, and researchers who deploy deep learning models in production and care about efficiency. You do not need to be a retail expert: the techniques apply to any sequential prediction task.
What You Will Learn
How to prune, quantize, and distill LSTM models using Python tools you already know
When compression helps vs. hurts forecast quality
Practical patterns for setting up reproducible compression experiments
How to adapt these methods to your own forecasting domain
Why This Matters for the SciPy Community
This work shows that the standard Python scientific stack (TensorFlow, NumPy, scikit-learn, Matplotlib) is enough to build production-ready model optimization, with no special proprietary tools needed. As more teams scale up ML inference, efficient models become essential.
Link: https://ieeexplore.ieee.org/abstract/document/11380599
About My Background:
I'm a Senior Software Engineer with 9+ years of experience in software engineering and AI/ML research. I pursued an MS in Applied Computer Science and am currently pursuing a PMBA. My research focuses on practical applications of machine learning, optimization techniques, and generative AI models across various domains.
Published Work:
My recent publications include:
Planogram Synthesis using Diffusion Models - Published by Springer (constraint-aware generative models for spatial optimization)
LSTM Compression Techniques - Accepted at IEEE ICUIS 2025 (neural network optimization for resource-constrained deployment)
Generative AI for MES Optimization: LLM-Driven Digital Manufacturing Configuration Recommendation - Published in International Journal of Applied Mathematics (LLM-based optimization for manufacturing systems)
Comparative Analysis of Optimized GCD and Hybrid LLM-GCD Approaches for Retail Shelf Space Allocation - Published in European Journal of Information Technologies and Computer Science (hybrid approaches combining LLMs with classical optimization)
Cost-Performance Analysis of Cloud-Based Retail Point-of-Sale Systems: A Comparative Study of Google Cloud Platform and Microsoft Azure
ResearchGate: https://www.researchgate.net/profile/Ravi-Teja-Pagidoju/research
Peer Review Experience:
I have reviewed papers for IEEE Transactions on Industrial Informatics (Q1 journal).
I have judged multiple hackathons, DECA startup pitches, and business intelligence awards.
I'm also a mentor at Fuel Accelerator (https://www.fuelaccelerator.com) and an active member of the Retail AI Council.
My Speaking Experience:
Presented at Generative AI Expo 2026
Presented at NWA Tech Fest
Presented a keynote at the SCRS Conference
Regular knowledge sharing within engineering teams
Email: Pagidojuraviteja1@gmail.com