Open Data Hub Day 2024

Real-time Traffic Prediction in Bolzano Using Bluetooth Sensor Data: A Big Data Approach
2024-05-22, Seminar room 1

Rapid urbanization has made traffic management increasingly difficult, with a significant impact on daily commutes, environmental pollution, and urban planning. Addressing these challenges requires effective ways to predict and analyze traffic patterns. Our project designs and implements a big data system capable of predicting near-real-time traffic flows in Bolzano, Italy, using data from strategically placed Bluetooth sensors. The project draws on data from Open Data Hub Südtirol, focusing primarily on Bluetooth station data, to predict traffic patterns at specific locations.
The data pipeline consists of five phases: data collection, ingestion, preparation, computation, and presentation. Collection of both historical and streaming data is automated using GitHub Actions. MySQL was chosen for data ingestion because of its simplicity and compatibility with the collected data types, while MongoDB stores the model predictions. The core of the system is a predictive model built with Keras, specifically a Sequential model composed of LSTM (Long Short-Term Memory) and Dense layers, trained on historical data using Spark through Databricks. This LSTM-based neural network was chosen for its effectiveness in predicting sequential data.
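To make the model description more concrete, here is a minimal sketch of how a series of per-station counts might be turned into training windows and fed to a Sequential model with LSTM and Dense layers. It is not the project's actual code: the function names `make_windows` and `build_model`, the window length, the number of LSTM units, the optimizer, the loss, and the example counts are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def make_windows(
    counts: Sequence[float], window: int
) -> Tuple[List[List[float]], List[float]]:
    """Turn one station's count series into (input window, next value) pairs."""
    if window < 1:
        raise ValueError("window must be at least 1")
    inputs, targets = [], []
    for start in range(len(counts) - window):
        inputs.append(list(counts[start:start + window]))
        targets.append(counts[start + window])
    return inputs, targets


def build_model(window: int, units: int = 32):
    """Assemble a small Sequential model with LSTM and Dense layers.

    Requires TensorFlow; layer sizes and training settings are assumptions.
    """
    # Imported lazily so that make_windows can be used without TensorFlow.
    from tensorflow import keras

    model = keras.Sequential([
        keras.Input(shape=(window, 1)),  # one feature (the count) per time step
        keras.layers.LSTM(units),
        keras.layers.Dense(1),           # predicted count for the next interval
    ])
    model.compile(optimizer="adam", loss="mse")
    return model


# Hypothetical counts of detected devices per interval at one station.
example_counts = [12, 15, 20, 18, 25, 30]
X, y = make_windows(example_counts, window=3)
# X == [[12, 15, 20], [15, 20, 18], [20, 18, 25]] and y == [18, 25, 30]
```

Before training, the windows would need to be converted to an array of shape (samples, window, 1), which is the input shape the LSTM layer in this sketch expects.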
The predictions are integrated into a web application hosted on an R-based Shiny server, which offers an interactive view of traffic trends in Bolzano. The application's first page shows real-time traffic predictions for each station. The second page displays information about the traffic stations, including average traffic by hour and the distribution of traffic across different times of the day. On the third page, users can explore historical traffic trends across all stations and filter the data by date to identify the busiest periods. The fourth page presents traffic insights, analyzing patterns by weekday, hour, month, and season, and highlighting variations such as decreased traffic on weekends and at night and increased traffic during warmer months.
Our work highlights the potential and challenges of using big data and machine learning for real-time traffic prediction in urban settings. While the current system effectively predicts traffic patterns using available data, future enhancements could include incorporating additional data sources, optimizing script automation, exploring alternative database technologies, and refining the predictive model to incorporate more variables. Despite some limitations, such as the choice of MySQL over other databases and the inherent slowness of Shiny, our project demonstrates a successful application of big data technologies in addressing real-world problems.

I was born on June 20, 1994, in Spilimbergo (PN). I have always been intrigued by the complexities of the human brain and the vast potential of data. My academic journey led me to pursue both a Bachelor's and a Master's degree in Neuroscience, which gave me a solid foundation for understanding cognitive processes and neural mechanisms. Recognizing the transformative power of data in the contemporary world, I furthered my education with a Master's degree in Data Science. This additional qualification allowed me to combine my neuroscience background with data analysis techniques. Professionally, I have channeled my expertise into artificial intelligence solutions and sustainable mobility, a field where data-driven insights can significantly contribute to environmental conservation and smarter urban planning. My career has included roles such as Data Science Intern at FBK (Fondazione Bruno Kessler) and at Almaviva SpA and Edizioni Erickson, where I used data to drive innovation and software solutions.
Currently I'm working as a Data Scientist at Fraunhofer Italia in the Bioeconomy and Sustainability team.