Crash Prediction Under Limited CV Coverage: An Ensemble Deep Learning Model Integrating Multi-Source Traffic Data
Published in Transportation Research Part C: Emerging Technologies, 2026
Authors
Samgyu Yang*, Mohamed Abdel-Aty, Lei Han

Abstract
This study presents a comprehensive crash prediction framework that integrates traditional microwave vehicle detection systems (MVDS) with emerging connected vehicle (CV) data to improve proactive traffic safety management. While MVDS data provide consistent, infrastructure-based traffic measurements, their spatial coverage and behavioral resolution are limited. In contrast, CV data offer high-resolution, continuous vehicle trajectories that capture detailed driving behavior, but suffer from low and uneven market penetration. To fully leverage the strengths of both sources, an ensemble deep learning model was developed, utilizing MVDS data for macroscopic, segment-level traffic patterns and CV data for microscopic, vehicle-level dynamics. Importantly, rather than relying on segment-level aggregated CV metrics, this study directly utilizes individual vehicle trajectories to preserve temporal and spatial fidelity, enabling the model to capture detailed behavior that often precedes crashes. Three modeling configurations (MVDS only, CV only, and MVDS + CV) were evaluated across different crash types and roadway segment types. Results demonstrate that the integrated-source model consistently outperforms the single-source models, achieving higher sensitivity and lower false alarm rates, particularly for rear-end and sideswipe crashes. Furthermore, model performance was evaluated under varying CV market penetration rates. While the CV-only model showed limited performance under low coverage (<1%), it exhibited strong and stable results at 4% penetration or higher, with sensitivity exceeding 0.79. These findings highlight the potential of CV data to support scalable crash prediction without relying on infrastructure-based sensors, especially as CV adoption expands. The proposed approach offers a robust and adaptable solution for enhancing roadway safety across diverse traffic environments.
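The abstract reports results in terms of sensitivity and false alarm rate. As a reading aid, the sketch below computes both from binary labels using their common textbook definitions (sensitivity = TP / (TP + FN), false alarm rate = FP / (FP + TN)). It assumes these are the definitions used in the paper, which the abstract does not state; the function name and the sample labels are hypothetical and are not taken from the study.

```python
def sensitivity_and_false_alarm_rate(y_true, y_pred):
    """Return (sensitivity, false_alarm_rate) for binary labels.

    Assumed definitions (standard, not confirmed by the abstract):
      sensitivity      = TP / (TP + FN)   share of crash cases flagged
      false alarm rate = FP / (FP + TN)   share of non-crash cases flagged
    Labels: 1 = crash, 0 = non-crash.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Guard against empty classes rather than raising ZeroDivisionError.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    false_alarm_rate = fp / (fp + tn) if (fp + tn) else float("nan")
    return sensitivity, false_alarm_rate


# Hypothetical toy labels: 4 crash cases and 6 non-crash cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
sens, far = sensitivity_and_false_alarm_rate(y_true, y_pred)
```

In this toy case three of the four crash cases are flagged and one of the six non-crash cases raises a false alarm, so a higher first value and a lower second value correspond to the improvement the abstract describes.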
Recommended citation: Yang, S., Abdel-Aty, M., & Han, L. (2026). Crash prediction under limited CV coverage: An ensemble deep learning model integrating multi-source traffic data. Transportation Research Part C: Emerging Technologies, 183, 105472.
