Conditional Generative Adversarial Network-Based roadway crash risk prediction considering heterogeneity with dynamic data
- Authors
- Park, Nuri; Park, Juneyoung; Lee, Chris
- Issue Date
- Feb-2025
- Publisher
- Elsevier Ltd
- Keywords
- Crash risk prediction model; Data augmentation; Explainable artificial intelligence; Machine learning; Traffic safety
- Citation
- Journal of Safety Research, v.92, pp. 217-229
- Pages
- 13
- Indexed
- SCIE; SCOPUS
- Journal Title
- Journal of Safety Research
- Volume
- 92
- Start Page
- 217
- End Page
- 229
- URI
- https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/121415
- DOI
- 10.1016/j.jsr.2024.12.001
- ISSN
- 0022-4375; 1879-1247
- Abstract
- Introduction: Roadway crashes are rare and occur randomly, which makes it challenging to develop a crash prediction model for real-time traffic safety management. To address the small sample size of crash data, researchers have recently studied crash data augmentation using machine learning techniques for developing safety evaluation models. However, the characteristics of crash data vary with spatial and temporal conditions, so they should be incorporated into both augmentation and crash risk assessment. Method: This study developed a real-time crash risk model in three stages. First, crash data were clustered to define heterogeneous crash risk situations, and key variables were then derived using Boruta-SHAP, an ensemble and explainable artificial intelligence technique. Second, each cluster of crash data was augmented using oversampling techniques, including a Conditional Generative Adversarial Network (CGAN), which can account for the characteristics of each crash risk cluster. Finally, crash risk models were developed and compared with other crash risk models built using a binary logistic regression model (BLM), Random Forest (RF), extreme gradient boosting (XGBoost), and Support Vector Machine (SVM). Results: The CGAN-based XGBoost model performed best, and the temporal speed difference at 10-minute intervals and precipitation had a large impact on crash risk prediction. This paper emphasizes that crash risk characteristics must be distinguished in crash risk prediction and provides new insights into addressing the data imbalance between crash and non-crash datasets. © 2024 National Safety Council and Elsevier Ltd
- Appears in Collections
- COLLEGE OF ENGINEERING SCIENCES > DEPARTMENT OF TRANSPORTATION AND LOGISTICS ENGINEERING > 1. Journal Articles
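
The second stage described in the abstract, augmenting each crash cluster separately so that synthetic records keep that cluster's characteristics, can be illustrated with a small sketch. This is not the authors' CGAN: as a deliberately simple stand-in, it resamples crash rows with replacement and adds slight Gaussian jitter, always within the same cluster. The function name `oversample_within_clusters`, the record layout, and the two example features (speed difference and precipitation) are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def oversample_within_clusters(records, seed=0):
    """Balance crash vs. non-crash rows separately inside each cluster.

    records: list of (cluster_id, features, label), with label 1 = crash.
    Stand-in for the CGAN step (assumption): minority rows are resampled
    with replacement and jittered, always from the same cluster.
    """
    rng = random.Random(seed)
    by_cluster = defaultdict(lambda: {0: [], 1: []})
    for cluster_id, features, label in records:
        by_cluster[cluster_id][label].append(features)

    augmented = list(records)  # original rows are kept unchanged
    for cluster_id, groups in by_cluster.items():
        crashes, non_crashes = groups[1], groups[0]
        if not crashes:
            continue  # no crash rows to sample from in this cluster
        deficit = len(non_crashes) - len(crashes)
        for _ in range(max(deficit, 0)):
            base = rng.choice(crashes)
            # Small jitter so synthetic rows are not exact duplicates.
            synthetic = tuple(
                x + rng.gauss(0.0, 0.01 * (abs(x) + 1.0)) for x in base
            )
            augmented.append((cluster_id, synthetic, 1))
    return augmented


# Hypothetical toy data: (cluster, (speed difference, precipitation), label)
records = [
    (0, (12.0, 0.0), 1),
    (0, (3.0, 0.0), 0), (0, (2.5, 0.0), 0), (0, (4.0, 0.0), 0),
    (1, (8.0, 5.2), 1), (1, (9.5, 7.1), 1),
    (1, (1.0, 4.0), 0), (1, (2.0, 6.0), 0),
    (1, (1.5, 3.5), 0), (1, (0.5, 5.0), 0),
]
balanced = oversample_within_clusters(records)
```

After this step each cluster holds as many crash rows as non-crash rows, and a separate classifier could then be trained on the balanced data. A conditional generative model would replace the resampling loop while keeping the same per-cluster conditioning.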
