Detailed Information

Machine Learning Models for Predicting the Occurrence of Respiratory Diseases Using Climatic and Air-Pollution Factors

Authors
Ku, Yunseo; Bin Kwon, Soon; Yoon, Jeong-Hwa; Mun, Seog-Kyun; Chang, Munyoung
Issue Date
May-2022
Publisher
KOREAN SOC OTORHINOLARYNGOL
Keywords
Machine Learning; Respiratory Diseases; Climate; Air Pollution; Gradient Boosting; Gaussian Process Regression
Citation
CLINICAL AND EXPERIMENTAL OTORHINOLARYNGOLOGY, v.15, no.2, pp. 168-176
Pages
9
Journal Title
CLINICAL AND EXPERIMENTAL OTORHINOLARYNGOLOGY
Volume
15
Number
2
Start Page
168
End Page
176
URI
https://scholarworks.bwise.kr/cau/handle/2019.sw.cau/61919
DOI
10.21053/ceo.2021.01536
ISSN
1976-8710
2005-0720
Abstract
Objectives. Because climatic and air-pollution factors are known to influence the occurrence of respiratory diseases, we used these factors to develop machine learning models for predicting the occurrence of respiratory diseases.
Methods. We obtained the daily number of respiratory disease patients in Seoul. We used climatic and air-pollution factors to predict the daily number of patients treated for respiratory diseases per 10,000 inhabitants. We applied the relief-based feature selection algorithm to evaluate the importance of each feature. We developed two prediction models, one using gradient boosting and one using Gaussian process regression (GPR). We employed the holdout cross-validation method, in which 75% of the data was used to train the model and the remaining 25% was used to test the trained model. We determined the estimated number of respiratory disease patients by applying the developed prediction models to the test set. To evaluate the performance of each model, we calculated the coefficient of determination (R²) and the root mean square error (RMSE) between the original and estimated numbers of respiratory disease patients. We used the Shapley Additive exPlanations (SHAP) approach to interpret the estimated output of each machine learning model.
Results. Features with negative weights in the relief-based algorithm were excluded. When applying gradient boosting to unseen test data, R² and RMSE were 0.68 and 13.8, respectively. For GPR, R² and RMSE were 0.67 and 13.9, respectively. SHAP analysis showed that reductions in average temperature, daylight duration, average humidity, sulfur dioxide (SO2), total solar insolation amount, and temperature difference increased the number of respiratory disease patients, whereas increases in atmospheric pressure, carbon monoxide (CO), and particulate matter ≤2.5 μm in aerodynamic diameter (PM2.5) increased the number of respiratory disease patients.
Conclusions. We successfully developed models for predicting the occurrence of respiratory diseases using climatic and air-pollution factors. These models could evolve into public warning systems.
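
The evaluation procedure named in the abstract (a 75%/25% holdout split, then R² and RMSE between the original and estimated patient counts) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `holdout_split`, `rmse`, and `r_squared` are hypothetical, the random shuffle before splitting is an assumption, and R² is assumed to follow the common 1 - SS_res/SS_tot definition, which the abstract does not specify. The numbers in the usage lines are made-up toy values, not study data.

```python
import math
import random


def holdout_split(rows, train_fraction=0.75, seed=0):
    """Shuffle rows and split them into (train, test) by train_fraction.

    Assumption: a random shuffle precedes the split; the abstract only
    states the 75%/25% proportions.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def rmse(observed, estimated):
    """Root mean square error between observed and estimated values."""
    squared_errors = [(o - e) ** 2 for o, e in zip(observed, estimated)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, estimated):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Toy usage with invented daily counts (illustration only).
train_days, test_days = holdout_split(range(100))
toy_observed = [100.0, 120.0, 90.0, 110.0]
toy_estimated = [105.0, 115.0, 95.0, 105.0]
print(len(train_days), len(test_days))
print(rmse(toy_observed, toy_estimated), r_squared(toy_observed, toy_estimated))
```

For the toy values, every estimate is off by 5, so the RMSE is 5 and R² is 1 - 100/500 = 0.8. In the reported results, an R² of 0.68 would mean that the gradient boosting model accounted for about two thirds of the variance in the test-set patient counts under this definition.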

Related Researcher

Mun, Seog Kyun
College of Medicine (Department of Medicine (Clinical, Seoul))
