Deep-learning analysis of speech using mel-spectrograms for the assessment of mild cognitive impairment and Alzheimer's disease
- Authors
- Choi, Yun Ho; Kim, Hyungjun; Hong, Suhun; Baek, Chaneun; Wang, Bohyun; Shim, YongSoo; Hong, Yun Jeong; Byun, Seonjeong; Song, In-Uk; Na, Seunghee; Won, Wang-Yeon; Park, Soung-Kyeong; Ryu, Seon Young; Hahn, Changtae; Shin, Hae Eun; Cho, A-Hyun; Lim, Eunye; Lim, Hyun Kook; Kang, Dong Woo; Kim, Hee-Jin; Choi, Hojin; Yoon, Bora; Kim, Woojun; Lim, Joon S.; Yang, Dong Won
- Issue Date
- Jan-2026
- Publisher
- SAGE Publications Ltd
- Keywords
- Alzheimer's disease; deep learning; diagnosis; mel-spectrogram; mild cognitive impairment; speech
- Citation
- Journal of Alzheimer's Disease, v.109, no.2, pp. 928-939
- Pages
- 12
- Indexed
- SCIE; SCOPUS
- Journal Title
- Journal of Alzheimer's Disease
- Volume
- 109
- Number
- 2
- Start Page
- 928
- End Page
- 939
- URI
- https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/210299
- DOI
- 10.1177/13872877251401202
- ISSN
- 1387-2877; 1875-8908
- Abstract
- Background: Speech abnormalities are recognized as early indicators of Alzheimer's disease (AD) and mild cognitive impairment (MCI).
Objective: To determine whether deep-learning models trained on mel-spectrograms of brief speech tasks can (i) discriminate individuals with MCI and AD from cognitively normal controls (NC) and (ii) estimate cognitive status with clinically useful accuracy.
Methods: Speech from 594 participants (185 NC, 231 MCI, 178 AD) was recorded through a mobile application that included 11 cognitive-linguistic tasks. Audio was converted into mel-spectrogram images and processed using a VGG16-based deep-learning model with transfer learning and fine-tuning of block 5. Task-specific feature vectors were extracted, concatenated, and used to train a deep neural network. The dataset was split into training, validation, and test sets (3:1:1), and five-split cross-validation was performed.
Results: The model distinguished NC from the abnormal group (MCI and AD) with an overall accuracy of 72.4%, sensitivity and specificity of 72.5% and 72.2%, respectively, a balanced accuracy of 72.4%, and an AUC of 0.997. In pairwise classifications, it achieved 82.9% accuracy (balanced accuracy 82.9%, AUC 0.992) for NC versus AD, 70.7% accuracy (balanced accuracy 70.3%, AUC 0.956) for NC versus MCI, and 77.5% accuracy (balanced accuracy 78.9%, AUC 0.889) for MCI versus AD. The serial subtraction, storytelling, and picture description tasks contributed most to classification performance, suggesting that they are particularly effective at capturing cognitive deficits.
Conclusions: Mel-spectrogram-based deep-learning analysis of speech shows promise as a rapid, non-invasive, and language-independent screening tool for early cognitive impairment, with potential advantages over traditional assessments such as the Mini-Mental State Examination.
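
The abstract reports a 3:1:1 training/validation/test split with five-split cross-validation, and a balanced accuracy alongside sensitivity and specificity, but it does not say how the splits were drawn. The Python sketch below shows one plausible reading, not the authors' procedure: participants are shuffled into five folds that rotate through the test and validation roles. The function names, the seed, and the unstratified shuffle are all assumptions made for illustration.

```python
import random


def five_way_splits(participant_ids, seed=0):
    """Yield five (train, val, test) splits in a 3:1:1 ratio.

    Assumption: splitting is done per participant, without stratifying
    by diagnosis. The abstract does not specify either detail.
    """
    shuffled = list(participant_ids)
    random.Random(seed).shuffle(shuffled)
    # Deal participants into five folds of near-equal size.
    folds = [shuffled[i::5] for i in range(5)]
    for k in range(5):
        test_idx = k
        val_idx = (k + 1) % 5
        train = [
            pid
            for j in range(5)
            if j not in (test_idx, val_idx)
            for pid in folds[j]
        ]
        yield train, folds[val_idx], folds[test_idx]


def balanced_accuracy(sensitivity, specificity):
    """Balanced accuracy for a two-class task: mean of the two rates."""
    return (sensitivity + specificity) / 2.0


if __name__ == "__main__":
    # 594 participants, as reported in the abstract.
    for train, val, test in five_way_splits(range(594)):
        print(len(train), len(val), len(test))
    print(round(100 * balanced_accuracy(0.725, 0.722), 2))
```

With the reported sensitivity of 72.5% and specificity of 72.2%, the mean is 72.35%, which agrees with the stated balanced accuracy of 72.4% after rounding. Rotating the folds puts every participant in the test set exactly once across the five splits.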
- Appears in Collections
- College of Medicine (Seoul) > Department of Neurology (Seoul) > 1. Journal Articles
