Deep-learning analysis of speech using mel-spectrograms for the assessment of mild cognitive impairment and Alzheimer's disease

Authors
Choi, Yun Ho; Kim, Hyungjun; Hong, Suhun; Baek, Chaneun; Wang, Bohyun; Shim, YongSoo; Hong, Yun Jeong; Byun, Seonjeong; Song, In-Uk; Na, Seunghee; Won, Wang-Yeon; Park, Soung-Kyeong; Ryu, Seon Young; Hahn, Changtae; Shin, Hae Eun; Cho, A-Hyun; Lim, Eunye; Lim, Hyun Kook; Kang, Dong Woo; Kim, Hee-Jin; Choi, Hojin; Yoon, Bora; Kim, Woojun; Lim, Joon S.; Yang, Dong Won
Issue Date
Jan-2026
Publisher
SAGE Publications Ltd
Keywords
Alzheimer's disease; deep learning; diagnosis; mel-spectrogram; mild cognitive impairment; speech
Citation
Journal of Alzheimer's Disease, v.109, no.2, pp 928 - 939
Pages
12
Indexed
SCIE
SCOPUS
Journal Title
Journal of Alzheimer's Disease
Volume
109
Number
2
Start Page
928
End Page
939
URI
https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/210299
DOI
10.1177/13872877251401202
ISSN
1387-2877
1875-8908
Abstract
Background: Speech abnormalities are recognized as early indicators of Alzheimer's disease (AD) and mild cognitive impairment (MCI).
Objective: To determine whether deep-learning models trained on mel-spectrograms of brief speech tasks can (i) discriminate individuals with MCI and AD from cognitively normal controls (NC) and (ii) estimate cognitive status with clinically useful accuracy.
Methods: Speech from 594 participants (185 NC, 231 MCI, 178 AD) was recorded through a mobile application that included 11 cognitive-linguistic tasks. Audio was converted into mel-spectrogram images and processed using a VGG16-based deep-learning model with transfer learning and fine-tuning of block 5. Task-specific feature vectors were extracted, concatenated, and used to train a deep neural network. The dataset was split into training, validation, and test sets (3:1:1), and five-split cross-validation was performed.
Results: The model demonstrated an overall accuracy of 72.4% in classifying NC from the abnormal group (MCI and AD), with sensitivity and specificity of 72.5% and 72.2%, respectively, a balanced accuracy of 72.4%, and an AUC of 0.997. In binary classifications, the model achieved 82.9% accuracy (balanced accuracy 82.9%, AUC 0.992) for NC versus AD, 70.7% accuracy (balanced accuracy 70.3%, AUC 0.956) for NC versus MCI, and 77.5% accuracy (balanced accuracy 78.9%, AUC 0.889) for MCI versus AD. Tasks such as serial subtraction, storytelling, and picture description contributed most to classification performance, indicating their effectiveness in capturing cognitive deficits.
Conclusions: Mel-spectrogram-based deep-learning analysis of speech shows promise as a rapid, non-invasive, and language-independent screening tool for early cognitive impairment, with potential advantages over traditional assessments such as the Mini-Mental State Examination.
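As a reading aid for the figures above, the following minimal sketch shows how balanced accuracy follows from the reported sensitivity and specificity, and what a 3:1:1 split of 594 participants could look like. The function names and the rule that sends the remainder to the training set are illustrative assumptions, not details taken from the article.

```python
def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Mean of sensitivity and specificity (both given on the same scale)."""
    return (sensitivity + specificity) / 2.0


def split_sizes(n: int, ratio=(3, 1, 1)) -> tuple:
    """Split n items by an integer ratio.

    Assumption for illustration: any remainder left after integer
    division is assigned to the first (training) part.
    """
    total = sum(ratio)
    sizes = [n * r // total for r in ratio]
    sizes[0] += n - sum(sizes)
    return tuple(sizes)


# Group sizes as reported in the Methods.
n_nc, n_mci, n_ad = 185, 231, 178
n_total = n_nc + n_mci + n_ad      # all participants
n_abnormal = n_mci + n_ad          # MCI and AD pooled as the "abnormal" group

# NC versus abnormal: sensitivity 72.5%, specificity 72.2% (from the Results).
print(round(balanced_accuracy(72.5, 72.2), 2))

# Hypothetical train/validation/test sizes for a 3:1:1 split.
print(split_sizes(n_total))
```

The mean of 72.5% and 72.2% is about 72.35%, which is consistent with the reported balanced accuracy of 72.4%. The split sizes are only indicative, since the abstract does not say how the remainder or class stratification was handled.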
Appears in Collections
College of Medicine (Seoul) > Department of Neurology (Seoul) > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.