
Deep-learning analysis of speech using mel-spectrograms for the assessment of mild cognitive impairment and Alzheimer's disease

Authors
Choi, Yun Ho; Kim, Hyungjun; Hong, Suhun; Baek, Chaneun; Wang, Bohyun; Shim, YongSoo; Hong, Yun Jeong; Byun, Seonjeong; Song, In-Uk; Na, Seunghee; Won, Wang-Yeon; Park, Soung-Kyeong; Ryu, Seon Young; Hahn, Changtae; Shin, Hae Eun; Cho, A-Hyun; Lim, Eunye; Lim, Hyun Kook; Kang, Dong Woo; Kim, Hee-Jin; Choi, Hojin; Yoon, Bora; Kim, Woojun; Lim, Joon S.; Yang, Dong Won
Issue Date
Jan-2026
Publisher
SAGE Publications Ltd
Keywords
Alzheimer's disease; deep learning; diagnosis; mel-spectrogram; mild cognitive impairment; speech
Citation
Journal of Alzheimer's Disease, v.109, no.2, pp. 928-939
Pages
12
Indexed
SCIE
SCOPUS
Journal Title
Journal of Alzheimer's Disease
Volume
109
Number
2
Start Page
928
End Page
939
URI
https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/210299
DOI
10.1177/13872877251401202
ISSN
1387-2877
1875-8908
Abstract
Background: Speech abnormalities are recognized as early indicators of Alzheimer's disease (AD) and mild cognitive impairment (MCI).
Objective: To determine whether deep-learning models trained on mel-spectrograms of brief speech tasks can (i) discriminate individuals with MCI and AD from cognitively normal controls (NC) and (ii) estimate cognitive status with clinically useful accuracy.
Methods: Speech from 594 participants (185 NC, 231 MCI, 178 AD) was recorded through a mobile application comprising 11 cognitive-linguistic tasks. The audio was converted into mel-spectrogram images and processed with a VGG16-based deep-learning model using transfer learning, with block 5 fine-tuned. Task-specific feature vectors were extracted, concatenated, and used to train a deep neural network. The dataset was split into training, validation, and test sets (3:1:1), and five-split cross-validation was performed.
Results: In distinguishing NC from the impaired group (MCI and AD), the model achieved an overall accuracy of 72.4%, with a sensitivity of 72.5%, a specificity of 72.2%, a balanced accuracy of 72.4%, and an AUC of 0.997. In pairwise classifications, the model achieved 82.9% accuracy (balanced accuracy 82.9%, AUC 0.992) for NC versus AD, 70.7% accuracy (balanced accuracy 70.3%, AUC 0.956) for NC versus MCI, and 77.5% accuracy (balanced accuracy 78.9%, AUC 0.889) for MCI versus AD. Tasks such as serial subtraction, storytelling, and picture description contributed most to classification performance, indicating their effectiveness in capturing cognitive deficits.
Conclusions: Mel-spectrogram-based deep-learning analysis of speech shows promise as a rapid, non-invasive, and language-independent screening tool for early cognitive impairment, with potential advantages over traditional assessments such as the Mini-Mental State Examination.
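The first step of the Methods, converting audio into mel-spectrogram images, rests on the mel scale. The sketch below assumes the common HTK-style formula m = 2595 * log10(1 + f / 700) and triangular filters spanning 0 Hz to the Nyquist frequency. The abstract does not state the sample rate, FFT size, number of mel bands, or mel variant, so every parameter here, and the hypothetical helpers `mel_filterbank` and `log_mel_frame`, are illustrative assumptions rather than the authors' settings. It shows how one frame's power spectrum is pooled into log-mel energies; a full mel-spectrogram stacks such frames over time. Libraries such as librosa (`librosa.feature.melspectrogram`) perform this step with their own defaults, which may differ.

```python
import math
from typing import List


def hz_to_mel(f_hz: float) -> float:
    """HTK-style mel scale (assumed variant; the paper does not say which it used)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels: int, n_fft: int, sample_rate: int) -> List[List[float]]:
    """Hypothetical helper: n_mels triangular filters over the n_fft // 2 + 1 FFT bins."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 equally spaced mel points: the outer edges plus every filter centre
    mel_points = [mel_max * i / (n_mels + 1) for i in range(n_mels + 2)]
    hz_points = [mel_to_hz(m) for m in mel_points]
    bin_freqs = [k * sample_rate / n_fft for k in range(n_bins)]
    filters = []
    for j in range(1, n_mels + 1):
        left, centre, right = hz_points[j - 1], hz_points[j], hz_points[j + 1]
        row = []
        for f in bin_freqs:
            if left < f <= centre:  # rising edge of the triangle
                row.append((f - left) / (centre - left))
            elif centre < f < right:  # falling edge of the triangle
                row.append((right - f) / (right - centre))
            else:
                row.append(0.0)
        filters.append(row)
    return filters


def log_mel_frame(power_spectrum: List[float], filters: List[List[float]]) -> List[float]:
    """Hypothetical helper: pool one frame's power spectrum into log-mel energies (dB)."""
    out = []
    for row in filters:
        energy = sum(w * p for w, p in zip(row, power_spectrum))
        out.append(10.0 * math.log10(energy + 1e-10))  # small floor avoids log(0)
    return out


# Hypothetical usage: 16 kHz audio, 512-point FFT, 40 mel bands (all assumed values).
fb = mel_filterbank(n_mels=40, n_fft=512, sample_rate=16000)
silent_frame = [0.0] * (512 // 2 + 1)
print(len(fb), len(fb[0]))  # 40 257
print(log_mel_frame(silent_frame, fb)[:3])  # floor value, about -100 dB per band
```

Because the filters are equally spaced in mel rather than in Hz, low frequencies get narrow filters and high frequencies get wide ones, which is what gives the resulting image its speech-oriented frequency resolution.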
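The Methods describe a 3:1:1 split into training, validation, and test sets together with five-split cross-validation, but not exactly how the two were combined. One plausible reading, sketched here purely as an assumption, deals participants into five class-stratified folds and rotates them so that each round trains on three folds, validates on one, and tests on the remaining one. Indices stand for participants rather than individual recordings, so all 11 task recordings of one person would stay in the same partition (also an assumption; the abstract does not say). `stratified_folds` and `rotations_311` are hypothetical helpers, not the authors' code.

```python
import random
from typing import Dict, Iterator, List, Sequence, Tuple


def stratified_folds(labels: Sequence[str], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle each class, then deal its participant indices round-robin into k folds."""
    rng = random.Random(seed)
    by_class: Dict[str, List[int]] = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for i, idx in enumerate(idxs):
            folds[i % k].append(idx)  # keeps NC/MCI/AD proportions similar in every fold
    return folds


def rotations_311(folds: List[List[int]]) -> Iterator[Tuple[List[int], List[int], List[int]]]:
    """Yield (train, val, test): 3 folds train, 1 validation, 1 test, rotated len(folds) times."""
    k = len(folds)
    for r in range(k):
        test = folds[r]
        val = folds[(r + 1) % k]
        held_out = {r, (r + 1) % k}
        train = [i for j in range(k) if j not in held_out for i in folds[j]]
        yield train, val, test


# Group sizes taken from the abstract: 185 NC, 231 MCI, 178 AD.
labels = ["NC"] * 185 + ["MCI"] * 231 + ["AD"] * 178
folds = stratified_folds(labels, k=5, seed=42)
for train, val, test in rotations_311(folds):
    print(len(train), len(val), len(test))  # roughly 3:1:1 in every round
```

Rotating the folds means every participant is tested exactly once across the five rounds, so the per-round test metrics can be averaged into a single cross-validated estimate.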
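The Results report accuracy, sensitivity, specificity, balanced accuracy, and AUC. Balanced accuracy is the mean of sensitivity and specificity, which the NC-versus-impaired figures bear out: (72.5% + 72.2%) / 2 = 72.35%, reported as 72.4%. AUC instead measures how well the model's scores rank impaired participants above controls across all decision thresholds, whereas the other four metrics depend on a single operating threshold, so the two kinds of figures need not move together. A minimal pure-Python sketch of these standard definitions follows; the toy labels, scores, and 0.5 threshold are invented for illustration and are not study data.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fn, tn, fp) with 1 = impaired (MCI/AD) and 0 = normal control."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp, fn, tn, fp


def screening_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Threshold-dependent metrics computed from hard 0/1 predictions."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)  # share of impaired participants flagged
    specificity = tn / (tn + fp)  # share of controls correctly cleared
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(random impaired score > random control score), counting ties as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (invented data): 1 = impaired (MCI/AD), 0 = normal control.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 threshold
print(screening_metrics(y_true, y_pred))  # every metric is 2/3 here
print(roc_auc(y_true, scores))  # 8/9, about 0.889
```

In the toy data only two of three cases per class fall on the correct side of the 0.5 threshold, yet eight of the nine impaired/control pairs are ranked correctly, which is why AUC can sit well above accuracy.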
Appears in Collections
Seoul College of Medicine > Seoul Department of Neurology > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher
Kim, Hee-Jin
Seoul College of Medicine (Department of Neurology)