Detailed Information


Searching for effective preprocessing method and CNN based architecture with efficient channel attention on speech emotion recognition

Full metadata record
DC Field: Value
dc.contributor.author: Kim, Byunggun
dc.contributor.author: Kwon, Younghun
dc.date.accessioned: 2025-10-21T07:00:39Z
dc.date.available: 2025-10-21T07:00:39Z
dc.date.issued: 2025-09
dc.identifier.issn: 2045-2322
dc.identifier.uri: https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/126740
dc.description.abstract: Recently, speech emotion recognition (SER) performance has steadily improved as multiple deep learning architectures have been adapted to the task. In particular, convolutional neural network (CNN) models operating on spectrogram inputs are the most popular approach to SER. However, it remains unclear how to design an effective and efficient preprocessing method and CNN-based model for SER, so a more systematic search for both is needed. First, to find a suitable frequency-time resolution for SER, we prepare eight datasets with different preprocessing settings. Furthermore, to compensate for the limited resolution of emotional features, we propose a multiple short-time Fourier transform (STFT) data augmentation that enlarges the training data by applying STFTs with several different window sizes. Next, because a CNN's channel filters are central to detecting hidden input features, we focus on their effectiveness for SER and design several architecture variants built on a 6-layer CNN model. Using efficient channel attention (ECA), which is known to improve channel feature representation with only a few parameters, we find that the channel filters can be trained more efficiently for SER. On two SER datasets (Interactive Emotional Dyadic Motion Capture and the Berlin Emotional Speech Database), increasing the frequency resolution when preprocessing emotional speech improves recognition performance. Consequently, a CNN-based model with only two ECA blocks exceeds the performance of previous SER models, and with STFT data augmentation our proposed model achieves the highest SER performance.
dc.language: English
dc.language.iso: ENG
dc.publisher: Nature Research
dc.title: Searching for effective preprocessing method and CNN based architecture with efficient channel attention on speech emotion recognition
dc.type: Article
dc.publisher.location: United Kingdom
dc.identifier.doi: 10.1038/s41598-025-19887-7
dc.identifier.scopusid: 2-s2.0-105016909288
dc.identifier.wosid: 001580634200006
dc.identifier.bibliographicCitation: Scientific Reports, v.15, no.1
dc.citation.title: Scientific Reports
dc.citation.volume: 15
dc.citation.number: 1
dc.type.docType: Article
dc.description.isOpenAccess: Y
dc.description.journalRegisteredClass: scie
dc.description.journalRegisteredClass: scopus
dc.relation.journalResearchArea: Science & Technology - Other Topics
dc.relation.journalWebOfScienceCategory: Multidisciplinary Sciences
dc.subject.keywordPlus: NEURAL-NETWORKS
dc.subject.keywordPlus: RECURRENT
dc.subject.keywordPlus: SPECTRUM
dc.subject.keywordPlus: FEATURES
dc.subject.keywordAuthor: Convolutional neural network
dc.subject.keywordAuthor: Data augmentation
dc.subject.keywordAuthor: Efficient channel attention
dc.subject.keywordAuthor: Log-Mel spectrogram
dc.subject.keywordAuthor: Speech emotion recognition
dc.identifier.url: https://www.nature.com/articles/s41598-025-19887-7
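
The abstract above names two concrete ingredients: data augmentation by computing log-Mel spectrograms with several STFT window sizes, and an efficient channel attention (ECA) block inside a CNN. The following is a minimal sketch of both, assuming librosa for the spectrograms and PyTorch for the ECA block; the window sizes, hop length, number of Mel bins, and ECA kernel size are illustrative placeholders, not the settings reported in the paper.

```python
# Sketch only: illustrative parameter values, not the paper's reported settings.
import numpy as np
import librosa
import torch
import torch.nn as nn


def multi_stft_logmel(signal, sr, n_ffts=(512, 1024, 2048), hop_length=256, n_mels=128):
    """Compute one log-Mel spectrogram per STFT window size.

    Larger windows give finer frequency resolution at the cost of time
    resolution; treating each resulting spectrogram as an extra training
    example is the augmentation idea described in the abstract.
    """
    specs = []
    for n_fft in n_ffts:
        mel = librosa.feature.melspectrogram(
            y=signal, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels)
        specs.append(librosa.power_to_db(mel, ref=np.max))
    return specs


class ECABlock(nn.Module):
    """Efficient channel attention in the ECA-Net style.

    A global average pool produces one descriptor per channel, and a single
    1-D convolution across channels learns the attention weights, so the
    block adds only `kernel_size` parameters.
    """

    def __init__(self, kernel_size=3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x):                      # x: (batch, channels, freq, time)
        y = self.pool(x)                       # (B, C, 1, 1)
        y = self.conv(y.squeeze(-1).transpose(-1, -2))        # (B, 1, C)
        y = torch.sigmoid(y.transpose(-1, -2).unsqueeze(-1))  # (B, C, 1, 1)
        return x * y                           # rescale each channel


if __name__ == "__main__":
    # Toy example: 1 s of noise stands in for an emotional-speech utterance.
    sr = 16000
    signal = np.random.randn(sr).astype(np.float32)
    for spec in multi_stft_logmel(signal, sr):
        print(spec.shape)                      # (n_mels, frames) per window size

    eca = ECABlock(kernel_size=3)
    features = torch.randn(4, 64, 128, 63)     # (batch, channels, freq, time)
    print(eca(features).shape)                 # torch.Size([4, 64, 128, 63])
```

In the pipeline described in the abstract, such spectrograms would feed the 6-layer CNN with ECA blocks placed after selected convolutional layers; the exact placement and hyperparameters are given in the full article.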
Appears in
Collections
ETC > 1. Journal Articles


Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher


Kwon, Young hun
ERICA College of Advanced Convergence (Major in Intelligent Information and Quantum Engineering)
