Improving Joint Speech and Emotion Recognition Using Global Style Tokens

Kyung, Jehyun; Seong, Ju-Seok; Choi, Jeong-Hwan; Jeoung, Ye-Rin; Chang, Joon-Hyuk

doi:10.21437/Interspeech.2023-2375

Full metadata record

DC Field	Value	Language
dc.contributor.author	Kyung, Jehyun	-
dc.contributor.author	Seong, Ju-Seok	-
dc.contributor.author	Choi, Jeong-Hwan	-
dc.contributor.author	Jeoung, Ye-Rin	-
dc.contributor.author	Chang, Joon-Hyuk	-
dc.date.accessioned	2023-10-10T02:35:37Z	-
dc.date.available	2023-10-10T02:35:37Z	-
dc.date.created	2023-10-04	-
dc.date.issued	2023-08	-
dc.identifier.issn	2308-457X	-
dc.identifier.uri	https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/191792	-
dc.description.abstract	Automatic speech recognition (ASR) and speech emotion recognition (SER) are closely related because acoustic features of speech, such as pitch, tone, and intensity, can vary with the speaker's emotional state. Our study focuses on a joint ASR and SER task in which an emotion token is tagged and recognized along with the text. To further improve joint recognition performance, we propose a novel training method that adopts global style tokens (GSTs). A style embedding extracted from the GST module helps the joint ASR and SER model capture emotional information from speech. Specifically, a conformer-based joint ASR and SER model pre-trained on a large-scale dataset is jointly fine-tuned with the style embedding to improve both ASR and SER. Experimental results on the IEMOCAP dataset show that the proposed model achieves a word error rate of 15.8% and four-class emotion classification weighted and unweighted accuracies of 75.1% and 76.3%, respectively.	-
dc.language	English	-
dc.language.iso	en	-
dc.publisher	International Speech Communication Association	-
dc.title	Improving Joint Speech and Emotion Recognition Using Global Style Tokens	-
dc.type	Article	-
dc.contributor.affiliatedAuthor	Chang, Joon-Hyuk	-
dc.identifier.doi	10.21437/Interspeech.2023-2375	-
dc.identifier.scopusid	2-s2.0-85171588429	-
dc.identifier.bibliographicCitation	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, v.2023, pp.4528 - 4532	-
dc.relation.isPartOf	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH	-
dc.citation.title	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH	-
dc.citation.volume	2023	-
dc.citation.startPage	4528	-
dc.citation.endPage	4532	-
dc.type.rims	ART	-
dc.type.docType	Conference paper	-
dc.description.journalClass	1	-
dc.description.isOpenAccess	N	-
dc.description.journalRegisteredClass	scopus	-
dc.subject.keywordPlus	Character recognition	-
dc.subject.keywordPlus	Classification (of information)	-
dc.subject.keywordPlus	Continuous speech recognition	-
dc.subject.keywordPlus	Embeddings	-
dc.subject.keywordPlus	Emotion Recognition	-
dc.subject.keywordPlus	Speech communication	-
dc.subject.keywordPlus	Acoustic features	-
dc.subject.keywordPlus	Automatic speech recognition	-
dc.subject.keywordPlus	Emotion recognition	-
dc.subject.keywordPlus	Emotional state	-
dc.subject.keywordPlus	Global style token	-
dc.subject.keywordPlus	Performance	-
dc.subject.keywordPlus	Recognition models	-
dc.subject.keywordPlus	Speech emotion recognition	-
dc.subject.keywordPlus	Training methods	-
dc.subject.keywordPlus	Large dataset	-
dc.subject.keywordAuthor	automatic speech recognition	-
dc.subject.keywordAuthor	global style tokens	-
dc.subject.keywordAuthor	speech emotion recognition	-
dc.identifier.url	https://www.isca-speech.org/archive/interspeech_2023/kyung23_interspeech.html	-
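
The abstract reports three metrics: word error rate, weighted accuracy, and unweighted accuracy. The record does not define them, so the following is only a minimal sketch under the conventions commonly used in ASR and SER work: word error rate as word-level edit distance divided by reference length, weighted accuracy as overall accuracy, and unweighted accuracy as the mean of per-class recalls. The function names and the toy labels are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def weighted_accuracy(labels, predictions) -> float:
    """Overall accuracy: each utterance counts equally."""
    correct = sum(l == p for l, p in zip(labels, predictions))
    return correct / len(labels)


def unweighted_accuracy(labels, predictions) -> float:
    """Mean of per-class recalls: each emotion class counts equally."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for l, p in zip(labels, predictions):
        total[l] += 1
        correct[l] += int(l == p)
    return sum(correct[c] / total[c] for c in total) / len(total)


# Toy example with four hypothetical emotion labels.
labels = ["ang", "ang", "ang", "hap", "neu", "sad"]
predictions = ["ang", "ang", "ang", "hap", "sad", "neu"]
wer = word_error_rate("a b c d", "a x c")
wa = weighted_accuracy(labels, predictions)
ua = unweighted_accuracy(labels, predictions)
```

Under these definitions the two accuracies differ whenever the classes are imbalanced, which is why SER results are often reported with both.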

Appears in Collections: Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles

Related Researcher

Chang, Joon-Hyuk: COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)
