Improving Joint Speech and Emotion Recognition Using Global Style Tokens
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Kyung, Jehyun | - |
dc.contributor.author | Seong, Ju-Seok | - |
dc.contributor.author | Choi, Jeong-Hwan | - |
dc.contributor.author | Jeoung, Ye-Rin | - |
dc.contributor.author | Chang, Joon-Hyuk | - |
dc.date.accessioned | 2023-10-10T02:35:37Z | - |
dc.date.available | 2023-10-10T02:35:37Z | - |
dc.date.created | 2023-10-04 | - |
dc.date.issued | 2023-08 | - |
dc.identifier.issn | 2308-457X | - |
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/191792 | - |
dc.description.abstract | Automatic speech recognition (ASR) and speech emotion recognition (SER) are closely related in that acoustic features of speech, such as pitch, tone, and intensity, can vary with the speaker's emotional state. Our study focuses on a joint ASR and SER task, in which an emotion token is tagged and recognized along with the text. To further improve joint recognition performance, we propose a novel training method that adopts global style tokens (GSTs). A style embedding extracted from the GST module helps the joint ASR and SER model capture emotional information from speech. Specifically, a Conformer-based joint ASR and SER model pre-trained on a large-scale dataset is fine-tuned jointly with the style embedding to improve both ASR and SER. Experimental results on the IEMOCAP dataset show that the proposed model achieves a word error rate of 15.8% and, for four-class emotion classification, weighted and unweighted accuracies of 75.1% and 76.3%, respectively. (A minimal illustrative sketch of the GST conditioning follows this record.) | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | International Speech Communication Association | - |
dc.title | Improving Joint Speech and Emotion Recognition Using Global Style Tokens | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | Chang, Joon-Hyuk | - |
dc.identifier.doi | 10.21437/Interspeech.2023-2375 | - |
dc.identifier.scopusid | 2-s2.0-85171588429 | - |
dc.identifier.bibliographicCitation | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, v.2023, pp.4528 - 4532 | - |
dc.relation.isPartOf | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | - |
dc.citation.title | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | - |
dc.citation.volume | 2023 | - |
dc.citation.startPage | 4528 | - |
dc.citation.endPage | 4532 | - |
dc.type.rims | ART | - |
dc.type.docType | Conference paper | - |
dc.description.journalClass | 1 | - |
dc.description.isOpenAccess | N | - |
dc.description.journalRegisteredClass | scopus | - |
dc.subject.keywordPlus | Character recognition | - |
dc.subject.keywordPlus | Classification (of information) | - |
dc.subject.keywordPlus | Continuous speech recognition | - |
dc.subject.keywordPlus | Embeddings | - |
dc.subject.keywordPlus | Emotion Recognition | - |
dc.subject.keywordPlus | Speech communication | - |
dc.subject.keywordPlus | Acoustic features | - |
dc.subject.keywordPlus | Automatic speech recognition | - |
dc.subject.keywordPlus | Emotional state | - |
dc.subject.keywordPlus | Global style token | - |
dc.subject.keywordPlus | Performance | - |
dc.subject.keywordPlus | Recognition models | - |
dc.subject.keywordPlus | Speech emotion recognition | - |
dc.subject.keywordPlus | Training methods | - |
dc.subject.keywordPlus | Large dataset | - |
dc.subject.keywordAuthor | automatic speech recognition | - |
dc.subject.keywordAuthor | global style tokens | - |
dc.subject.keywordAuthor | speech emotion recognition | - |
dc.identifier.url | https://www.isca-speech.org/archive/interspeech_2023/kyung23_interspeech.html | - |
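The abstract above describes conditioning a joint ASR and SER model on a style embedding drawn from a global style token (GST) module. The sketch below illustrates that idea under stated assumptions: the token-bank size, dimensions, GRU reference encoder, the placeholder GRU standing in for the paper's pre-trained Conformer encoder, and the fusion-by-addition step are all illustrative choices, not the authors' exact architecture.

```python
# Minimal sketch of GST conditioning for joint ASR/SER. All module names,
# dimensions, and the fusion step are illustrative assumptions.
import torch
import torch.nn as nn


class GlobalStyleTokens(nn.Module):
    """Attend over a learned bank of style tokens to get one style embedding."""

    def __init__(self, num_tokens=10, token_dim=256, ref_dim=128, num_heads=4):
        super().__init__()
        # Learned style-token bank (10 tokens is an assumption, following
        # the original GST work of Wang et al., 2018).
        self.tokens = nn.Parameter(torch.randn(num_tokens, token_dim))
        # Reference encoder: summarizes the utterance's 80-dim log-mel
        # features into a single query vector.
        self.ref_encoder = nn.GRU(input_size=80, hidden_size=ref_dim,
                                  batch_first=True)
        self.query_proj = nn.Linear(ref_dim, token_dim)
        self.attention = nn.MultiheadAttention(embed_dim=token_dim,
                                               num_heads=num_heads,
                                               batch_first=True)

    def forward(self, mels):
        # mels: (batch, frames, 80) log-mel features.
        _, h = self.ref_encoder(mels)                  # (1, batch, ref_dim)
        query = self.query_proj(h[-1]).unsqueeze(1)    # (batch, 1, token_dim)
        keys = self.tokens.unsqueeze(0).expand(mels.size(0), -1, -1)
        style, _ = self.attention(query, keys, keys)   # (batch, 1, token_dim)
        return style


class JointASRSER(nn.Module):
    """Toy joint model: the style embedding is added to every encoder frame
    before projecting to an output vocabulary that also holds emotion tags."""

    def __init__(self, vocab_size=1000, d_model=256):
        super().__init__()
        # Placeholder GRU standing in for the paper's pre-trained Conformer.
        self.encoder = nn.GRU(input_size=80, hidden_size=d_model,
                              batch_first=True)
        self.gst = GlobalStyleTokens(token_dim=d_model)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, mels):
        enc, _ = self.encoder(mels)    # (batch, frames, d_model)
        style = self.gst(mels)         # (batch, 1, d_model)
        return self.out(enc + style)   # broadcast the style over all frames


if __name__ == "__main__":
    model = JointASRSER()
    logits = model(torch.randn(2, 120, 80))  # two utterances, 120 mel frames
    print(logits.shape)                      # torch.Size([2, 120, 1000])
```

In the setup the abstract describes, the output vocabulary would include emotion tags so that an emotion token is recognized along with the text, and fine-tuning would update the pre-trained encoder and the GST module jointly.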