Improving Joint Speech and Emotion Recognition Using Global Style Tokens
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | 경제현 | - |
| dc.contributor.author | 성주석 | - |
| dc.contributor.author | 최정환 | - |
| dc.contributor.author | 정예린 | - |
| dc.contributor.author | Chang, Joon-Hyuk | - |
| dc.date.accessioned | 2023-10-10T02:35:37Z | - |
| dc.date.available | 2023-10-10T02:35:37Z | - |
| dc.date.issued | 2023-08 | - |
| dc.identifier.issn | 1990-9772 | - |
| dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/191792 | - |
| dc.description.abstract | Automatic speech recognition (ASR) and speech emotion recognition (SER) are closely related in that the acoustic features of speech, such as pitch, tone, and intensity, can vary according to the speaker's emotional state. Our study focuses on a joint ASR and SER task, in which an emotion token is tagged and recognized along with the text. To further improve the joint recognition performance, we propose a novel training method that adopts global style tokens (GSTs). A style embedding is extracted from the GST module to help the joint ASR and SER model capture emotional information from speech. Specifically, a conformer-based joint ASR and SER model pre-trained on a large-scale dataset is jointly fine-tuned with the style embedding to improve both ASR and SER. Experimental results on the IEMOCAP dataset show that the proposed model achieves a word error rate of 15.8% and, for four-class emotion classification, weighted and unweighted accuracies of 75.1% and 76.3%, respectively. | - |
| dc.format.extent | 5 | - |
| dc.language | English | - |
| dc.language.iso | ENG | - |
| dc.title | Improving Joint Speech and Emotion Recognition Using Global Style Tokens | - |
| dc.type | Article | - |
| dc.identifier.doi | 10.21437/Interspeech.2023-2375 | - |
| dc.identifier.scopusid | 2-s2.0-85171588429 | - |
| dc.identifier.wosid | 001186650304138 | - |
| dc.identifier.bibliographicCitation | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, v.2023, pp 4528 - 4532 | - |
| dc.citation.title | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | - |
| dc.citation.volume | 2023 | - |
| dc.citation.startPage | 4528 | - |
| dc.citation.endPage | 4532 | - |
| dc.type.docType | Proceedings Paper | - |
| dc.description.isOpenAccess | N | - |
| dc.description.journalRegisteredClass | scopus | - |
| dc.relation.journalResearchArea | Acoustics | - |
| dc.relation.journalResearchArea | Audiology & Speech-Language Pathology | - |
| dc.relation.journalResearchArea | Computer Science | - |
| dc.relation.journalWebOfScienceCategory | Acoustics | - |
| dc.relation.journalWebOfScienceCategory | Audiology & Speech-Language Pathology | - |
| dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence | - |
| dc.relation.journalWebOfScienceCategory | Computer Science, Software Engineering | - |
| dc.subject.keywordPlus | Character recognition | - |
| dc.subject.keywordPlus | Classification (of information) | - |
| dc.subject.keywordPlus | Continuous speech recognition | - |
| dc.subject.keywordPlus | Embeddings | - |
| dc.subject.keywordPlus | Emotion Recognition | - |
| dc.subject.keywordPlus | Speech communication | - |
| dc.subject.keywordPlus | Acoustic features | - |
| dc.subject.keywordPlus | Automatic speech recognition | - |
| dc.subject.keywordPlus | Emotion recognition | - |
| dc.subject.keywordPlus | Emotional state | - |
| dc.subject.keywordPlus | Global style token | - |
| dc.subject.keywordPlus | Performance | - |
| dc.subject.keywordPlus | Recognition models | - |
| dc.subject.keywordPlus | Speech emotion recognition | - |
| dc.subject.keywordPlus | Training methods | - |
| dc.subject.keywordPlus | Large dataset | - |
| dc.subject.keywordAuthor | automatic speech recognition | - |
| dc.subject.keywordAuthor | global style tokens | - |
| dc.subject.keywordAuthor | speech emotion recognition | - |
| dc.identifier.url | https://www.isca-speech.org/archive/interspeech_2023/kyung23_interspeech.html | - |
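
The abstract reports emotion results as weighted and unweighted accuracy and describes an emotion token recognized along with the text. The sketch below illustrates those ideas under stated assumptions: weighted accuracy is taken in its conventional SER sense (overall fraction of correctly labeled utterances) and unweighted accuracy as the mean of per-class recalls. The token format (`<neu>`, appended after the text), the function names, and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def split_emotion_token(hypothesis):
    """Split a decoded string into (text, emotion).

    Assumes a hypothetical format in which the emotion token is the last
    word and is wrapped in angle brackets, e.g. "i am fine <neu>".
    """
    words = hypothesis.split()
    if words and words[-1].startswith("<") and words[-1].endswith(">"):
        return " ".join(words[:-1]), words[-1][1:-1]
    return hypothesis, None


def weighted_accuracy(refs, hyps):
    """Fraction of all utterances whose emotion label is correct."""
    correct = sum(r == h for r, h in zip(refs, hyps))
    return correct / len(refs)


def unweighted_accuracy(refs, hyps):
    """Mean of per-class recalls, so each emotion class counts equally."""
    total = defaultdict(int)
    hit = defaultdict(int)
    for r, h in zip(refs, hyps):
        total[r] += 1
        hit[r] += int(r == h)
    return sum(hit[c] / total[c] for c in total) / len(total)


# Toy labels for illustration only (not IEMOCAP data).
refs = ["neu", "neu", "neu", "neu", "hap", "hap", "ang", "sad"]
hyps = ["neu", "neu", "neu", "hap", "hap", "neu", "ang", "sad"]

text, emotion = split_emotion_token("i am fine <neu>")
wa = weighted_accuracy(refs, hyps)
ua = unweighted_accuracy(refs, hyps)
```

On imbalanced label sets the two metrics diverge, because unweighted accuracy gives a rare class the same influence as a frequent one.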
