Intra-ensemble: A New Method for Combining Intermediate Outputs in Transformer-based Automatic Speech Recognition
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Kim, DoHee | - |
dc.contributor.author | Choi, Jieun | - |
dc.contributor.author | Chang, Joon-Hyuk | - |
dc.date.accessioned | 2023-10-10T02:36:06Z | - |
dc.date.available | 2023-10-10T02:36:06Z | - |
dc.date.created | 2023-10-04 | - |
dc.date.issued | 2023-08 | - |
dc.identifier.issn | 2308-457X | - |
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/191795 | - |
dc.description.abstract | Deep learning models employ various regularization techniques to prevent overfitting and enhance generalization. In particular, the auxiliary loss proposed for connectionist temporal classification (CTC) models showed that intermediate predictions can be useful by enabling sub-models to recognize speech accurately. We propose a new method, Intra-ensemble, which combines these accurate intermediate outputs into a single output for both training and inference, weighting the importance of each intermediate layer with learnable parameters. Our approach is applicable to CTC models, attention-based encoder-decoder models, and transducer structures, and demonstrated performance improvements of 13.5%, 3.0%, and 4.1%, respectively, on the LibriSpeech evaluation. Furthermore, through various analytical experiments, we found that the sub-models contributed significantly to the performance improvement. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | International Speech Communication Association | - |
dc.title | Intra-ensemble: A New Method for Combining Intermediate Outputs in Transformer-based Automatic Speech Recognition | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | Chang, Joon-Hyuk | - |
dc.identifier.doi | 10.21437/Interspeech.2023-1255 | - |
dc.identifier.scopusid | 2-s2.0-85171529529 | - |
dc.identifier.bibliographicCitation | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, v.2023-August, pp.2203 - 2207 | - |
dc.relation.isPartOf | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | - |
dc.citation.title | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | - |
dc.citation.volume | 2023-August | - |
dc.citation.startPage | 2203 | - |
dc.citation.endPage | 2207 | - |
dc.type.rims | ART | - |
dc.type.docType | Conference paper | - |
dc.description.journalClass | 1 | - |
dc.description.isOpenAccess | N | - |
dc.description.journalRegisteredClass | scopus | - |
dc.subject.keywordPlus | Deep learning | - |
dc.subject.keywordPlus | Speech communication | - |
dc.subject.keywordPlus | Automatic speech recognition | - |
dc.subject.keywordPlus | Classification models | - |
dc.subject.keywordPlus | Ensemble | - |
dc.subject.keywordPlus | Generalisation | - |
dc.subject.keywordPlus | Learning models | - |
dc.subject.keywordPlus | Overfitting | - |
dc.subject.keywordPlus | Performance | - |
dc.subject.keywordPlus | Regularization technique | - |
dc.subject.keywordPlus | Submodels | - |
dc.subject.keywordPlus | Temporal classification | - |
dc.subject.keywordPlus | Speech recognition | - |
dc.subject.keywordAuthor | ensemble | - |
dc.subject.keywordAuthor | speech recognition | - |
dc.identifier.url | https://www.isca-speech.org/archive/interspeech_2023/kim23e_interspeech.html | - |
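
The abstract describes combining intermediate layer outputs into a single output, with the importance of each intermediate layer controlled by learnable parameters. A minimal sketch of that idea follows; the function name `intra_ensemble`, the parameter name `alpha`, and the softmax normalization of the layer weights are illustrative assumptions, not details confirmed by the paper.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def intra_ensemble(intermediate_logits, alpha):
    """Combine per-layer outputs with learnable importance weights.

    intermediate_logits: list of (T, V) arrays, one per intermediate layer
                         (T frames, V vocabulary entries).
    alpha: (num_layers,) learnable parameters (hypothetical name).
    """
    w = softmax(alpha)                       # normalize so layer weights sum to 1
    stacked = np.stack(intermediate_logits)  # (L, T, V)
    return np.tensordot(w, stacked, axes=1)  # weighted sum over layers -> (T, V)

# Toy example: 3 intermediate layers, 5 frames, vocabulary of 4.
rng = np.random.default_rng(0)
logits = [rng.standard_normal((5, 4)) for _ in range(3)]
alpha = np.zeros(3)  # equal weights, e.g. before any training
combined = intra_ensemble(logits, alpha)
print(combined.shape)  # (5, 4)
```

With `alpha` at zero the combination reduces to a plain average of the intermediate outputs; during training, gradient updates to `alpha` would let the model emphasize the more accurate intermediate layers.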