Detailed Information

Cited 0 times in Web of Science; cited 0 times in Scopus.

Intra-ensemble: A New Method for Combining Intermediate Outputs in Transformer-based Automatic Speech Recognition

Full metadata record
DC Field | Value | Language
dc.contributor.author | Kim, DoHee | -
dc.contributor.author | Choi, Jieun | -
dc.contributor.author | Chang, Joon-Hyuk | -
dc.date.accessioned | 2023-10-10T02:36:06Z | -
dc.date.available | 2023-10-10T02:36:06Z | -
dc.date.created | 2023-10-04 | -
dc.date.issued | 2023-08 | -
dc.identifier.issn | 2308-457X | -
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/191795 | -
dc.description.abstract | Deep learning models employ various regularization techniques to prevent overfitting and enhance generalization. In particular, an auxiliary loss, as proposed for connectionist temporal classification (CTC) models, demonstrated the potential for intermediate predictions to be useful by enabling sub-models to recognize speech accurately. We propose a new method called Intra-ensemble, which combines these accurate intermediate outputs into a single output for both training and inference, weighting the importance of each intermediate layer with learnable parameters. Our approach is applicable to CTC models, attention-based encoder-decoder models, and transducer structures, and demonstrated performance improvements of 13.5%, 3.0%, and 4.1%, respectively, in the LibriSpeech evaluation. Furthermore, through various analytical experiments, we found that the sub-models contributed significantly to the performance improvement. | -
dc.language | 영어 (English) | -
dc.language.iso | en | -
dc.publisher | International Speech Communication Association | -
dc.title | Intra-ensemble: A New Method for Combining Intermediate Outputs in Transformer-based Automatic Speech Recognition | -
dc.type | Article | -
dc.contributor.affiliatedAuthor | Chang, Joon-Hyuk | -
dc.identifier.doi | 10.21437/Interspeech.2023-1255 | -
dc.identifier.scopusid | 2-s2.0-85171529529 | -
dc.identifier.bibliographicCitation | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, v.2023-August, pp.2203-2207 | -
dc.relation.isPartOf | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | -
dc.citation.title | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | -
dc.citation.volume | 2023-August | -
dc.citation.startPage | 2203 | -
dc.citation.endPage | 2207 | -
dc.type.rims | ART | -
dc.type.docType | Conference paper | -
dc.description.journalClass | 1 | -
dc.description.isOpenAccess | N | -
dc.description.journalRegisteredClass | scopus | -
dc.subject.keywordPlus | Deep learning | -
dc.subject.keywordPlus | Speech communication | -
dc.subject.keywordPlus | Automatic speech recognition | -
dc.subject.keywordPlus | Classification models | -
dc.subject.keywordPlus | Ensemble | -
dc.subject.keywordPlus | Generalisation | -
dc.subject.keywordPlus | Learning models | -
dc.subject.keywordPlus | Overfitting | -
dc.subject.keywordPlus | Performance | -
dc.subject.keywordPlus | Regularization technique | -
dc.subject.keywordPlus | Submodels | -
dc.subject.keywordPlus | Temporal classification | -
dc.subject.keywordPlus | Speech recognition | -
dc.subject.keywordAuthor | ensemble | -
dc.subject.keywordAuthor | speech recognition | -
dc.identifier.url | https://www.isca-speech.org/archive/interspeech_2023/kim23e_interspeech.html | -
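
The abstract above describes combining intermediate-layer outputs into a single output, with the importance of each intermediate layer controlled by learnable parameters. A minimal NumPy sketch of that general idea (illustrative only; the function and variable names are assumptions, not the authors' implementation):

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def intra_ensemble(intermediate_logits, alpha):
    """Combine per-layer output logits into one output.

    intermediate_logits: list of L arrays, each (time, vocab),
        one per intermediate layer (hypothetical shapes).
    alpha: (L,) learnable importance parameters; softmax-normalized
        so the combination weights sum to 1.
    """
    w = softmax(alpha)                       # per-layer importance weights
    stacked = np.stack(intermediate_logits)  # (L, time, vocab)
    # Weighted sum over the layer axis -> (time, vocab)
    return np.tensordot(w, stacked, axes=1)
```

With `alpha` initialized to zeros, the weights are uniform and the result is a plain average of the layer outputs; training would then adjust `alpha` so more useful intermediate layers receive larger weight.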
Files in This Item: Go to Link
Appears in Collections:
Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles


Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Chang, Joon-Hyuk
College of Engineering (School of Electronic Engineering)
