Compress, Align, and Transfer: A new method for transferring pre-trained language models knowledge to CTC-based speech recognition
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Choi, Jieun | - |
| dc.contributor.author | Kim, Dohee | - |
| dc.contributor.author | Chang, Joon-Hyuk | - |
| dc.date.accessioned | 2026-02-11T02:00:30Z | - |
| dc.date.available | 2026-02-11T02:00:30Z | - |
| dc.date.issued | 2026-03 | - |
| dc.identifier.issn | 0885-2308 | - |
| dc.identifier.issn | 1095-8363 | - |
| dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/210759 | - |
| dc.description.abstract | Connectionist temporal classification (CTC) models are a leading approach to end-to-end (E2E) automatic speech recognition (ASR), known for their simplicity and speed, which stem from non-autoregressive decoding and conditional independence assumptions. However, these same assumptions make it difficult for CTC models to capture relationships between tokens in a sequence, leading to lower recognition performance than attention-based encoder–decoder (AED) and transducer models. This issue becomes particularly pronounced when the training data is limited or the model is small, resulting in frequent spelling errors and reduced overall accuracy. In this study, we propose a new distillation approach named “Compress, Align, and Transfer” (COMAT) that enhances CTC-based ASR systems by integrating knowledge from pre-trained language models (PLMs). Our method involves a compressing module that condenses speech embeddings to match the length of PLM embeddings, enabling more effective and direct knowledge transfer, and a monotonic alignment search (MAS) that aligns the two sets of embeddings. COMAT not only preserves the rapid decoding of CTC-based models but also significantly enhances their ability to model complex tokens by linking CTC-based models with the linguistic depth of PLMs. | - |
| dc.format.extent | 11 | - |
| dc.language | English | - |
| dc.language.iso | ENG | - |
| dc.publisher | ACADEMIC PRESS LTD- ELSEVIER SCIENCE LTD | - |
| dc.title | Compress, Align, and Transfer: A new method for transferring pre-trained language models knowledge to CTC-based speech recognition | - |
| dc.type | Article | - |
| dc.publisher.location | United Kingdom | - |
| dc.identifier.doi | 10.1016/j.csl.2025.101900 | - |
| dc.identifier.scopusid | 2-s2.0-105022178232 | - |
| dc.identifier.wosid | 001608329900001 | - |
| dc.identifier.bibliographicCitation | COMPUTER SPEECH AND LANGUAGE, v.97, pp 1 - 11 | - |
| dc.citation.title | COMPUTER SPEECH AND LANGUAGE | - |
| dc.citation.volume | 97 | - |
| dc.citation.startPage | 1 | - |
| dc.citation.endPage | 11 | - |
| dc.type.docType | Article | - |
| dc.description.isOpenAccess | N | - |
| dc.description.journalRegisteredClass | scie | - |
| dc.description.journalRegisteredClass | scopus | - |
| dc.relation.journalResearchArea | Computer Science | - |
| dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence | - |
| dc.subject.keywordPlus | Computational linguistics | - |
| dc.subject.keywordPlus | Decoding | - |
| dc.subject.keywordPlus | Embeddings | - |
| dc.subject.keywordPlus | Knowledge management | - |
| dc.subject.keywordPlus | Knowledge transfer | - |
| dc.subject.keywordPlus | Speech communication | - |
| dc.subject.keywordAuthor | Automatic speech recognition | - |
| dc.subject.keywordAuthor | Connectionist temporal classification | - |
| dc.subject.keywordAuthor | Knowledge transfer | - |
| dc.subject.keywordAuthor | Language models | - |
| dc.identifier.url | https://www.sciencedirect.com/science/article/pii/S0885230825001251?via%3Dihub | - |
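
The alignment step named in the abstract can be illustrated with a generic monotonic alignment search. This record does not give the paper's formulation, so the Python sketch below is an illustration under stated assumptions, not the authors' implementation. It assumes the score between a token embedding and a speech-frame embedding is the negative squared Euclidean distance, and that every token must cover at least one frame. The function names `neg_sq_dist` and `monotonic_alignment_search` are hypothetical.

```python
import math


def neg_sq_dist(text_emb, speech_emb):
    """Score matrix: score[i][j] = -||text_emb[i] - speech_emb[j]||^2 (assumed metric)."""
    return [
        [-sum((a - b) ** 2 for a, b in zip(t, s)) for s in speech_emb]
        for t in text_emb
    ]


def monotonic_alignment_search(score):
    """Return path, where path[j] is the token index assigned to frame j.

    The path starts at token 0, ends at the last token, never moves backwards,
    and advances by at most one token per frame, maximising the summed score.
    """
    n_tokens, n_frames = len(score), len(score[0])
    if n_tokens > n_frames:
        raise ValueError("need at least one frame per token")

    # q[i][j]: best total score of a monotonic path ending at token i, frame j.
    q = [[-math.inf] * n_frames for _ in range(n_tokens)]
    q[0][0] = score[0][0]
    for j in range(1, n_frames):
        for i in range(min(j + 1, n_tokens)):
            stay = q[i][j - 1]  # same token as the previous frame
            advance = q[i - 1][j - 1] if i > 0 else -math.inf  # next token
            q[i][j] = score[i][j] + max(stay, advance)

    # Backtrack from the last token at the last frame.
    path = [0] * n_frames
    i = n_tokens - 1
    for j in range(n_frames - 1, -1, -1):
        path[j] = i
        if j > 0 and i > 0 and (i == j or q[i - 1][j - 1] >= q[i][j - 1]):
            i -= 1
    return path


# Toy example: 2 token embeddings, 5 speech-frame embeddings (made-up values).
text = [[0.0, 1.0], [1.0, 0.0]]
speech = [[0.1, 0.9], [0.0, 1.0], [0.9, 0.2], [1.0, 0.0], [0.8, 0.1]]
print(monotonic_alignment_search(neg_sq_dist(text, speech)))  # [0, 0, 1, 1, 1]
```

In this toy case the first two frames are assigned to the first token and the remaining three to the second. Frames that share a token could then be pooled so that the speech sequence matches the token sequence in length, although how COMAT actually performs its compression is not stated in this record.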
