Compress, Align, and Transfer: A new method for transferring pre-trained language models knowledge to CTC-based speech recognition

Choi, Jieun; Kim, Dohee; Chang, Joon-Hyuk

doi:10.1016/j.csl.2025.101900

Full metadata record

DC Field	Value	Language
dc.contributor.author	Choi, Jieun	-
dc.contributor.author	Kim, Dohee	-
dc.contributor.author	Chang, Joon-Hyuk	-
dc.date.accessioned	2026-02-11T02:00:30Z	-
dc.date.available	2026-02-11T02:00:30Z	-
dc.date.issued	2026-03	-
dc.identifier.issn	0885-2308	-
dc.identifier.issn	1095-8363	-
dc.identifier.uri	https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/210759	-
dc.description.abstract	The connectionist temporal classification (CTC) model is a leading approach for end-to-end (E2E) automatic speech recognition (ASR), known for its simplicity and fast inference, which are enabled by non-autoregressive decoding and conditional independence assumptions. However, because of these assumptions, CTC-based models often struggle to model relationships within token sequences accurately, which leads to lower recognition performance than attention-based encoder–decoder (AED) and transducer models. This issue becomes particularly pronounced when the training data is limited or the model is small, resulting in frequent spelling errors and reduced overall accuracy. In this study, we propose a new distillation approach named “Compress, Align, and Transfer” (COMAT) to enhance CTC-based ASR systems. COMAT addresses these challenges by integrating knowledge from pre-trained language models (PLMs) into CTC-based ASR systems. Our method uses a compression module that condenses speech embeddings to match the length of PLM embeddings, enabling more effective and direct knowledge transfer, together with a monotonic alignment search (MAS) that aligns the two sets of embeddings. COMAT not only preserves the fast decoding of CTC-based models but also significantly enhances their ability to model complex tokens by linking them with the linguistic depth of PLMs.	-
dc.format.extent	11	-
dc.language	English	-
dc.language.iso	ENG	-
dc.publisher	ACADEMIC PRESS LTD- ELSEVIER SCIENCE LTD	-
dc.title	Compress, Align, and Transfer: A new method for transferring pre-trained language models knowledge to CTC-based speech recognition	-
dc.type	Article	-
dc.publisher.location	United Kingdom	-
dc.identifier.doi	10.1016/j.csl.2025.101900	-
dc.identifier.scopusid	2-s2.0-105022178232	-
dc.identifier.wosid	001608329900001	-
dc.identifier.bibliographicCitation	COMPUTER SPEECH AND LANGUAGE, v.97, pp 1 - 11	-
dc.citation.title	COMPUTER SPEECH AND LANGUAGE	-
dc.citation.volume	97	-
dc.citation.startPage	1	-
dc.citation.endPage	11	-
dc.type.docType	Article	-
dc.description.isOpenAccess	N	-
dc.description.journalRegisteredClass	scie	-
dc.description.journalRegisteredClass	scopus	-
dc.relation.journalResearchArea	Computer Science	-
dc.relation.journalWebOfScienceCategory	Computer Science, Artificial Intelligence	-
dc.subject.keywordPlus	Computational linguistics	-
dc.subject.keywordPlus	Decoding	-
dc.subject.keywordPlus	Embeddings	-
dc.subject.keywordPlus	Knowledge management	-
dc.subject.keywordPlus	Knowledge transfer	-
dc.subject.keywordPlus	Speech communication	-
dc.subject.keywordAuthor	Automatic speech recognition	-
dc.subject.keywordAuthor	Connectionist temporal classification	-
dc.subject.keywordAuthor	Knowledge transfer	-
dc.subject.keywordAuthor	Language models	-
dc.identifier.url	https://www.sciencedirect.com/science/article/pii/S0885230825001251?via%3Dihub	-
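
The abstract mentions a monotonic alignment search (MAS) for aligning speech embeddings with PLM embeddings, but this record does not give the paper's formulation. The following is therefore only a minimal sketch of a generic MAS dynamic program (the technique introduced in Glow-TTS), under several assumptions: cosine similarity is used as the score, rows are PLM token embeddings, columns are (already compressed) speech frames, and there are at least as many frames as tokens. The helper `average_by_alignment` is hypothetical and simply mean-pools the frames assigned to each token, to show how an alignment could yield one speech vector per PLM token; it is not the paper's compression module.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity (assumed scoring function; 0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def monotonic_alignment_search(score: List[List[float]]) -> List[int]:
    """Return, for each frame j, the index of the token it is aligned to.

    score[i][j] is the similarity between token i and frame j. The path starts
    at (0, 0), ends at (N-1, T-1), and at each frame either stays on the same
    token or advances by exactly one, so it is monotonic and every token
    receives at least one frame.
    """
    n, t = len(score), len(score[0])
    if t < n:
        raise ValueError("need at least as many frames as tokens")
    neg_inf = float("-inf")
    # q[i][j]: best cumulative score of a valid path ending at token i, frame j.
    q = [[neg_inf] * t for _ in range(n)]
    q[0][0] = score[0][0]
    for j in range(1, t):
        # Token i is reachable at frame j only if i <= j (one step per frame).
        for i in range(min(n, j + 1)):
            stay = q[i][j - 1]
            advance = q[i - 1][j - 1] if i > 0 else neg_inf
            best = max(stay, advance)
            if best > neg_inf:
                q[i][j] = best + score[i][j]
    # Backtrack from the last token at the last frame.
    path = [0] * t
    i = n - 1
    for j in range(t - 1, -1, -1):
        path[j] = i
        if j > 0 and i > 0 and q[i - 1][j - 1] >= q[i][j - 1]:
            i -= 1
    return path


def average_by_alignment(
    frames: List[List[float]], path: List[int], n_tokens: int
) -> List[List[float]]:
    """Hypothetical pooling: mean of the frames aligned to each token."""
    dim = len(frames[0])
    sums = [[0.0] * dim for _ in range(n_tokens)]
    counts = [0] * n_tokens
    for frame, tok in zip(frames, path):
        counts[tok] += 1
        for d in range(dim):
            sums[tok][d] += frame[d]
    # MAS guarantees counts[k] >= 1 for every token.
    return [[s / counts[k] for s in sums[k]] for k in range(n_tokens)]


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0]]
    frames = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]]
    score = [[cosine(tk, fr) for fr in frames] for tk in tokens]
    path = monotonic_alignment_search(score)
    print(path)
    print(average_by_alignment(frames, path, len(tokens)))
```

With two toy token vectors and four frames, the search assigns the first two frames to token 0 and the last two to token 1, and pooling then produces exactly one vector per token, i.e. a sequence whose length matches the token sequence.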

Appears in Collections: Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles

Related Researcher

Chang, Joon-Hyuk: COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)

Certain data included herein are derived from the © Web of Science of Clarivate Analytics. All rights reserved.
You may not copy or re-distribute this material in whole or in part without the prior written consent of Clarivate Analytics.