KNOWLEDGE DISTILLATION FROM LANGUAGE MODEL TO ACOUSTIC MODEL: A HIERARCHICAL MULTI-TASK LEARNING APPROACH
- Authors
- Lee, Mun-Hak; Chang, Joon-Hyuk
- Issue Date
- May-2022
- Publisher
- IEEE
- Keywords
- automatic speech recognition; knowledge distillation; multi-task learning; cross-modal distillation; language model; acoustic model
- Citation
- 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), pp. 8392-8396
- Indexed
- SCOPUS
- Journal Title
- 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP)
- Start Page
- 8392
- End Page
- 8396
- URI
- https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/182326
- DOI
- 10.1109/ICASSP43922.2022.9747082
- ISSN
- 0736-7791
- Abstract
- The remarkable performance of pre-trained language models (LMs) trained with self-supervised learning has driven a major paradigm shift in natural language processing. In line with this shift, improving speech recognition systems by leveraging massive deep-learning-based LMs has become a major topic of speech recognition research. Among the various ways of applying LMs to speech recognition systems, this paper focuses on cross-modal knowledge distillation, which transfers knowledge between two deep neural networks of different modalities. We propose an acoustic model structure with multiple auxiliary output layers for cross-modal distillation and demonstrate that it effectively compensates for the shortcomings of the existing label-interpolation-based distillation method. We further extend the proposed method to a hierarchical distillation scheme using LMs trained on different units (senones, monophones, and subwords) and show its effectiveness through an ablation study.
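To make the abstract's setup concrete, below is a minimal sketch of the two ideas it describes: an acoustic model with auxiliary output layers at different depths, and a label-interpolation-style distillation loss applied per output unit. It is written in PyTorch; the layer types, unit inventory sizes, layer-to-unit assignment, interpolation weight `lam`, and temperature `tau` are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalMultiTaskAM(nn.Module):
    """Acoustic encoder with auxiliary output layers, one per linguistic
    unit, for cross-modal distillation (unit inventory sizes are made up)."""

    def __init__(self, feat_dim=80, hidden=512,
                 n_senones=4000, n_monophones=40, n_subwords=5000):
        super().__init__()
        # Three stacked recurrent blocks; each block feeds one output head.
        self.lower = nn.LSTM(feat_dim, hidden, num_layers=2, batch_first=True)
        self.mid = nn.LSTM(hidden, hidden, num_layers=2, batch_first=True)
        self.upper = nn.LSTM(hidden, hidden, num_layers=2, batch_first=True)
        # Assumed ordering: fine-grained units at the bottom, coarser toward the top.
        self.senone_head = nn.Linear(hidden, n_senones)
        self.monophone_head = nn.Linear(hidden, n_monophones)
        self.subword_head = nn.Linear(hidden, n_subwords)

    def forward(self, feats):                  # feats: (B, T, feat_dim)
        h1, _ = self.lower(feats)
        h2, _ = self.mid(h1)
        h3, _ = self.upper(h2)
        return (self.senone_head(h1),          # (B, T, n_senones)
                self.monophone_head(h2),       # (B, T, n_monophones)
                self.subword_head(h3))         # (B, T, n_subwords)

def interpolated_kd_loss(logits, hard_labels, lm_posteriors, lam=0.5, tau=2.0):
    """Label-interpolation-style distillation for one head: mix cross-entropy
    against one-hot targets with a KL term against soft posteriors from an LM
    trained on the same unit (equivalent, up to a constant in the teacher
    entropy, to cross-entropy against interpolated labels).

    logits:        (N, C) frame-level student logits, frames flattened
    hard_labels:   (N,)   integer targets
    lm_posteriors: (N, C) teacher probability distributions
    lam, tau:      illustrative interpolation weight and temperature
    """
    ce = F.cross_entropy(logits, hard_labels)
    kd = F.kl_div(F.log_softmax(logits / tau, dim=-1),
                  lm_posteriors, reduction="batchmean") * tau * tau
    return (1.0 - lam) * ce + lam * kd
```

During training, the three head losses would be summed, optionally with per-head weights, each head distilled from an LM trained on the corresponding unit; the paper's actual architecture and hyperparameters should be taken from the full text.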
- Appears in Collections
- College of Engineering (Seoul) > Department of Electronic Engineering (Seoul) > 1. Journal Articles