Improving Domain-Specific ASR with LLM-Generated Contextual Descriptions
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Suh, Jiwon | - |
dc.contributor.author | Na, Injae | - |
dc.contributor.author | Jung, Woohwan | - |
dc.date.accessioned | 2024-12-27T02:00:17Z | - |
dc.date.available | 2024-12-27T02:00:17Z | - |
dc.date.issued | 2024-09 | - |
dc.identifier.issn | 2308-457X | - |
dc.identifier.uri | https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/121431 | - |
dc.description.abstract | End-to-end automatic speech recognition (E2E ASR) systems have significantly improved speech recognition through training on extensive datasets. Despite these advancements, they still struggle to accurately recognize domain-specific words, such as proper nouns and technical terminology. To address this problem, we propose a method to utilize the state-of-the-art Whisper model without modifying its architecture, preserving its generalization performance while enabling it to leverage descriptions effectively. Moreover, we propose two additional training techniques to improve domain-specific ASR: decoder fine-tuning and context perturbation. We also propose a method to use a Large Language Model (LLM) to generate descriptions from simple metadata when descriptions are unavailable. Our experiments demonstrate that the proposed methods notably enhance domain-specific ASR accuracy on real-life datasets, with LLM-generated descriptions outperforming human-crafted ones in effectiveness. | - |
dc.format.extent | 5 | - |
dc.language | English | - |
dc.language.iso | ENG | - |
dc.publisher | International Speech Communication Association | - |
dc.title | Improving Domain-Specific ASR with LLM-Generated Contextual Descriptions | - |
dc.type | Article | - |
dc.publisher.location | United States | - |
dc.identifier.doi | 10.21437/Interspeech.2024-377 | - |
dc.identifier.scopusid | 2-s2.0-85214825081 | - |
dc.identifier.wosid | 001331850101080 | - |
dc.identifier.bibliographicCitation | Conference of the International Speech Communication Association, v.Interspeech 2024, pp 1255 - 1259 | - |
dc.citation.title | Conference of the International Speech Communication Association | - |
dc.citation.volume | Interspeech 2024 | - |
dc.citation.startPage | 1255 | - |
dc.citation.endPage | 1259 | - |
dc.type.docType | Proceedings Paper | - |
dc.description.isOpenAccess | N | - |
dc.description.journalRegisteredClass | foreign | - |
dc.relation.journalResearchArea | Computer Science | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence | - |
dc.subject.keywordAuthor | automatic speech recognition | - |
dc.subject.keywordAuthor | contextual biasing | - |
dc.subject.keywordAuthor | large language model | - |
dc.identifier.url | https://www.isca-archive.org/interspeech_2024/suh24_interspeech.html | - |
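
The abstract above describes conditioning Whisper on a contextual description without changing its architecture. Below is a minimal sketch of that general idea, not the authors' implementation: it passes a domain description through the `initial_prompt` parameter of the openai-whisper `transcribe` API, which prepends text to the decoder context and biases recognition toward the description's vocabulary. The audio path and description text are hypothetical placeholders, and the description here is hand-written rather than LLM-generated.

```python
# Minimal sketch: contextual biasing of Whisper via a textual description.
# Assumes the openai-whisper package is installed (pip install openai-whisper)
# and that "lecture.wav" is a hypothetical audio file.
import whisper

model = whisper.load_model("base")

# Hypothetical domain description; the paper generates such text with an
# LLM from simple metadata (e.g., topic, speaker, venue) when no
# human-written description is available.
description = (
    "A lecture on end-to-end speech recognition, covering Whisper, "
    "contextual biasing, and proper nouns such as LibriSpeech."
)

# initial_prompt is prepended to the decoder's context window, so the
# model is more likely to emit the domain-specific terms it contains.
result = model.transcribe("lecture.wav", initial_prompt=description)
print(result["text"])
```

Note that this sketch covers only inference-time conditioning; the paper's decoder fine-tuning and context perturbation are training-time techniques layered on top of this kind of prompting.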