Improving Domain-Specific ASR with LLM-Generated Contextual Descriptions
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Suh, Jiwon | - |
dc.contributor.author | Na, Injae | - |
dc.contributor.author | Jung, Woohwan | - |
dc.date.accessioned | 2024-12-27T02:00:17Z | - |
dc.date.available | 2024-12-27T02:00:17Z | - |
dc.date.issued | 2024-09 | - |
dc.identifier.issn | 2308-457X | - |
dc.identifier.uri | https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/121431 | - |
dc.description.abstract | End-to-end automatic speech recognition (E2E ASR) systems have significantly improved speech recognition through training on extensive datasets. Despite these advancements, they still struggle to accurately recognize domain-specific words, such as proper nouns and technical terminology. To address this problem, we propose a method to utilize the state-of-the-art Whisper model without modifying its architecture, preserving its generalization performance while enabling it to leverage descriptions effectively. Moreover, we propose two additional training techniques to improve domain-specific ASR: decoder fine-tuning and context perturbation. We also propose a method to use a Large Language Model (LLM) to generate descriptions from simple metadata when descriptions are unavailable. Our experiments demonstrate that the proposed methods notably enhance domain-specific ASR accuracy on real-life datasets, with LLM-generated descriptions outperforming human-crafted ones in effectiveness. | - |
dc.format.extent | 5 | - |
dc.language | English | - |
dc.language.iso | ENG | - |
dc.publisher | International Speech Communication Association | - |
dc.title | Improving Domain-Specific ASR with LLM-Generated Contextual Descriptions | - |
dc.type | Article | - |
dc.publisher.location | United States | - |
dc.identifier.doi | 10.21437/Interspeech.2024-377 | - |
dc.identifier.scopusid | 2-s2.0-85214825081 | - |
dc.identifier.wosid | 001331850101080 | - |
dc.identifier.bibliographicCitation | Conference of the International Speech Communication Association, v.Interspeech 2024, pp 1255 - 1259 | - |
dc.citation.title | Conference of the International Speech Communication Association | - |
dc.citation.volume | Interspeech 2024 | - |
dc.citation.startPage | 1255 | - |
dc.citation.endPage | 1259 | - |
dc.type.docType | Proceedings Paper | - |
dc.description.isOpenAccess | N | - |
dc.description.journalRegisteredClass | foreign | - |
dc.relation.journalResearchArea | Computer Science | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence | - |
dc.subject.keywordAuthor | automatic speech recognition | - |
dc.subject.keywordAuthor | contextual biasing | - |
dc.subject.keywordAuthor | large language model | - |
dc.identifier.url | https://www.isca-archive.org/interspeech_2024/suh24_interspeech.html | - |
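
The abstract above describes conditioning Whisper on a contextual description without changing its architecture. Below is a minimal sketch of that general idea, not the authors' implementation: it passes a domain description through the `initial_prompt` parameter of the openai-whisper `transcribe` API, which prepends text to the decoder context and biases recognition toward the description's vocabulary. The audio path and description text are hypothetical placeholders, and the description here is hand-written rather than LLM-generated.

```python
# Minimal sketch: contextual biasing of Whisper via a textual description.
# Assumes the openai-whisper package is installed (pip install openai-whisper)
# and that "lecture.wav" is a hypothetical audio file.
import whisper

model = whisper.load_model("base")

# Hypothetical domain description; the paper generates such text with an
# LLM from simple metadata (e.g., topic, speaker, venue) when no
# human-written description is available.
description = (
    "A lecture on end-to-end speech recognition, covering Whisper, "
    "contextual biasing, and proper nouns such as LibriSpeech."
)

# initial_prompt is prepended to the decoder's context window, so the
# model is more likely to emit the domain-specific terms it contains.
result = model.transcribe("lecture.wav", initial_prompt=description)
print(result["text"])
```

Note that this sketch covers only inference-time conditioning; the paper's decoder fine-tuning and context perturbation are training-time techniques layered on top of this kind of prompting.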