Exploring the Impact of Corpus Diversity on Financial Pretrained Language Models

Choe, Jaeyoung; Noh, Keonwoong; Kim, Nayeon; Ahn, Seyun; Jung, Woohwan

doi:10.18653/v1/2023.findings-emnlp.138

Full metadata record

DC Field	Value	Language
dc.contributor.author	Choe, Jaeyoung	-
dc.contributor.author	Noh, Keonwoong	-
dc.contributor.author	Kim, Nayeon	-
dc.contributor.author	Ahn, Seyun	-
dc.contributor.author	Jung, Woohwan	-
dc.date.accessioned	2024-01-20T09:03:29Z	-
dc.date.available	2024-01-20T09:03:29Z	-
dc.date.issued	2023-12	-
dc.identifier.uri	https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/117866	-
dc.description.abstract	Over the past few years, various domain-specific pretrained language models (PLMs) have been proposed and have outperformed general-domain PLMs in specialized areas such as biomedical, scientific, and clinical domains. In addition, financial PLMs have been studied because of the high economic impact of financial data analysis. However, we found that financial PLMs were not pretrained on sufficiently diverse financial data. This lack of diverse training data leads to a subpar generalization performance, resulting in general-purpose PLMs, including BERT, often outperforming financial PLMs on many downstream tasks. To address this issue, we collected a broad range of financial corpus and trained the Financial Language Model (FiLM) on these diverse datasets. Our experimental results confirm that FiLM outperforms not only existing financial PLMs but also general domain PLMs. Furthermore, we provide empirical evidence that this improvement can be achieved even for unseen corpus groups.	-
dc.format.extent	12	-
dc.language	English	-
dc.language.iso	ENG	-
dc.publisher	Association for Computational Linguistics	-
dc.title	Exploring the Impact of Corpus Diversity on Financial Pretrained Language Models	-
dc.type	Article	-
dc.publisher.location	Singapore	-
dc.identifier.doi	10.18653/v1/2023.findings-emnlp.138	-
dc.identifier.bibliographicCitation	Findings of the Association for Computational Linguistics: EMNLP 2023, pp 2101 - 2112	-
dc.citation.title	Findings of the Association for Computational Linguistics: EMNLP 2023	-
dc.citation.startPage	2101	-
dc.citation.endPage	2112	-
dc.type.docType	Proceeding	-
dc.description.isOpenAccess	N	-
dc.description.journalRegisteredClass	foreign	-
dc.identifier.url	https://aclanthology.org/2023.findings-emnlp.138/	-

Appears in Collections: COLLEGE OF COMPUTING > DEPARTMENT OF ARTIFICIAL INTELLIGENCE > 1. Journal Articles

Related Researcher

Jung, Woohwan: ERICA College of Computing (Department of Artificial Intelligence)
