앙상블 알고리즘과 BERT를 이용한 연구논문 주제영역 분류 (Topic Classification of Research Paper Using Ensemble Algorithms and BERT)
- Other Titles
- Topic Classification of Research Paper Using Ensemble Algorithms and BERT
- Authors
- 김성현; 김영민
- Issue Date
- Mar-2024
- Publisher
- 한국경영공학회
- Keywords
- Research Paper; Classification; Machine Learning; Ensemble Algorithms; BERT
- Citation
- 한국경영공학회지, v.29, no.1, pp. 19-33
- Pages
- 15
- Indexed
- KCI
- Journal Title
- 한국경영공학회지
- Volume
- 29
- Number
- 1
- Start Page
- 19
- End Page
- 33
- URI
- https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/196933
- ISSN
- 2005-7776; 2713-573X
- Abstract
- Purpose: To develop and compare models that classify the topic of a research paper from its abstract text.
Methods: Abstracts from 120,000 arXiv papers were collected, and classification models were developed using ensemble algorithms and BERT. For the ensemble algorithms, features were extracted with TF-IDF, LDA, and Doc2Vec to create seven feature sets. In total, 22 models were built from the various feature sets and algorithms, and their performance was compared.
Results: The BERT model performed best, with an accuracy of 0.848 and an F1-score of 0.808. Among the ensemble algorithms, LightGBM performed particularly well, and TF-IDF vectorization, which reflects word importance directly, proved effective.
Conclusion: A model that automatically classifies paper topics from their text lets researchers quickly find the latest work and identify their research interests. This improves access to information across research fields and may help researchers in diverse domains gain new insights.
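The abstract names TF-IDF features and reports accuracy and F1-score. The sketch below illustrates both ideas on toy data using only the Python standard library. It is not the authors' pipeline: the whitespace tokenizer, the smoothed IDF formula, the macro averaging of F1, and all sample abstracts and labels are assumptions made for illustration, and no classifier such as LightGBM or BERT is trained here.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document.

    Weight = raw term frequency * smoothed inverse document frequency,
    idf(t) = ln((1 + N) / (1 + df(t))) + 1. This variant is an assumption
    of the sketch; the abstract does not state which one was used.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return [
        {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro averaging is assumed here)."""
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy abstracts and labels, invented for illustration only.
abstracts = [
    "quantum entanglement in photon pairs",
    "deep neural network for image classification",
    "neural network training with gradient descent",
]
vectors = tfidf_vectors(abstracts)

y_true = ["physics", "cs", "cs", "math"]
y_pred = ["physics", "cs", "math", "math"]
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
```

In the second toy abstract, "image" occurs in only one document while "neural" occurs in two, so "image" receives the larger weight. This is the sense in which TF-IDF reflects word importance directly in the feature vector.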
- Appears in Collections
- School of Interdisciplinary Industrial Studies, Seoul (서울 산업융합학부) > 1. Journal Articles
