LAME: Layout-Aware Metadata Extraction Approach for Research Articles
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Choi, Jongyun | - |
dc.contributor.author | Kong, Hyesoo | - |
dc.contributor.author | Yoon, Hwamook | - |
dc.contributor.author | Oh, Heungseon | - |
dc.contributor.author | Jung, Yuchul | - |
dc.date.accessioned | 2022-05-17T02:05:46Z | - |
dc.date.available | 2022-05-17T02:05:46Z | - |
dc.date.created | 2022-05-17 | - |
dc.date.issued | 2022-03 | - |
dc.identifier.issn | 1546-2218 | - |
dc.identifier.uri | https://scholarworks.bwise.kr/kumoh/handle/2020.sw.kumoh/21096 | - |
dc.description.abstract | The volume of academic literature, such as conference papers and journal articles, has increased rapidly worldwide, and research on metadata extraction is ongoing. However, high-performing metadata extraction remains challenging because layout formats vary across journal publishers. To accommodate the diversity of academic journal layouts, we propose a novel LAyout-aware Metadata Extraction (LAME) framework built on three components: the design of an automatic layout analysis, the construction of a large metadata training set, and the implementation of a metadata extractor. In the framework, we designed an automatic layout analysis using PDFMiner. Based on the layout analysis, a large volume of metadata-separated training data, including the title, abstract, author name, author affiliation, and keywords, was automatically extracted. Moreover, we constructed a pre-trained model, Layout-MetaBERT, to extract the metadata from academic journals with varying layout formats. In experiments, our metadata extractor exhibited robust performance (Macro-F1 of 93.27%) on unseen journals with different layout formats. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | TECH SCIENCE PRESS | - |
dc.title | LAME: Layout-Aware Metadata Extraction Approach for Research Articles | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | Choi, Jongyun | - |
dc.contributor.affiliatedAuthor | Jung, Yuchul | - |
dc.identifier.doi | 10.32604/cmc.2022.025711 | - |
dc.identifier.wosid | 000779567700001 | - |
dc.identifier.bibliographicCitation | CMC-COMPUTERS MATERIALS & CONTINUA, v.72, no.2, pp.4019 - 4037 | - |
dc.relation.isPartOf | CMC-COMPUTERS MATERIALS & CONTINUA | - |
dc.citation.title | CMC-COMPUTERS MATERIALS & CONTINUA | - |
dc.citation.volume | 72 | - |
dc.citation.number | 2 | - |
dc.citation.startPage | 4019 | - |
dc.citation.endPage | 4037 | - |
dc.type.rims | ART | - |
dc.type.docType | Article | - |
dc.description.journalClass | 1 | - |
dc.description.isOpenAccess | Y | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |
dc.relation.journalResearchArea | Computer Science | - |
dc.relation.journalResearchArea | Materials Science | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Information Systems | - |
dc.relation.journalWebOfScienceCategory | Materials Science, Multidisciplinary | - |
dc.subject.keywordAuthor | Automatic layout analysis | - |
dc.subject.keywordAuthor | layout-MetaBERT | - |
dc.subject.keywordAuthor | metadata extraction | - |
dc.subject.keywordAuthor | research article | - |