Layout Aware Semantic Element Extraction for Sustainable Science & Technology Decision Support (open access)
- Authors
- Kim, Hyuntae; Choi, Jongyun; Park, Soyoung; Jung, Yuchul
- Issue Date
- Mar-2022
- Publisher
- MDPI
- Keywords
- multi-modal; document layout analysis; metadata; document structure; document object; semantic elements; knowledge graph; transformer; decision support
- Citation
- SUSTAINABILITY, v.14, no.5
- Journal Title
- SUSTAINABILITY
- Volume
- 14
- Number
- 5
- URI
- https://scholarworks.bwise.kr/kumoh/handle/2020.sw.kumoh/21026
- DOI
- 10.3390/su14052802
- ISSN
- 2071-1050
- Abstract
- New scientific and technological (S&T) knowledge is being introduced rapidly, and the effort required to understand and analyze newly published S&T documents is growing accordingly. Automated text mining and vision recognition techniques alleviate the burden somewhat, but the variety of document layout formats and knowledge content granularities across the S&T field makes the task challenging. This paper therefore proposes LA-SEE (LAME and Vi-SEE), a knowledge graph construction framework that simultaneously extracts meta-information and useful image objects from S&T documents in various layout formats. We adopt Layout-aware Metadata Extraction (LAME), which can accurately extract metadata from various layout formats, and implement a transformer-based instance segmentation approach, Vision based Semantic Elements Extraction (Vi-SEE), to maximize vision-based semantic element recognition. Moreover, to construct a scientific knowledge graph spanning multiple S&T documents, we define an extensible Semantic Elements Knowledge Graph (SEKG) structure. So far, we have extracted about 6 million semantic elements from 49,649 PDFs. In addition, to illustrate the potential of our SEKG, we present two promising application scenarios: a scientific knowledge guide across multiple S&T documents and question answering over scientific tables.
- Appears in Collections
- ETC > 1. Journal Articles
Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.