A rule-based method for table detection in website images

Full metadata record
DC Field | Value | Language
dc.contributor.author | Kim J. | -
dc.contributor.author | Hwang H. | -
dc.date.available | 2020-07-30T00:36:03Z | -
dc.date.created | 2020-05-28 | -
dc.date.issued | 2020-04 | -
dc.identifier.issn | 2169-3536 | -
dc.identifier.uri | https://scholarworks.bwise.kr/gachon/handle/2020.sw.gachon/71664 | -
dc.description.abstract | Table detection is an essential part of document analysis because tables are among the most efficient means of systematically summarizing information. Therefore, numerous studies on detecting tables not only in documents but also on websites have been conducted. Although the number of websites has recently grown explosively, most of these studies struggle to detect tables that appear as images rather than as tagged markup, owing to the variability in size, content, color, and shape. In this paper, we propose an efficient yet robust method for detecting tables in image formats, which can be applied to both documents and websites. Instead of employing recently developed deep learning methods, which require extensive training to handle this diversity, we apply a rule-based detection method that uses a key feature of many tables, namely, the grid format of the text they contain. The proposed method consists of two stages: a feature extraction stage and a grid pattern recognition stage. In the first stage, we extract the features of the contents of the tables. We then remove the features of non-text objects and of text not included in tables. In the second stage, we build tree structures from the features and apply a novel algorithm for determining the grid pattern. When we applied our method to a website dataset, the experimental results showed a precision, recall, and F1-measure of 84.5%, 72%, and 0.778, which are improvements of 3.6%, 24.16%, and 0.276 over a previous method, respectively, while also achieving the fastest processing time. In addition, the proposed rule-based method allows the structure of the contents in the table to be easily restored. © 2013 IEEE. | -
dc.language | English | -
dc.language.iso | en | -
dc.publisher | Institute of Electrical and Electronics Engineers Inc. | -
dc.relation.isPartOf | IEEE Access | -
dc.title | A rule-based method for table detection in website images | -
dc.type | Article | -
dc.type.rims | ART | -
dc.description.journalClass | 1 | -
dc.identifier.wosid | 000549479700018 | -
dc.identifier.doi | 10.1109/ACCESS.2020.2990901 | -
dc.identifier.bibliographicCitation | IEEE Access, v.8, pp.81022 - 81033 | -
dc.description.isOpenAccess | N | -
dc.identifier.scopusid | 2-s2.0-85084953091 | -
dc.citation.endPage | 81033 | -
dc.citation.startPage | 81022 | -
dc.citation.title | IEEE Access | -
dc.citation.volume | 8 | -
dc.contributor.affiliatedAuthor | Kim J. | -
dc.contributor.affiliatedAuthor | Hwang H. | -
dc.type.docType | Article | -
dc.subject.keywordAuthor | document analysis | -
dc.subject.keywordAuthor | Table detection | -
dc.subject.keywordAuthor | Web information extraction | -
dc.subject.keywordPlus | Deep learning | -
dc.subject.keywordPlus | Learning systems | -
dc.subject.keywordPlus | Trees (mathematics) | -
dc.subject.keywordPlus | Websites | -
dc.subject.keywordPlus | Document analysis | -
dc.subject.keywordPlus | Learning methods | -
dc.subject.keywordPlus | Novel algorithm | -
dc.subject.keywordPlus | Processing time | -
dc.subject.keywordPlus | Rule based detection | -
dc.subject.keywordPlus | Rule-based method | -
dc.subject.keywordPlus | Table detection | -
dc.subject.keywordPlus | Tree structures | -
dc.subject.keywordPlus | Feature extraction | -
dc.description.journalRegisteredClass | scie | -
dc.description.journalRegisteredClass | scopus | -
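
The F1-measure quoted in the abstract can be cross-checked against the reported precision and recall, since F1 is conventionally defined as their harmonic mean. The short Python sketch below does only that arithmetic; the function name `f1_score` is a hypothetical helper introduced here for illustration and is not taken from the paper, and the sketch assumes the paper uses the standard F1 definition.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        # Avoid division by zero when both inputs are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported in the abstract for the website dataset,
# converted from percentages to fractions.
precision = 0.845
recall = 0.72

f1 = f1_score(precision, recall)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.778"
```

Under that assumption, the computed value agrees with the 0.778 stated in the abstract.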
Files in This Item
There are no files associated with this item.
Appears in Collections
ETC > 1. Journal Articles