Detailed Information

Cited 0 times in Web of Science; cited 3 times in Scopus
Main content extraction from web documents using text block context

Full metadata record
| DC Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Kim, M. | - |
| dc.contributor.author | Kim, Y. | - |
| dc.contributor.author | Song, W. | - |
| dc.contributor.author | Khil, A. | - |
| dc.date.available | 2019-04-10T10:23:32Z | - |
| dc.date.created | 2018-04-17 | - |
| dc.date.issued | 2013 | - |
| dc.identifier.isbn | 9783642401725 | - |
| dc.identifier.issn | 0302-9743 | - |
| dc.identifier.uri | http://scholarworks.bwise.kr/ssu/handle/2018.sw.ssu/32881 | - |
| dc.description.abstract | Due to various Web authoring tools, the new web standards, and improved web accessibility, a wide variety of Web contents are being produced very quickly. In such an environment, in order to provide appropriate Web services to users' needs it is important to quickly and accurately extract relevant information from Web documents and remove irrelevant contents such as advertisements. In this paper, we propose a method that extracts main content accurately from HTML Web documents. In the method, a decision tree is built and used to classify each block of text whether it is a part of the main content. For classification we use contextual features around text blocks including word density, link density, HTML tag distribution, and distances between text blocks. We experimented with our method using a published data set and a data set that we collected. The experiment results show that our method performs 19% better in F-measure compared to the existing best performing method. © 2013 Springer-Verlag. | - |
| dc.relation.isPartOf | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | - |
| dc.title | Main content extraction from web documents using text block context | - |
| dc.type | Conference | - |
| dc.identifier.doi | 10.1007/978-3-642-40173-2_10 | - |
| dc.type.rims | CONF | - |
| dc.identifier.bibliographicCitation | 24th International Conference on Database and Expert Systems Applications, DEXA 2013, v.8056 LNCS, no.PART 2, pp.81 - 93 | - |
| dc.description.journalClass | 2 | - |
| dc.identifier.scopusid | 2-s2.0-84884405875 | - |
| dc.citation.conferenceDate | 2013-08-26 | - |
| dc.citation.conferencePlace | Prague | - |
| dc.citation.endPage | 93 | - |
| dc.citation.number | PART 2 | - |
| dc.citation.startPage | 81 | - |
| dc.citation.title | 24th International Conference on Database and Expert Systems Applications, DEXA 2013 | - |
| dc.citation.volume | 8056 LNCS | - |
| dc.contributor.affiliatedAuthor | Kim, M. | - |
| dc.contributor.affiliatedAuthor | Khil, A. | - |
| dc.type.docType | Conference Paper | - |
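
The abstract describes classifying each text block using features of the block and of its neighbours (word density, link density, and others). The following is only a rough sketch of that idea, not the authors' implementation: the `TextBlock` type, the definition of word density as words per 80-character wrapped line, the feature names, and the hand-written thresholds are all assumptions made for illustration. The paper builds its decision tree from data, whereas this sketch hard-codes a tiny stand-in tree.

```python
import textwrap
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TextBlock:
    """Hypothetical container for one text block of an HTML page."""
    text: str              # all visible text in the block
    anchor_text: str = ""  # the part of `text` that sits inside <a> tags


def word_density(block: TextBlock, wrap_width: int = 80) -> float:
    """Words per wrapped line (an assumed definition, not taken from the paper)."""
    words = block.text.split()
    if not words:
        return 0.0
    lines = textwrap.wrap(block.text, wrap_width)
    return len(words) / len(lines)


def link_density(block: TextBlock) -> float:
    """Fraction of the block's words that appear inside anchors."""
    words = block.text.split()
    if not words:
        return 0.0
    return len(block.anchor_text.split()) / len(words)


def context_features(blocks: List[TextBlock], i: int) -> Dict[str, float]:
    """Features of block i plus the same features of its previous and next block."""
    feats: Dict[str, float] = {}
    for name, j in (("prev", i - 1), ("cur", i), ("next", i + 1)):
        inside = 0 <= j < len(blocks)
        feats[name + "_word_density"] = word_density(blocks[j]) if inside else 0.0
        feats[name + "_link_density"] = link_density(blocks[j]) if inside else 0.0
    return feats


def is_main_content(f: Dict[str, float]) -> bool:
    """Hand-written stand-in for a learned decision tree (thresholds are made up)."""
    if f["cur_link_density"] > 0.33:   # mostly links: likely navigation or ads
        return False
    if f["cur_word_density"] >= 10:    # dense running text
        return True
    # Short block: keep it only if a neighbouring block is dense text.
    return max(f["prev_word_density"], f["next_word_density"]) >= 10


sentence = "the team played well and won the match in the last minute"
blocks = [
    TextBlock("Home News Sports Contact", anchor_text="Home News Sports Contact"),
    TextBlock("Local team wins final"),
    TextBlock(" ".join([sentence] * 5)),
    TextBlock("Privacy Terms", anchor_text="Privacy Terms"),
]
labels = [is_main_content(context_features(blocks, i)) for i in range(len(blocks))]
print(labels)
```

In this toy page the navigation bar and footer are rejected on link density alone, while the short headline is kept only because of its context: the block that follows it is dense text. A learned tree would choose such splits and thresholds from labelled blocks instead.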
Files in This Item
There are no files associated with this item.
Appears in Collections
College of Information Technology > School of Computer Science and Engineering > 2. Conference Papers

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Khil, A Ra
College of Information Technology (School of Computer Science and Engineering)