Main content extraction from web documents using text block context

Kim, M.; Kim, Y.; Song, W.; Khil, A.

doi:10.1007/978-3-642-40173-2_10

Cited 0 times in Web of Science

Cited 3 times in Scopus

Full metadata record

DC Field	Value	Language
dc.contributor.author	Kim, M.	-
dc.contributor.author	Kim, Y.	-
dc.contributor.author	Song, W.	-
dc.contributor.author	Khil, A.	-
dc.date.available	2019-04-10T10:23:32Z	-
dc.date.created	2018-04-17	-
dc.date.issued	2013	-
dc.identifier.isbn	9783642401725	-
dc.identifier.issn	0302-9743	-
dc.identifier.uri	http://scholarworks.bwise.kr/ssu/handle/2018.sw.ssu/32881	-
dc.description.abstract	Due to various Web authoring tools, new Web standards, and improved Web accessibility, a wide variety of Web content is being produced very quickly. In such an environment, providing Web services appropriate to users' needs requires quickly and accurately extracting relevant information from Web documents and removing irrelevant content such as advertisements. In this paper, we propose a method that accurately extracts the main content from HTML Web documents. The method builds a decision tree and uses it to classify whether each block of text is part of the main content. For classification we use contextual features around text blocks, including word density, link density, HTML tag distribution, and distances between text blocks. We evaluated our method on a published data set and a data set that we collected. The experimental results show that our method performs 19% better in F-measure than the best-performing existing method. © 2013 Springer-Verlag.	-
dc.relation.isPartOf	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)	-
dc.title	Main content extraction from web documents using text block context	-
dc.type	Conference	-
dc.identifier.doi	10.1007/978-3-642-40173-2_10	-
dc.type.rims	CONF	-
dc.identifier.bibliographicCitation	24th International Conference on Database and Expert Systems Applications, DEXA 2013, v.8056 LNCS, no.PART 2, pp.81 - 93	-
dc.description.journalClass	2	-
dc.identifier.scopusid	2-s2.0-84884405875	-
dc.citation.conferenceDate	2013-08-26	-
dc.citation.conferencePlace	Prague	-
dc.citation.endPage	93	-
dc.citation.number	PART 2	-
dc.citation.startPage	81	-
dc.citation.title	24th International Conference on Database and Expert Systems Applications, DEXA 2013	-
dc.citation.volume	8056 LNCS	-
dc.contributor.affiliatedAuthor	Kim, M.	-
dc.contributor.affiliatedAuthor	Khil, A.	-
dc.type.docType	Conference Paper	-
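
The abstract describes classifying each text block with a decision tree over contextual features such as word density and link density. The sketch below illustrates that general idea only and is not the authors' implementation. The `TextBlock` type, the feature names, the reduced feature set, and every threshold are hypothetical, and the hand-written rule in `is_main_content` merely stands in for a tree that would be learned from labeled data.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TextBlock:
    """Hypothetical container for one block of visible page text."""
    text: str       # all visible text in the block
    link_text: str  # the portion of that text that sits inside <a> tags


def link_density(block: TextBlock) -> float:
    """Fraction of the block's words that are anchor text (0.0 if empty)."""
    total = len(block.text.split())
    if total == 0:
        return 0.0
    return len(block.link_text.split()) / total


def context_features(blocks: List[TextBlock], i: int) -> Dict[str, float]:
    """Features of block i plus its previous and next neighbours.

    Neighbours that fall outside the document contribute zeros.
    """
    features: Dict[str, float] = {}
    for prefix, j in (("prev", i - 1), ("cur", i), ("next", i + 1)):
        if 0 <= j < len(blocks):
            features[prefix + "_words"] = float(len(blocks[j].text.split()))
            features[prefix + "_link_density"] = link_density(blocks[j])
        else:
            features[prefix + "_words"] = 0.0
            features[prefix + "_link_density"] = 0.0
    return features


def is_main_content(f: Dict[str, float]) -> bool:
    """Hand-written stand-in for a learned decision tree.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    if f["cur_link_density"] > 0.5:
        return False  # mostly links: likely navigation or advertising
    if f["cur_words"] >= 10:
        return True   # long, link-poor block: likely body text
    # Short block: accept it only when surrounded by long blocks.
    return f["prev_words"] >= 10 and f["next_words"] >= 10


def classify(blocks: List[TextBlock]) -> List[bool]:
    """Label every block as main content (True) or not (False)."""
    return [is_main_content(context_features(blocks, i))
            for i in range(len(blocks))]
```

In practice the rule in `is_main_content` would be replaced by a tree trained on labeled blocks, and the feature dictionary would be extended with the HTML tag distribution and inter-block distance features that the abstract mentions.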

Files in This Item: There are no files associated with this item.

Appears in Collections: College of Information Technology > School of Computer Science and Engineering > 2. Conference Papers

Related Researcher

Khil, A Ra: College of Information Technology (School of Computer Science and Engineering)

Soongsil University Library, 369 Sangdo-Ro, Dongjak-Gu, Seoul, Korea (06978), 02-820-0733

Certain data included herein are derived from the © Web of Science of Clarivate Analytics. All rights reserved.
You may not copy or re-distribute this material in whole or in part without the prior written consent of Clarivate Analytics.