Classifying web pages using information extraction patterns - Preliminary results and findings
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Soon, L.-K. | - |
dc.contributor.author | Lee, S.H. | - |
dc.date.available | 2019-04-10T10:59:48Z | - |
dc.date.created | 2018-04-17 | - |
dc.date.issued | 2010 | - |
dc.identifier.isbn | 9780769543192 | - |
dc.identifier.uri | http://scholarworks.bwise.kr/ssu/handle/2018.sw.ssu/33335 | - |
dc.description.abstract | Web page classification plays an essential role in facilitating more efficient information retrieval and information processing. Conventionally, web text documents are represented by a term frequency matrix for classification purposes. However, considering the limitations of representing documents using terms or keywords, we propose to represent web pages using the information extraction patterns identified within each page. In this paper, we present the results as well as the findings obtained from our preliminary experiments. Our experimental results indicate that the occurrence of a word in different contexts has a different impact on the classification task. Thus, the extraction patterns used to represent each document are more semantically meaningful and give better insight into web classification than keywords. © 2010 IEEE. | - |
dc.relation.isPartOf | Proceedings of the 6th International Conference on Signal Image Technology and Internet Based Systems, SITIS 2010 | - |
dc.title | Classifying web pages using information extraction patterns - Preliminary results and findings | - |
dc.type | Conference | - |
dc.identifier.doi | 10.1109/SITIS.2010.42 | - |
dc.type.rims | CONF | - |
dc.identifier.bibliographicCitation | 6th International Conference on Signal Image Technology and Internet Based Systems, SITIS 2010, pp. 195-202 | - |
dc.description.journalClass | 2 | - |
dc.identifier.scopusid | 2-s2.0-79952541239 | - |
dc.citation.conferenceDate | 2010-12-15 | - |
dc.citation.conferencePlace | Kuala Lumpur | - |
dc.citation.endPage | 202 | - |
dc.citation.startPage | 195 | - |
dc.citation.title | 6th International Conference on Signal Image Technology and Internet Based Systems, SITIS 2010 | - |
dc.contributor.affiliatedAuthor | Lee, S.H. | - |
dc.type.docType | Conference Paper | - |
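The abstract contrasts a term frequency representation with one built from patterns that keep the context in which a word occurs. The sketch below illustrates that contrast only; it is not the authors' method. Real information extraction patterns are normally derived from syntactic analysis, whereas here a hypothetical stand-in pairs each word with the word immediately before it. The function names `term_frequency` and `context_patterns` and the two sample pages are illustrative assumptions.

```python
from collections import Counter
import re


def tokenize(text: str) -> list[str]:
    # Lowercase the text and keep alphabetic tokens only.
    return re.findall(r"[a-z]+", text.lower())


def term_frequency(text: str) -> Counter:
    # Conventional bag-of-words: each distinct term is one feature,
    # regardless of the context it appears in.
    return Counter(tokenize(text))


def context_patterns(text: str) -> Counter:
    # Hypothetical stand-in for extraction patterns: each token is recorded
    # together with the preceding word ("^" marks the start of the text),
    # so the same word in different contexts yields different features.
    tokens = tokenize(text)
    return Counter(f"{prev} <{word}>" for prev, word in zip(["^"] + tokens, tokens))


# Two illustrative pages that share the word "bank" in different senses.
page_a = "The river bank flooded after the storm."
page_b = "The central bank raised interest rates."

if __name__ == "__main__":
    shared_terms = set(term_frequency(page_a)) & set(term_frequency(page_b))
    print("shared terms:", sorted(shared_terms))
    print("bank patterns in A:", [p for p in context_patterns(page_a) if p.endswith("<bank>")])
    print("bank patterns in B:", [p for p in context_patterns(page_b) if p.endswith("<bank>")])
```

Under term frequency, "bank" is a single feature shared by both pages; under the context-based features it splits into "river <bank>" and "central <bank>", which is the kind of distinction the abstract attributes to pattern-based representation.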
Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.
Certain data included herein are derived from the © Web of Science of Clarivate Analytics. All rights reserved.
You may not copy or re-distribute this material in whole or in part without the prior written consent of Clarivate Analytics.