Implementation of Efficient Distributed Crawler through Stepwise Crawling Node Allocation
DC Field | Value | Language |
---|---|---|
dc.contributor.author | 김현태 | - |
dc.contributor.author | 변준형 | - |
dc.contributor.author | 나요셉 | - |
dc.contributor.author | 정유철 | - |
dc.date.available | 2021-01-05T09:40:23Z | - |
dc.date.created | 2021-01-05 | - |
dc.date.issued | 2020 | - |
dc.identifier.issn | 2234-1072 | - |
dc.identifier.uri | https://scholarworks.bwise.kr/kumoh/handle/2020.sw.kumoh/18544 | - |
dc.description.abstract | Various websites have been created owing to the increased use of the Internet, and the number of documents distributed through these websites has increased proportionally. However, collecting newly updated documents rapidly is not easy. Web crawling methods have been used to continuously collect and manage new documents, but existing crawling systems based on a single node exhibit limited performance, and crawlers applying distribution methods face the problem of managing crawling nodes effectively. This study proposes an efficient distributed crawler based on stepwise crawling node allocation, which identifies each website's properties and establishes crawling policies accordingly to collect a large number of documents from multiple websites. The proposed crawler estimates the number of documents on a website, compares the data collection time and the amount of data collected for different numbers of nodes allocated to the website over repeated visits, and automatically allocates the optimal number of nodes to each website for crawling. In an experiment applying the proposed and single-node methods to 12 different websites, the proposed crawler's data collection time decreased significantly compared with that of a single-node crawler, because the proposed crawler applied data collection policies tailored to each website. In addition, the work rate of the proposed model was confirmed to have increased. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | 한국정보기술학회 | - |
dc.title | Implementation of Efficient Distributed Crawler through Stepwise Crawling Node Allocation | - |
dc.title.alternative | Implementation of Efficient Distributed Crawler through Stepwise Crawling Node Allocation | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | 김현태 | - |
dc.contributor.affiliatedAuthor | 변준형 | - |
dc.contributor.affiliatedAuthor | 나요셉 | - |
dc.contributor.affiliatedAuthor | 정유철 | - |
dc.identifier.bibliographicCitation | 한국정보기술학회 영문논문지, v.10, no.2, pp.15 - 31 | - |
dc.relation.isPartOf | 한국정보기술학회 영문논문지 | - |
dc.citation.title | 한국정보기술학회 영문논문지 | - |
dc.citation.volume | 10 | - |
dc.citation.number | 2 | - |
dc.citation.startPage | 15 | - |
dc.citation.endPage | 31 | - |
dc.type.rims | ART | - |
dc.identifier.kciid | ART002666817 | - |
dc.description.journalClass | 2 | - |
dc.description.isOpenAccess | N | - |
dc.description.journalRegisteredClass | kci | - |
dc.description.journalRegisteredClass | other | - |
dc.subject.keywordAuthor | Web crawling | - |
dc.subject.keywordAuthor | docker swarm | - |
dc.subject.keywordAuthor | virtual nodes | - |
dc.subject.keywordAuthor | documents | - |
dc.subject.keywordAuthor | scrapy | - |
dc.subject.keywordAuthor | efficiency | - |
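The abstract describes a stepwise allocation loop: the crawler repeatedly measures collection time for a website at increasing node counts and stops adding nodes once the improvement becomes marginal. The paper's actual cost model and thresholds are not given in this record, so the following is a minimal, hypothetical Python sketch; the cost function, overhead constant, and `min_gain` threshold are all assumptions for illustration, not the authors' method.

```python
# Hypothetical sketch of stepwise crawling node allocation: keep adding
# nodes for a website while the estimated collection time still improves
# by more than a minimum relative gain.

def collection_time(num_docs: int, nodes: int, per_doc_cost: float = 1.0,
                    overhead_per_node: float = 5.0) -> float:
    """Toy cost model (an assumption, not the paper's): parallel nodes
    divide the per-document work, but each node adds fixed overhead."""
    return num_docs * per_doc_cost / nodes + overhead_per_node * nodes

def allocate_nodes(num_docs: int, max_nodes: int = 16,
                   min_gain: float = 0.05) -> int:
    """Stepwise allocation: increase the node count while the relative
    improvement in estimated collection time exceeds min_gain."""
    best = 1
    t_prev = collection_time(num_docs, 1)
    for n in range(2, max_nodes + 1):
        t = collection_time(num_docs, n)
        if t_prev - t < min_gain * t_prev:
            break  # marginal gain too small; stop adding nodes
        best, t_prev = n, t
    return best

if __name__ == "__main__":
    # Larger sites justify more nodes under this toy model.
    for docs in (100, 1000, 10000):
        print(docs, allocate_nodes(docs))
```

Under this toy model, websites with more documents are allocated more nodes, mirroring the abstract's claim that per-website policies (rather than a fixed single node) reduce overall collection time.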