Detection Method for Distributed Web-Crawlers: A Long-Tail Threshold Model

Ro, Inwoo; Han, Joong Soo; Im, Eul Gyu

doi:10.1155/2018/9065424

Cited 0 times in Web of Science

Cited 0 times in Scopus

Access: Open access

Authors: Ro, Inwoo; Han, Joong Soo; Im, Eul Gyu

Issue Date: Oct-2018

Publisher: WILEY-HINDAWI

Citation: SECURITY AND COMMUNICATION NETWORKS, v.2018

Indexed: SCIE, SCOPUS

Journal Title: SECURITY AND COMMUNICATION NETWORKS

Volume: 2018

URI: https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/149220

DOI: 10.1155/2018/9065424

ISSN: 1939-0114

Abstract: This paper proposes an advanced countermeasure against distributed web-crawlers. We investigated existing crawler-detection methods and analyzed how distributed crawlers can bypass them. Our method detects distributed crawlers by exploiting the property that web traffic follows a power-law distribution. When web pages are sorted by the number of requests, most requests are concentrated on the most frequently requested pages. In addition, some web pages are generally not requested by normal users. Crawlers, however, do request these pages, because their algorithms iteratively parse web pages and issue requests so as to collect every item they encounter. Therefore, if certain IP addresses frequently request pages located in the long-tail region of the power-law distribution graph, those IP addresses can be classified as crawler nodes. Experimental results on NASA web traffic data showed that our method identified distributed crawlers with 0.0275% false positives, whereas a conventional frequency-based detection method showed 2.882% false positives at the same access threshold.
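The detection idea in the abstract (rank pages by request count, identify the long tail, then flag IP addresses that repeatedly request long-tail pages) can be illustrated with a short sketch. The abstract does not say how the long-tail boundary or the per-IP threshold is chosen, so the code below is not the authors' implementation. It assumes a hypothetical cut-off in which the "head" is the smallest set of top-ranked pages covering a fraction `head_share` (80% here) of all requests, and it flags an IP once it makes at least `tail_threshold` requests to tail pages. The function names, parameters, and toy log are all illustrative, and `flag_by_frequency` is only a simplified stand-in for a conventional frequency-based detector.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

# One log entry: (client IP address, requested page path).
Request = Tuple[str, str]


def long_tail_pages(requests: Iterable[Request], head_share: float = 0.8) -> Set[str]:
    """Return the pages that fall in the long tail of the request distribution.

    ASSUMPTION (hypothetical cut-off): pages are ranked by request count, the
    "head" is the shortest prefix of that ranking whose cumulative count
    reaches `head_share` of all requests, and every remaining page is "tail".
    """
    page_counts = Counter(page for _, page in requests)
    total = sum(page_counts.values())
    # Most-requested first; ties broken by path so the result is deterministic.
    ranked = sorted(page_counts.items(), key=lambda item: (-item[1], item[0]))
    tail: Set[str] = set()
    cumulative = 0
    for page, count in ranked:
        if cumulative >= head_share * total:
            tail.add(page)  # the head already covers head_share of the traffic
        cumulative += count
    return tail


def flag_crawler_ips(requests: Iterable[Request], head_share: float = 0.8,
                     tail_threshold: int = 3) -> Set[str]:
    """Flag IPs with at least `tail_threshold` requests to long-tail pages."""
    requests = list(requests)  # iterated twice below
    tail = long_tail_pages(requests, head_share)
    tail_hits = Counter(ip for ip, page in requests if page in tail)
    return {ip for ip, hits in tail_hits.items() if hits >= tail_threshold}


def flag_by_frequency(requests: Iterable[Request], threshold: int) -> Set[str]:
    """Simplified baseline: flag IPs whose total request count reaches threshold."""
    per_ip = Counter(ip for ip, _ in requests)
    return {ip for ip, n in per_ip.items() if n >= threshold}


# Hypothetical toy log: three ordinary visitors plus one crawler split over two IPs.
log: List[Request] = (
    [("10.0.0.1", "/index.html")] * 20
    + [("10.0.0.2", "/index.html")] * 10
    + [("10.0.0.2", "/news.html")] * 6
    + [("10.0.0.3", "/news.html")] * 4
    + [("192.0.2.7", f"/archive/{i}.html") for i in range(3)]     # crawler node A
    + [("192.0.2.8", f"/archive/{i}.html") for i in range(3, 6)]  # crawler node B
)

print(sorted(flag_crawler_ips(log, tail_threshold=3)))
# ['192.0.2.7', '192.0.2.8']
print(sorted(flag_by_frequency(log, threshold=3)))
# ['10.0.0.1', '10.0.0.2', '10.0.0.3', '192.0.2.7', '192.0.2.8']
```

With the same access threshold of 3 requests, the frequency baseline flags all five addresses, including the three ordinary visitors, while the long-tail rule flags only the two crawler nodes, because their requests land on pages nobody else visits. Raising the frequency threshold enough to spare the heavy visitors would also let each low-volume crawler node slip through, which is the weakness of frequency-based detection that distributed crawlers exploit. This toy log only illustrates the comparison qualitatively; the false-positive rates quoted above come from the NASA trace.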

Appears in Collections: Seoul College of Engineering > Seoul School of Computer Software > 1. Journal Articles

Related Researcher: Im, Eul Gyu (College of Engineering, School of Computer Science)

Certain data included herein are derived from the © Web of Science of Clarivate Analytics. All rights reserved.
You may not copy or re-distribute this material in whole or in part without the prior written consent of Clarivate Analytics.