Analysis and Comparison of Duplicate-Document Detection Algorithms for Web Search Engines

안수한; 채용석; 박희진

Title: 웹 검색 엔진을 위한 중복문서 검색 알고리즘 분석 및 비교 (Analysis and Comparison of Duplicate-Document Detection Algorithms for Web Search Engines)

Other Titles: Analysis of Algorithms for Detecting the Blog-duplicate documents for the Web search engines.

Authors: 안수한; 채용석; 박희진

Issue Date: Nov-2011

Publisher: Korean Institute of Information Scientists and Engineers (한국정보과학회)

Keywords: web search engine; duplicate documents; search algorithm

Citation: Proceedings of the KIISE Fall Conference (A) (한국정보과학회 가을 학술발표논문집(A)), v.38, no.2, pp.341-344

Indexed: OTHER

Journal Title: Proceedings of the KIISE Fall Conference (A) (한국정보과학회 가을 학술발표논문집(A))

Volume: 38

Number: 2

Start Page: 341

End Page: 344

URI: https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/167111

ISSN: 2466-0825

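Abstract: In the blog environment, most duplicate documents are partial duplicates. Our earlier experiments showed that roughly 99% of these partial duplicates differ slightly at the beginning and end, while only 1% differ in the middle of the document. One efficient algorithm for finding such documents is the Central-match algorithm. A problem similar to identifying duplicate documents is the origin detection problem: given a document, determine which document it was mainly extracted from. An efficient algorithm for this problem is the Hailstorm&BE algorithm. We expected that Hailstorm&BE, which is used for the origin detection problem, could also effectively solve the blog duplicate identification problem, that is, finding duplicate documents in the blog environment. We therefore modified Hailstorm&BE so that it can identify duplicate documents, and compared it experimentally with Central-match to determine which of the two is better suited to finding blog duplicate documents. The results show that Hailstorm&BE can be an effective algorithm not only for the origin detection problem but also for the blog duplicate identification problem.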

Appears in Collections: College of Engineering (Seoul) > School of Computer Software (Seoul) > 1. Journal Articles

Related Researcher

Park, Hee jin: COLLEGE OF ENGINEERING (SCHOOL OF COMPUTER SCIENCE)
