C-Affinity: A Novel Similarity Measure for Effective Data Clustering
- Authors
- Hong, Jiwon; Kim, Sang-Wook
- Issue Date
- Apr-2023
- Publisher
- Association for Computing Machinery, Inc
- Keywords
- clustering; clustering affinity; nearest neighbor graph; similarity measure
- Citation
- ACM Web Conference 2023 - Companion of the World Wide Web Conference, WWW 2023, pp 41 - 44
- Pages
- 4
- Indexed
- SCOPUS
- Journal Title
- ACM Web Conference 2023 - Companion of the World Wide Web Conference, WWW 2023
- Start Page
- 41
- End Page
- 44
- URI
- https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/185837
- DOI
- 10.1145/3543873.3587307
- Abstract
- Clustering is one of the most useful data mining techniques and is widely employed across applications. In clustering, the similarity measure, which defines how similar a pair of data objects is, plays an important role. A similarity measure is chosen by considering the characteristics of the target dataset. However, current similarity measures (or distances) do not reflect the distribution of data objects in a dataset at all, which may limit clustering accuracy. In this paper, we propose c-affinity, a new notion of a similarity measure that reflects the distribution of objects in the given dataset from a clustering point of view. By learning the data distribution, c-affinity is designed so that two objects receive a higher value the more likely they are to belong to the same cluster. We use random walk with restart (RWR) on the k-nearest neighbor graph of the given dataset to measure (1) how similar a pair of objects are and (2) how densely other objects are distributed between them. Via extensive experiments on sixteen synthetic and real-world datasets, we verify that replacing the existing similarity measure with c-affinity improves clustering accuracy significantly.
- Appears in Collections
- College of Engineering (Seoul) > School of Computer Software (Seoul) > 1. Journal Articles
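
The abstract names random walk with restart (RWR) on a k-nearest neighbor graph as the basis of c-affinity but does not give the exact definition. The following is a minimal sketch of that general technique only, not the authors' formulation: the function names, the restart probability of 0.15, the graph symmetrization, and the final averaging step are all assumptions made for illustration.

```python
import numpy as np

def knn_transition_matrix(X, k):
    """Row-stochastic transition matrix of a symmetrized k-NN graph (illustrative)."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances between all objects.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # an object is not its own neighbor
    nbrs = np.argsort(d2, axis=1)[:, :k]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    A = np.maximum(A, A.T)  # assumption: treat the graph as undirected
    return A / A.sum(axis=1, keepdims=True)

def rwr_scores(P, restart=0.15):
    """Row i holds the RWR stationary distribution for a walk restarting at i.

    Solves R = restart * I + (1 - restart) * R @ P in closed form.
    """
    n = P.shape[0]
    return restart * np.linalg.inv(np.eye(n) - (1.0 - restart) * P)

def rwr_affinity(X, k=2, restart=0.15):
    """Hypothetical symmetric affinity: the average of the two directed RWR scores."""
    R = rwr_scores(knn_transition_matrix(X, k), restart)
    return (R + R.T) / 2.0

# Two well-separated groups of points on a line.
X = np.array([[0.0], [1.0], [2.0], [3.0], [100.0], [101.0], [102.0], [103.0]])
S = rwr_affinity(X, k=2)
print(S[0, 1], S[0, 4])  # within-group score vs. cross-group score
```

Because a random walk reaches an object more often when many short paths lead to it, a score of this kind tends to be higher for pairs connected through densely populated regions, which is the intuition the abstract describes. The matrix inverse costs cubic time in the number of objects, so this sketch is only suitable for small datasets.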