A semantic similarity measure in document databases: An Earth mover's distance-based approach
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Jang, Min-Hee | - |
| dc.contributor.author | Eom, Tae-Hwan | - |
| dc.contributor.author | Kim, Sang-Wook | - |
| dc.contributor.author | Hwang, Young-Sup | - |
| dc.date.accessioned | 2022-07-16T07:54:56Z | - |
| dc.date.available | 2022-07-16T07:54:56Z | - |
| dc.date.created | 2021-05-13 | - |
| dc.date.issued | 2013-10 | - |
| dc.identifier.issn | 0000-0000 | - |
| dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/161776 | - |
| dc.description.abstract | Measuring document similarity is important in order to find documents which are similar to a given query document from a user. Text-based document similarity is measured by comparing the words in two documents. The representative text-based document similarity is the cosine similarity. Since the cosine similarity computes document similarity by estimating the frequency of common words, it cannot reflect word similarity. To solve this problem, we propose a new document similarity measure based on the earth mover's distance (EMD). The EMD is one of the most popular distance functions used to search similar multimedia contents and is known to provide good search results. To apply the EMD to compute document similarity, we have to solve two problems: (1) The EMD is too time consuming to be used in a document database, (2) the distance between words should be defined. Our proposed approach first extracts topics as new features of a document by applying the latent Dirichlet allocation, which is a generative model of a document. It can decrease the computational cost of the EMD because the number of topics is much smaller than the number of words in a document. After extracting the topics, the proposed approach calculates the distance between topics based on the relation between the topics and the words in a document database, thereby making computing document similarity based on the EMD possible. Our approach searches documents more accurately since we can consider the semantic similarity by using the EMD. Experimental results on a real-world document database indicate that the proposed approach outperforms the cosine similarity in terms of the accuracy and the performance. © 2013 ACM. | - |
| dc.language | English | - |
| dc.language.iso | en | - |
| dc.publisher | Association for Computing Machinery, Inc. | - |
| dc.title | A semantic similarity measure in document databases: An Earth mover's distance-based approach | - |
| dc.type | Article | - |
| dc.contributor.affiliatedAuthor | Kim, Sang-Wook | - |
| dc.identifier.doi | 10.1145/2513228.2513245 | - |
| dc.identifier.scopusid | 2-s2.0-84891444867 | - |
| dc.identifier.bibliographicCitation | Proceedings of the 2013 Research in Adaptive and Convergent Systems, RACS 2013, pp.94 - 99 | - |
| dc.relation.isPartOf | Proceedings of the 2013 Research in Adaptive and Convergent Systems, RACS 2013 | - |
| dc.citation.title | Proceedings of the 2013 Research in Adaptive and Convergent Systems, RACS 2013 | - |
| dc.citation.startPage | 94 | - |
| dc.citation.endPage | 99 | - |
| dc.type.rims | ART | - |
| dc.type.docType | Conference Paper | - |
| dc.description.journalClass | 1 | - |
| dc.description.isOpenAccess | N | - |
| dc.description.journalRegisteredClass | scopus | - |
| dc.subject.keywordPlus | Computational costs | - |
| dc.subject.keywordPlus | Document search | - |
| dc.subject.keywordPlus | Earth Mover's distance | - |
| dc.subject.keywordPlus | Latent Dirichlet allocation | - |
| dc.subject.keywordPlus | Multimedia contents | - |
| dc.subject.keywordPlus | Semantic similarity | - |
| dc.subject.keywordPlus | Semantic similarity measures | - |
| dc.subject.keywordPlus | Text-based documents | - |
| dc.subject.keywordPlus | Query processing | - |
| dc.subject.keywordPlus | Statistics | - |
| dc.subject.keywordPlus | Database systems | - |
| dc.subject.keywordAuthor | document search | - |
| dc.subject.keywordAuthor | Earth Mover's distance | - |
| dc.subject.keywordAuthor | latent dirichlet allocation | - |
| dc.identifier.url | https://dl.acm.org/doi/10.1145/2513228.2513245 | - |
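
The pipeline described in the abstract (topics as document features, a ground distance between topics, then the EMD between two documents' topic mixtures) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say how the topic-to-topic distance is derived, so cosine distance between topic-word vectors is used here purely as an assumption. The topic-word weights and document mixtures are hypothetical toy values standing in for latent Dirichlet allocation output, and the function names `cosine_distance` and `emd` are illustrative.

```python
import math


def cosine_distance(u, v):
    """Assumed ground distance: 1 - cosine similarity of two topic-word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return max(0.0, 1.0 - dot / norm)


def emd(p, q, dist, eps=1e-12):
    """EMD between two topic mixtures p and q (each summing to 1).

    Solves the transportation problem by successive shortest paths:
    repeatedly ship mass along the cheapest path in the residual graph.
    """
    n, m = len(p), len(q)
    supply, demand = list(p), list(q)
    flow = [[0.0] * m for _ in range(n)]
    while max(supply) > eps and max(demand) > eps:
        # Bellman-Ford on the residual graph. Nodes 0..n-1 are topics of p,
        # nodes n..n+m-1 are topics of q.
        cost = [0.0 if s > eps else math.inf for s in supply] + [math.inf] * m
        prev = [None] * (n + m)
        for _ in range(n + m - 1):
            changed = False
            for i in range(n):
                for j in range(m):
                    # Forward edge i -> j: ship more mass.
                    if cost[i] + dist[i][j] < cost[n + j] - eps:
                        cost[n + j] = cost[i] + dist[i][j]
                        prev[n + j] = i
                        changed = True
                    # Backward edge j -> i: undo mass shipped earlier.
                    if flow[i][j] > eps and cost[n + j] - dist[i][j] < cost[i] - eps:
                        cost[i] = cost[n + j] - dist[i][j]
                        prev[i] = n + j
                        changed = True
            if not changed:
                break
        # Cheapest reachable topic of q that still needs mass.
        end = min((n + j for j in range(m) if demand[j] > eps), key=lambda v: cost[v])
        path = [end]
        while prev[path[-1]] is not None:
            path.append(prev[path[-1]])
        path.reverse()
        start = path[0]
        edges = list(zip(path, path[1:]))
        # Bottleneck: remaining supply, remaining demand, and any flow being undone.
        delta = min(supply[start], demand[end - n])
        for a, b in edges:
            if a >= n:
                delta = min(delta, flow[b][a - n])
        for a, b in edges:
            if a < n:
                flow[a][b - n] += delta
            else:
                flow[b][a - n] -= delta
        supply[start] -= delta
        demand[end - n] -= delta
    return sum(flow[i][j] * dist[i][j] for i in range(n) for j in range(m))


# Hypothetical toy data: 3 topics over a 3-word vocabulary.
# Topics 0 and 1 use similar words; topic 2 is unrelated.
topic_word = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.0, 0.1, 0.9]]
dist = [[cosine_distance(a, b) for b in topic_word] for a in topic_word]

doc_a = [0.8, 0.1, 0.1]  # mostly topic 0
doc_b = [0.1, 0.8, 0.1]  # mostly topic 1 (semantically close to topic 0)
doc_c = [0.1, 0.1, 0.8]  # mostly topic 2

print(emd(doc_a, doc_b, dist), emd(doc_a, doc_c, dist))
```

In this toy setup `doc_a` and `doc_b` share little mass on any single topic, yet their EMD is small because their dominant topics are close under the ground distance. That is the kind of semantic closeness the abstract says a measure based only on common-word frequency cannot reflect. Working over a handful of topics rather than the full vocabulary also keeps the transportation problem small, in line with the cost argument in the abstract.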
