Dirichlet Process Mixture Model for Document Clustering with Feature Partition
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Huang, Ruizhang | - |
dc.contributor.author | Yu, Guan | - |
dc.contributor.author | Wang, Zhaojun | - |
dc.contributor.author | Zhang, Jun | - |
dc.contributor.author | Shi, Liangxing | - |
dc.date.accessioned | 2023-12-08T09:32:15Z | - |
dc.date.available | 2023-12-08T09:32:15Z | - |
dc.date.issued | 2013-08 | - |
dc.identifier.issn | 1041-4347 | - |
dc.identifier.issn | 1558-2191 | - |
dc.identifier.uri | https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/115853 | - |
dc.description.abstract | Finding the appropriate number of clusters to which documents should be partitioned is crucial in document clustering. In this paper, we propose a novel approach, namely DPMFP, to discover the latent cluster structure based on the DPM model without requiring the number of clusters as input. Document features are automatically partitioned into two groups, in particular, discriminative words and nondiscriminative words, and contribute differently to document clustering. A variational inference algorithm is investigated to infer the document collection structure as well as the partition of document words at the same time. Our experiments indicate that our proposed approach performs well on the synthetic data set as well as real data sets. The comparison between our approach and state-of-the-art document clustering approaches shows that our approach is robust and effective for document clustering. | - |
dc.format.extent | 12 | - |
dc.language | English | - |
dc.language.iso | ENG | - |
dc.publisher | Institute of Electrical and Electronics Engineers | - |
dc.title | Dirichlet Process Mixture Model for Document Clustering with Feature Partition | - |
dc.type | Article | - |
dc.publisher.location | United States | - |
dc.identifier.doi | 10.1109/TKDE.2012.27 | - |
dc.identifier.scopusid | 2-s2.0-84897584095 | - |
dc.identifier.wosid | 000321261000006 | - |
dc.identifier.bibliographicCitation | IEEE Transactions on Knowledge and Data Engineering, v.25, no.8, pp 1748 - 1759 | - |
dc.citation.title | IEEE Transactions on Knowledge and Data Engineering | - |
dc.citation.volume | 25 | - |
dc.citation.number | 8 | - |
dc.citation.startPage | 1748 | - |
dc.citation.endPage | 1759 | - |
dc.type.docType | Article | - |
dc.description.isOpenAccess | N | - |
dc.description.journalRegisteredClass | sci | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |
dc.relation.journalResearchArea | Computer Science | - |
dc.relation.journalResearchArea | Engineering | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Information Systems | - |
dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | - |
dc.subject.keywordPlus | CLASSIFICATION | - |
dc.subject.keywordPlus | INFERENCE | - |
dc.subject.keywordPlus | SELECTION | - |
dc.subject.keywordAuthor | Database management | - |
dc.subject.keywordAuthor | database applications-text mining | - |
dc.subject.keywordAuthor | pattern recognition | - |
dc.subject.keywordAuthor | clustering document clustering | - |
dc.subject.keywordAuthor | Dirichlet process mixture model | - |
dc.subject.keywordAuthor | feature partition | - |
dc.identifier.url | https://ieeexplore.ieee.org/document/6152106 | - |