Incremental Semi-Supervised Clustering Ensemble for High Dimensional Data Clustering
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Yu, Zhiwen | - |
dc.contributor.author | Luo, Peinan | - |
dc.contributor.author | You, Jane | - |
dc.contributor.author | Wong, Hau-San | - |
dc.contributor.author | Leung, Hareton | - |
dc.contributor.author | Wu, Si | - |
dc.contributor.author | Zhang, Jun | - |
dc.contributor.author | Han, Guoqiang | - |
dc.date.accessioned | 2024-04-09T03:01:46Z | - |
dc.date.available | 2024-04-09T03:01:46Z | - |
dc.date.issued | 2016-03 | - |
dc.identifier.issn | 1041-4347 | - |
dc.identifier.issn | 1558-2191 | - |
dc.identifier.uri | https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/118558 | - |
dc.description.abstract | Traditional cluster ensemble approaches have three limitations: (1) they do not make use of prior knowledge of the datasets given by experts; (2) most conventional cluster ensemble methods cannot obtain satisfactory results when handling high dimensional data; (3) all the ensemble members are considered, even those without positive contributions. To address these limitations, we first propose an incremental semi-supervised clustering ensemble framework (ISSCE), which takes advantage of the random subspace technique, the constraint propagation approach, the proposed incremental ensemble member selection process, and the normalized cut algorithm to perform high dimensional data clustering. The random subspace technique is effective for handling high dimensional data, while the constraint propagation approach is useful for incorporating prior knowledge. The incremental ensemble member selection process is newly designed to judiciously remove redundant ensemble members based on a newly proposed local cost function and a global cost function, and the normalized cut algorithm is adopted as the consensus function to provide more stable, robust, and accurate results. Then, a measure is proposed to quantify the similarity between two sets of attributes and is used for computing the local cost function in ISSCE. Next, we analyze the time complexity of ISSCE theoretically. Finally, a set of nonparametric tests is adopted to compare multiple semi-supervised clustering ensemble approaches over different datasets. Experiments on 18 real-world datasets, including six UCI datasets and 12 cancer gene expression profiles, confirm that ISSCE works well on datasets with very high dimensionality and outperforms state-of-the-art semi-supervised clustering ensemble approaches. | - |
dc.format.extent | 14 | - |
dc.language | English | - |
dc.language.iso | ENG | - |
dc.publisher | Institute of Electrical and Electronics Engineers | - |
dc.title | Incremental Semi-Supervised Clustering Ensemble for High Dimensional Data Clustering | - |
dc.type | Article | - |
dc.publisher.location | United States | - |
dc.identifier.doi | 10.1109/TKDE.2015.2499200 | - |
dc.identifier.scopusid | 2-s2.0-84962429330 | - |
dc.identifier.wosid | 000370755300008 | - |
dc.identifier.bibliographicCitation | IEEE Transactions on Knowledge and Data Engineering, v.28, no.3, pp 701 - 714 | - |
dc.citation.title | IEEE Transactions on Knowledge and Data Engineering | - |
dc.citation.volume | 28 | - |
dc.citation.number | 3 | - |
dc.citation.startPage | 701 | - |
dc.citation.endPage | 714 | - |
dc.type.docType | Article | - |
dc.description.isOpenAccess | N | - |
dc.description.journalRegisteredClass | sci | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |
dc.relation.journalResearchArea | Computer Science | - |
dc.relation.journalResearchArea | Engineering | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Information Systems | - |
dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | - |
dc.subject.keywordPlus | CLASS DISCOVERY | - |
dc.subject.keywordPlus | CONSENSUS | - |
dc.subject.keywordPlus | FRAMEWORK | - |
dc.subject.keywordAuthor | Cluster ensemble | - |
dc.subject.keywordAuthor | semi-supervised clustering | - |
dc.subject.keywordAuthor | random subspace | - |
dc.subject.keywordAuthor | cancer gene expression profile | - |
dc.subject.keywordAuthor | clustering analysis | - |
dc.identifier.url | https://ieeexplore.ieee.org/document/7323847 | - |