Detailed Information

PLPD: reliable protein localization prediction from imbalanced and overlapped datasets (open access)

Authors
Lee, KiYoung; Kim, Dae-Won; Na, DoKyun; Lee, Kwang H.; Lee, Doheon
Issue Date
Oct-2006
Publisher
OXFORD UNIV PRESS
Citation
NUCLEIC ACIDS RESEARCH, v.34, no.17, pp 4655 - 4666
Pages
12
Journal Title
NUCLEIC ACIDS RESEARCH
Volume
34
Number
17
Start Page
4655
End Page
4666
URI
https://scholarworks.bwise.kr/cau/handle/2019.sw.cau/24266
DOI
10.1093/nar/gkl638
ISSN
0305-1048
1362-4962
Abstract
Subcellular localization is one of the key functional characteristics of proteins. An automatic and efficient method for predicting protein subcellular localization is much needed for large-scale genome analysis. From a machine learning point of view, a protein localization dataset has several difficult characteristics: it has many classes (there are more than 10 localizations in a cell), it is multi-label (a protein may occur in several different subcellular locations), and it is highly imbalanced (the number of proteins differs markedly between localizations). Although many methods have been proposed for predicting protein subcellular localization, none of them effectively tackles all of these characteristics at the same time. Thus, a new computational method for protein localization is needed for more reliable outcomes. To address this issue, we present a protein localization predictor based on D-SVDD (PLPD), which can find the likelihood of a specific localization of a protein more easily and more correctly. Moreover, we introduce three measurements for more precise evaluation of a protein localization predictor. Results on various datasets derived from the experiments of Huh et al. (2003) indicate that the proposed PLPD method represents a different approach that might play a complementary role to existing methods, such as the Nearest Neighbor method and the discriminate covariant method. Finally, after finding a good boundary for each localization using the 5184 classified proteins as training data, we predicted the localizations of 138 proteins whose subcellular localizations could not be clearly observed in the experiments of Huh et al. (2003).
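
The three dataset characteristics named in the abstract (many classes, multi-label proteins, class imbalance) can be quantified with a few lines of code. The following is a minimal sketch only, not part of PLPD: the protein IDs, the localization labels, and the `summarize` helper are hypothetical and chosen purely for illustration.

```python
from collections import Counter

# Hypothetical toy data: protein ID -> set of observed localizations.
# These IDs and labels are invented for illustration, not taken from Huh et al.
annotations = {
    "P1": {"cytoplasm"},
    "P2": {"cytoplasm", "nucleus"},
    "P3": {"nucleus"},
    "P4": {"cytoplasm"},
    "P5": {"mitochondrion", "cytoplasm"},
    "P6": {"peroxisome"},
}

def summarize(annotations):
    """Return class count, multi-label fraction, and imbalance ratio."""
    # Count how many proteins carry each localization label.
    per_class = Counter(loc for locs in annotations.values() for loc in locs)
    # A protein is multi-label if it is observed in more than one location.
    multi = sum(1 for locs in annotations.values() if len(locs) > 1)
    return {
        "n_classes": len(per_class),
        "multi_label_fraction": multi / len(annotations),
        # Ratio of the largest class size to the smallest class size.
        "imbalance_ratio": max(per_class.values()) / min(per_class.values()),
        "per_class": dict(per_class),
    }

summary = summarize(annotations)
print(summary)
```

An imbalance ratio well above 1 together with a non-zero multi-label fraction is the situation the abstract describes, where a plain single-label classifier tends to favor the largest localizations.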
Appears in Collections
College of ICT Engineering > School of Integrative Engineering > 1. Journal Articles
College of Software > School of Computer Science and Engineering > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Na, Dokyun
College of ICT Engineering (School of Integrative Engineering)