Investigating the Performance of Machine Learning Methods in Predicting Functional Properties of the Hydrogenase Variants
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Choi, G. | - |
dc.contributor.author | Kim, W. | - |
dc.contributor.author | Koo, J. | - |
dc.date.accessioned | 2023-02-17T05:40:06Z | - |
dc.date.available | 2023-02-17T05:40:06Z | - |
dc.date.issued | 2023-02-01 | - |
dc.identifier.issn | 1226-8372 | - |
dc.identifier.issn | 1976-3816 | - |
dc.identifier.uri | https://scholarworks.bwise.kr/hongik/handle/2020.sw.hongik/30845 | - |
dc.description.abstract | Improving a functional property of an enzyme via mutagenesis is still a challenging problem because of the vast search space and the difficulty of predicting the effects of mutation(s). Machine learning has proven to be proficient in solving similar problems with unprecedented speed owing to the latest advances in computing power and analytical algorithms. In this study, we investigate the performance of machine learning methods in predicting the H2 production activity and O2 tolerance of hydrogenase variants. Experimentally measured activities and tolerances of 377 variants having single or double amino acid replacements are used to train and test seven types of machine learning models. A binary representation of the amino acid sequence, as well as a series of vectors quantifying physicochemical properties of amino acids (VHSE), are employed as features representing each variant. The results show that the VHSE features yield higher performance, especially with respect to the correlation coefficient and coefficient of determination, in addition to the root mean square error. Next, model performance is analyzed with respect to changes in data size and heterogeneity to provide insights on designing an effective mutagenesis library for applying machine learning. The best performance was obtained when a support vector machine or ridge regression was trained on a large, homogeneous data set. In this manner, our study reveals the factors affecting the performance of machine learning in identifying enzyme variants with enhanced function. © 2023, The Korean Society for Biotechnology and Bioengineering and Springer. | - |
dc.format.extent | 9 | - |
dc.language | English | - |
dc.language.iso | ENG | - |
dc.publisher | Korean Society for Biotechnology and Bioengineering | - |
dc.title | Investigating the Performance of Machine Learning Methods in Predicting Functional Properties of the Hydrogenase Variants | - |
dc.title.alternative | Investigating the Performance of Machine Learning Methods in Predicting Functional Properties of the Hydrogenase Variants | - |
dc.type | Article | - |
dc.publisher.location | Republic of Korea | - |
dc.identifier.doi | 10.1007/s12257-022-0330-3 | - |
dc.identifier.scopusid | 2-s2.0-85147768440 | - |
dc.identifier.wosid | 000929504600003 | - |
dc.identifier.bibliographicCitation | Biotechnology and Bioprocess Engineering, v.28, no.1, pp 143 - 151 | - |
dc.citation.title | Biotechnology and Bioprocess Engineering | - |
dc.citation.volume | 28 | - |
dc.citation.number | 1 | - |
dc.citation.startPage | 143 | - |
dc.citation.endPage | 151 | - |
dc.type.docType | Article | - |
dc.identifier.kciid | ART002940924 | - |
dc.description.isOpenAccess | N | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |
dc.description.journalRegisteredClass | kci | - |
dc.relation.journalResearchArea | Biotechnology & Applied Microbiology | - |
dc.relation.journalWebOfScienceCategory | Biotechnology & Applied Microbiology | - |
dc.subject.keywordPlus | PROTEIN STABILITY | - |
dc.subject.keywordPlus | CLASSIFICATION | - |
dc.subject.keywordPlus | SCHIZOPHRENIA | - |
dc.subject.keywordPlus | BIOMARKERS | - |
dc.subject.keywordAuthor | hydrogenase | - |
dc.subject.keywordAuthor | machine learning | - |
dc.subject.keywordAuthor | metalloprotein | - |
dc.subject.keywordAuthor | O2 sensitivity | - |
dc.subject.keywordAuthor | protein engineering | - |
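
The binary sequence representation and the three evaluation metrics named in the abstract (root mean square error, correlation coefficient, coefficient of determination) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the toy sequence `MKV`, the mutation string format (`K2A`, 1-based position), and every function name are hypothetical, and the VHSE descriptor values are deliberately omitted rather than guessed.

```python
import math

# The 20 standard amino acids, in a fixed (assumed) order for encoding.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def apply_mutations(wild_type, mutations):
    """Apply substitutions such as 'K2A' (1-based position) to a sequence."""
    seq = list(wild_type)
    for m in mutations:
        old, pos, new = m[0], int(m[1:-1]), m[-1]
        if seq[pos - 1] != old:
            raise ValueError(f"{m}: expected {old} at position {pos}")
        seq[pos - 1] = new
    return "".join(seq)


def one_hot_sequence(seq):
    """Binary encoding: 20 indicator bits per residue, flattened."""
    features = []
    for residue in seq:
        bits = [0] * len(AMINO_ACIDS)
        bits[AMINO_ACIDS.index(residue)] = 1
        features.extend(bits)
    return features


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient."""
    n = len(y_true)
    mean_t, mean_p = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy usage: encode a hypothetical single-substitution variant.
variant = apply_mutations("MKV", ["K2A"])
features = one_hot_sequence(variant)
```

A feature vector built this way has 20 entries per sequence position, so a regression model such as ridge regression would see one row per variant; swapping the indicator bits for per-residue physicochemical descriptors would give the VHSE-style alternative the abstract compares against.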