
Investigating the Performance of Machine Learning Methods in Predicting Functional Properties of the Hydrogenase Variants

Authors
Choi, G.; Kim, W.; Koo, J.
Issue Date
1-Feb-2023
Publisher
Korean Society for Biotechnology and Bioengineering
Keywords
hydrogenase; machine learning; metalloprotein; O2 sensitivity; protein engineering
Citation
Biotechnology and Bioprocess Engineering, v.28, no.1, pp. 143-151
Pages
9
Journal Title
Biotechnology and Bioprocess Engineering
Volume
28
Number
1
Start Page
143
End Page
151
URI
https://scholarworks.bwise.kr/hongik/handle/2020.sw.hongik/30845
DOI
10.1007/s12257-022-0330-3
ISSN
1226-8372
1976-3816
Abstract
Improving a functional property of an enzyme via mutagenesis remains challenging because of the vast search space and the difficulty of predicting the effects of mutations. Machine learning has proven proficient at solving similar problems with unprecedented speed, owing to recent advances in computing power and analytical algorithms. In this study, we investigate the performance of machine learning methods in predicting the H2 production activity and O2 tolerance of hydrogenase variants. Experimentally measured activities and tolerances of 377 variants having single or double amino acid replacements are used to train and test seven types of machine learning models. Two feature sets represent each variant: a binary representation of the amino acid sequence, and VHSE, a series of vectors quantifying the physicochemical properties of amino acids. The results show that VHSE enables higher performance, especially with respect to the correlation coefficient and coefficient of determination, in addition to the root mean square error. Next, model performance is analyzed with respect to changes in data size and heterogeneity to provide insights into designing an effective mutagenesis library for applying machine learning. The best performance was obtained when a support vector machine or ridge regression was trained on a large, homogeneous data set. In this manner, our study reveals the factors affecting the performance of machine learning in identifying enzyme variants with enhanced function. © 2023, The Korean Society for Biotechnology and Bioengineering and Springer.
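The abstract names a binary sequence representation, ridge regression, and three evaluation metrics (root mean square error, correlation coefficient, coefficient of determination). As an illustration only, the following is a minimal sketch of how those pieces fit together. It is not the authors' code: the toy parent sequence, the synthetic activity values, the regularization strength `alpha`, and every function name are assumptions made for this example, and the real VHSE descriptors and the 377 measured variants are not reproduced here.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence):
    """Binary representation: 20 indicator values per position, flattened."""
    encoded = np.zeros((len(sequence), len(AMINO_ACIDS)))
    for pos, aa in enumerate(sequence):
        encoded[pos, AA_INDEX[aa]] = 1.0
    return encoded.ravel()


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    weights = np.linalg.solve(gram, Xc.T @ yc)
    intercept = y_mean - x_mean @ weights
    return weights, intercept


def evaluate(y_true, y_pred):
    """RMSE, Pearson correlation coefficient, and coefficient of determination."""
    residual = y_true - y_pred
    rmse = float(np.sqrt(np.mean(residual ** 2)))
    pearson_r = float(np.corrcoef(y_true, y_pred)[0, 1])
    ss_res = float(np.sum(residual ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {"rmse": rmse, "pearson_r": pearson_r, "r2": 1.0 - ss_res / ss_tot}


def random_variant(parent, rng):
    """Replace one or two positions with randomly drawn residues.

    A draw may coincide with the parent residue; that is acceptable for a toy.
    """
    seq = list(parent)
    n_mut = int(rng.integers(1, 3))  # 1 or 2 replacements
    for pos in rng.choice(len(seq), size=n_mut, replace=False):
        seq[pos] = AMINO_ACIDS[int(rng.integers(len(AMINO_ACIDS)))]
    return "".join(seq)


def main():
    rng = np.random.default_rng(0)
    parent = "MKTAYIAKQR"  # hypothetical toy sequence, not a hydrogenase
    variants = [random_variant(parent, rng) for _ in range(300)]
    X = np.array([one_hot_encode(v) for v in variants])

    # Synthetic "activity": a hidden linear effect per residue plus noise.
    hidden_effects = rng.normal(size=X.shape[1])
    y = X @ hidden_effects + rng.normal(scale=0.1, size=len(variants))

    # Simple hold-out split: first 240 variants train, the rest test.
    w, b = fit_ridge(X[:240], y[:240], alpha=1.0)
    print(evaluate(y[240:], X[240:] @ w + b))


if __name__ == "__main__":
    main()
```

Swapping `one_hot_encode` for a lookup into a physicochemical descriptor table would give the second kind of feature set the abstract describes, with the rest of the pipeline unchanged.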
Appears in Collections
College of Engineering > Chemical Engineering Major > 1. Journal Articles
