Investigating the Performance of Machine Learning Methods in Predicting Functional Properties of the Hydrogenase Variants
- Authors
- Choi, G.; Kim, W.; Koo, J.
- Issue Date
- 1-Feb-2023
- Publisher
- Korean Society for Biotechnology and Bioengineering
- Keywords
- hydrogenase; machine learning; metalloprotein; O2 sensitivity; protein engineering
- Citation
- Biotechnology and Bioprocess Engineering, v.28, no.1, pp. 143-151
- Pages
- 9
- Journal Title
- Biotechnology and Bioprocess Engineering
- Volume
- 28
- Number
- 1
- Start Page
- 143
- End Page
- 151
- URI
- https://scholarworks.bwise.kr/hongik/handle/2020.sw.hongik/30845
- DOI
- 10.1007/s12257-022-0330-3
- ISSN
- 1226-8372; 1976-3816
- Abstract
- Improving a functional property of an enzyme via mutagenesis remains challenging because of the vast search space and the difficulty of predicting the effects of mutations. Machine learning has proven proficient at solving similar problems with unprecedented speed, owing to recent advances in computing power and analytical algorithms. In this study, we investigate the performance of machine learning methods in predicting the H2 production activity and O2 tolerance of hydrogenase variants. Experimentally measured activities and tolerances of 377 variants carrying single or double amino acid replacements are used to train and test seven types of machine learning models. Two feature sets represent each variant: a binary representation of the amino acid sequence, and VHSE, a series of vectors quantifying the physicochemical properties of amino acids. The results show that VHSE features enable higher performance, especially with respect to the correlation coefficient and the coefficient of determination, in addition to the root mean square error. Next, model performance is analyzed with respect to changes in data size and heterogeneity to provide insights on designing an effective mutagenesis library for applying machine learning. The best performance was obtained when a support vector machine or ridge regression was trained on a large, homogeneous data set. In this manner, our study reveals the factors affecting the performance of machine learning in identifying enzyme variants with enhanced function. © 2023, The Korean Society for Biotechnology and Bioengineering and Springer.
- Appears in Collections
- College of Engineering > Chemical Engineering Major > 1. Journal Articles
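
The abstract mentions a binary representation of the amino acid sequence and three evaluation metrics (root mean square error, correlation coefficient, coefficient of determination). The sketch below illustrates one common reading of those terms and is not the authors' implementation: it assumes the binary representation is a one-hot encoding over the 20 standard amino acids, assumes the correlation coefficient is Pearson's r, and uses hypothetical helper names and toy inputs. VHSE descriptor values are not given in this record, so they are not reproduced here.

```python
import math

# Assumption: the 20 standard amino acids, ordered by one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(sequence):
    """Encode a sequence as 20 bits per position with exactly one bit set."""
    features = []
    for residue in sequence:
        bits = [0] * len(AMINO_ACIDS)
        bits[AMINO_ACIDS.index(residue)] = 1  # raises ValueError on unknown residues
        features.extend(bits)
    return features


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between measured and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical two-residue fragment and toy measurements, for illustration only.
    print(len(one_hot("AC")))
    print(rmse([1.0, 2.0, 3.0], [1.5, 2.0, 2.5]))
    print(r_squared([1.0, 2.0, 3.0], [1.5, 2.0, 2.5]))
```

A feature vector built this way could be passed to any regression model; the model types named in the abstract (support vector machine, ridge regression) are available in common libraries, but the specific library and hyperparameters used in the study are not stated in this record.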