Transformer-based embedding applied to classify bacterial species using sequencing reads
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Gwak, Ho-Jin | - |
dc.contributor.author | Rho, Mina | - |
dc.date.accessioned | 2022-07-06T07:42:41Z | - |
dc.date.available | 2022-07-06T07:42:41Z | - |
dc.date.created | 2022-05-04 | - |
dc.date.issued | 2022-03 | - |
dc.identifier.issn | 2375-933X | - |
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/139168 | - |
dc.description.abstract | With the emergence of next-generation sequencing and metagenomic approaches, the need for read-level taxonomy classifiers has increased. Although the 16S rRNA gene sequence has been widely employed as a taxonomic marker, recent studies have revealed that 16S rRNA alone is not sufficient to assign species. Therefore, an accurate classifier is required to classify whole-genome sequencing reads into species. With the advancement of deep learning methods and natural language processing technologies, several studies have applied these methods to genomic data and achieved state-of-the-art performance. In this study, we applied transformer-based embedding to bacterial genomes to accurately classify species using sequencing reads. As a case study, we classified Staphylococcus species using sequencing reads. Our model achieved ROC-AUC values of over 0.98 and 0.99 for 151 bp and 251 bp paired-end reads, respectively. Compared with Kraken2, a cutting-edge method, our model classified significantly more S. aureus reads while maintaining comparable precision. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | Institute of Electrical and Electronics Engineers Inc. | - |
dc.title | Transformer-based embedding applied to classify bacterial species using sequencing reads | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | Rho, Mina | - |
dc.identifier.doi | 10.1109/BigComp54360.2022.00084 | - |
dc.identifier.scopusid | 2-s2.0-85127542276 | - |
dc.identifier.wosid | 000835722100075 | - |
dc.identifier.bibliographicCitation | Proceedings - 2022 IEEE International Conference on Big Data and Smart Computing, BigComp 2022, pp.374 - 377 | - |
dc.relation.isPartOf | Proceedings - 2022 IEEE International Conference on Big Data and Smart Computing, BigComp 2022 | - |
dc.citation.title | Proceedings - 2022 IEEE International Conference on Big Data and Smart Computing, BigComp 2022 | - |
dc.citation.startPage | 374 | - |
dc.citation.endPage | 377 | - |
dc.type.rims | ART | - |
dc.type.docType | Proceedings Paper | - |
dc.description.journalClass | 1 | - |
dc.description.isOpenAccess | N | - |
dc.description.journalRegisteredClass | scopus | - |
dc.relation.journalResearchArea | Computer Science | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Theory & Methods | - |
dc.subject.keywordPlus | Bacteria | - |
dc.subject.keywordPlus | Deep learning | - |
dc.subject.keywordPlus | Natural language processing systems | - |
dc.subject.keywordPlus | RNA | - |
dc.subject.keywordPlus | Embeddings | - |
dc.subject.keywordPlus | 16S rRNA | - |
dc.subject.keywordPlus | 16S rRNA gene sequence | - |
dc.subject.keywordPlus | Bacterial species | - |
dc.subject.keywordPlus | Classifieds | - |
dc.subject.keywordPlus | Metagenomics | - |
dc.subject.keywordPlus | Next-generation sequencing | - |
dc.subject.keywordPlus | Staphylococcus species | - |
dc.subject.keywordPlus | Transformer | - |
dc.subject.keywordAuthor | classification | - |
dc.subject.keywordAuthor | deep learning | - |
dc.subject.keywordAuthor | embedding | - |
dc.subject.keywordAuthor | Staphylococcus species | - |
dc.subject.keywordAuthor | transformer | - |
dc.identifier.url | https://ieeexplore.ieee.org/document/9736470 | - |
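This record does not state how sequencing reads are turned into transformer input. One common approach in DNA language models (for example, DNABERT) is to split each read into overlapping k-mers and map each k-mer to a token id. The Python sketch below illustrates that idea under stated assumptions: overlapping 6-mers with stride 1, plus a hypothetical vocabulary with `[PAD]` and `[UNK]` tokens. The function names are illustrative only and are not taken from the paper.

```python
from itertools import product
from typing import Dict, List


def kmer_tokenize(read: str, k: int = 6) -> List[str]:
    """Split a DNA read into overlapping k-mers (stride 1).

    Assumption: overlapping k-mer tokens, as in DNABERT-style models.
    A read of length L yields L - k + 1 tokens; reads shorter than k yield none.
    """
    read = read.upper()
    if len(read) < k:
        return []
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_vocab(k: int = 6) -> Dict[str, int]:
    """Hypothetical vocabulary: every A/C/G/T k-mer gets an id; 0 and 1 are reserved."""
    vocab = {"[PAD]": 0, "[UNK]": 1}
    for i, letters in enumerate(product("ACGT", repeat=k)):
        vocab["".join(letters)] = i + 2
    return vocab


def encode(read: str, vocab: Dict[str, int], k: int = 6) -> List[int]:
    """Convert a read into token ids; k-mers with ambiguous bases (e.g. N) become [UNK]."""
    return [vocab.get(token, vocab["[UNK]"]) for token in kmer_tokenize(read, k)]
```

With k = 6, a 151 bp read becomes 146 tokens and a 251 bp read becomes 246 tokens. Under this sketch, each mate of a paired-end read could be tokenized separately before being passed to the embedding model.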
Certain data included herein are derived from the © Web of Science of Clarivate Analytics. All rights reserved.
You may not copy or re-distribute this material in whole or in part without the prior written consent of Clarivate Analytics.