Transformer-based embedding applied to classify bacterial species using sequencing reads
- Authors: Gwak, Ho-Jin; Rho, Mina
- Issue Date: Mar-2022
- Publisher: Institute of Electrical and Electronics Engineers Inc.
- Keywords: classification; deep learning; embedding; Staphylococcus species; transformer
- Citation: Proceedings - 2022 IEEE International Conference on Big Data and Smart Computing, BigComp 2022, pp. 374-377
- Indexed: SCOPUS
- Journal Title: Proceedings - 2022 IEEE International Conference on Big Data and Smart Computing, BigComp 2022
- Start Page: 374
- End Page: 377
- URI: https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/139168
- DOI: 10.1109/BigComp54360.2022.00084
- ISSN: 2375-933X
- Abstract
- With the emergence of next-generation sequencing and metagenomic approaches, the need for read-level taxonomic classifiers has increased. Although the 16S rRNA gene sequence has been widely employed as a taxonomic marker, recent studies have revealed that 16S rRNA is not sufficient for species-level assignment. Therefore, an accurate classifier is required to assign whole-genome sequencing reads to species. With the advancement of deep learning methods and natural language processing technologies, several studies have applied these methods to genomic data and achieved state-of-the-art performance. In this study, we applied transformer-based embedding to bacterial genomes to accurately classify species using sequencing reads. As a case study, we classified Staphylococcus species. Our model achieved ROC-AUC values of over 0.98 and 0.99 for 151 bp and 251 bp paired-end reads, respectively. Compared with Kraken2, a cutting-edge method, our model classified significantly more S. aureus reads while maintaining comparable precision.
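The abstract reports ROC-AUC and compares the number of S. aureus reads classified (recall) and precision against Kraken2. As a hedged illustration of how such read-level metrics are commonly computed, and not the authors' evaluation code, the sketch below assumes a one-vs-rest binary framing (e.g. S. aureus reads vs. all other reads). The function names `roc_auc` and `precision_and_recall` are hypothetical; reads a tool leaves unclassified are counted as not assigned to the target species.

```python
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC via the rank-sum (Mann-Whitney U) formulation.

    labels: 1 for reads of the target species, 0 otherwise (assumed binary framing).
    scores: classifier score for the target species; tied scores get average ranks.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        # Assign every tied read the average of its 1-based ranks.
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC-AUC needs both positive and negative reads")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    # U counts (positive, negative) pairs where the positive read scores higher.
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def precision_and_recall(labels: Sequence[int], predicted: Sequence[int]) -> Tuple[float, float]:
    """Precision and recall for hard assignments.

    predicted: 1 if the read was assigned to the target species,
    0 if assigned elsewhere or left unclassified.
    """
    tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy, made-up scores for four reads; not data from the paper.
    print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))  # 0.75
    print(precision_and_recall([1, 1, 1, 0], [1, 1, 0, 1]))
```

The rank-sum form avoids building the ROC curve explicitly and handles tied scores by averaging ranks. A tool that leaves more target reads unclassified lowers recall without necessarily changing precision, which is the kind of difference the Kraken2 comparison in the abstract describes.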
- Appears in Collections: Seoul College of Engineering > Seoul School of Computer Software > 1. Journal Articles
Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.