Classification of Highly Divergent Viruses from DNA/RNA Sequence Using Transformer-Based Models
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Sadad, Tariq | - |
dc.contributor.author | Aurangzeb, Raja Atif | - |
dc.contributor.author | Safran, Mejdl | - |
dc.contributor.author | Imran, | - |
dc.contributor.author | Alfarhood, Sultan | - |
dc.contributor.author | Kim, Jungsuk | - |
dc.date.accessioned | 2023-07-02T01:40:08Z | - |
dc.date.available | 2023-07-02T01:40:08Z | - |
dc.date.created | 2023-05-02 | - |
dc.date.issued | 2023-04 | - |
dc.identifier.issn | 2227-9059 | - |
dc.identifier.uri | https://scholarworks.bwise.kr/gachon/handle/2020.sw.gachon/88356 | - |
dc.description.abstract | Viruses infect millions of people worldwide each year, and some can cause cancer or increase the risk of it. Because viruses have highly mutable genomes, new viruses, such as those causing COVID-19 and influenza, may continue to emerge. Traditional virology relies on predefined rules to identify viruses, but new viruses may be completely or partially divergent from the reference genome, rendering statistical methods and similarity calculations insufficient for all genome sequences. Identifying DNA/RNA-based viral sequences is a crucial step in distinguishing types of lethal pathogens, including their variants and strains. While various bioinformatics tools can align these sequences, expert biologists are required to interpret the results. Computational virology is a scientific field that studies viruses, their origins, and related drug discovery; machine learning plays a crucial role in it by extracting domain- and task-specific features to tackle this challenge. This paper proposes a genome analysis system that uses advanced deep learning to identify dozens of viruses. The system takes nucleotide sequences from the NCBI GenBank database and uses a BERT tokenizer to extract features by breaking the sequences down into tokens. We also generated synthetic data for viruses with small sample sizes. The proposed system has two components: a BERT architecture built from scratch specifically for DNA analysis, which learns to predict the next codons without supervision, and a classifier that identifies important features and captures the relationship between genotype and phenotype. Our system achieved an accuracy of 97.69% in identifying viral sequences. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | MDPI | - |
dc.relation.isPartOf | Biomedicines | - |
dc.title | Classification of Highly Divergent Viruses from DNA/RNA Sequence Using Transformer-Based Models | - |
dc.type | Article | - |
dc.type.rims | ART | - |
dc.description.journalClass | 1 | - |
dc.identifier.wosid | 001011273800001 | - |
dc.identifier.doi | 10.3390/biomedicines11051323 | - |
dc.identifier.bibliographicCitation | Biomedicines, v.11, no.5 | - |
dc.description.isOpenAccess | N | - |
dc.identifier.scopusid | 2-s2.0-85160688689 | - |
dc.citation.title | Biomedicines | - |
dc.citation.volume | 11 | - |
dc.citation.number | 5 | - |
dc.contributor.affiliatedAuthor | Imran, | - |
dc.contributor.affiliatedAuthor | Kim, Jungsuk | - |
dc.type.docType | Article | - |
dc.subject.keywordAuthor | BERT | - |
dc.subject.keywordAuthor | deep learning | - |
dc.subject.keywordAuthor | DNA/RNA sequence | - |
dc.subject.keywordAuthor | K-MERS | - |
dc.relation.journalResearchArea | Biochemistry & Molecular Biology | - |
dc.relation.journalResearchArea | Research & Experimental Medicine | - |
dc.relation.journalResearchArea | Pharmacology & Pharmacy | - |
dc.relation.journalWebOfScienceCategory | Biochemistry & Molecular Biology | - |
dc.relation.journalWebOfScienceCategory | Medicine, Research & Experimental | - |
dc.relation.journalWebOfScienceCategory | Pharmacology & Pharmacy | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |