DeepCOI: a large language model-driven framework for fast and accurate taxonomic assignment in animal metabarcoding (open access)
- Authors
- Gwak, Ho-Jin; Rho, Mina
- Issue Date
- Mar-2026
- Publisher
- BioMed Central
- Keywords
- Metabarcoding; Metagenomics; COI genes; Language model; Self-supervised learning; Explainable AI
- Citation
- Genome Biology, v.26, no.1, pp 1 - 20
- Pages
- 20
- Indexed
- SCIE
SCOPUS
- Journal Title
- Genome Biology
- Volume
- 26
- Number
- 1
- Start Page
- 1
- End Page
- 20
- URI
- https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/209403
- DOI
- 10.1186/s13059-025-03861-7
- ISSN
- 1474-7596
1474-760X
- Abstract
- Metabarcoding remains challenging due to incomplete taxonomic annotations and computationally intensive processes. We present DeepCOI, a large language model-based classifier pre-trained on seven million cytochrome c oxidase I gene sequences. DeepCOI enables fast and accurate taxonomic assignment across eight major phyla, achieving an AU-ROC of 0.958 and an AU-PR of 0.897, outperforming existing methods while significantly reducing inference time. Additionally, DeepCOI demonstrates interpretability by identifying taxonomically informative sequence positions. By integrating large-scale datasets and self-supervised learning, DeepCOI enhances both the accuracy and efficiency of metabarcoding processes, providing a scalable solution for biodiversity assessment and environmental monitoring.
- Appears in Collections
- Seoul College of Engineering > Seoul School of Computer Software > 1. Journal Articles
