Proformer: a hybrid macaron transformer model predicts expression values from promoter sequences
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Kwak, Il-Youp | - |
dc.contributor.author | Kim, Byeong-Chan | - |
dc.contributor.author | Lee, Juhyun | - |
dc.contributor.author | Kang, Taein | - |
dc.contributor.author | Garry, Daniel J. | - |
dc.contributor.author | Zhang, Jianyi | - |
dc.contributor.author | Gong, Wuming | - |
dc.date.accessioned | 2024-03-19T07:30:41Z | - |
dc.date.available | 2024-03-19T07:30:41Z | - |
dc.date.issued | 2024-02 | - |
dc.identifier.issn | 1471-2105 | - |
dc.identifier.uri | https://scholarworks.bwise.kr/cau/handle/2019.sw.cau/72933 | - |
dc.description.abstract | The breakthrough high-throughput measurement of the cis-regulatory activity of millions of randomly generated promoters provides an unprecedented opportunity to systematically decode the cis-regulatory logic that determines the expression values. We developed an end-to-end transformer encoder architecture named Proformer to predict the expression values from DNA sequences. Proformer used a Macaron-like Transformer encoder architecture, where two half-step feed-forward (FFN) layers were placed at the beginning and the end of each encoder block, and a separable 1D convolution layer was inserted after the first FFN layer and in front of the multi-head attention layer. The sliding k-mers from one-hot encoded sequences were mapped onto a continuous embedding, combined with the learned positional embedding and strand embedding (forward strand vs. reverse-complemented strand) as the sequence input. Moreover, Proformer introduced multiple expression heads with mask filling to prevent the transformer models from collapsing when training on a relatively small amount of data. We empirically determined that this design had significantly better performance than conventional designs, such as using a global pooling layer as the output layer for the regression task. These analyses support the notion that Proformer provides a novel method of learning and enhances our understanding of how cis-regulatory sequences determine the expression values. © The Author(s) 2024. | - |
dc.language | English | - |
dc.language.iso | ENG | - |
dc.publisher | BioMed Central Ltd | - |
dc.title | Proformer: a hybrid macaron transformer model predicts expression values from promoter sequences | - |
dc.type | Article | - |
dc.identifier.doi | 10.1186/s12859-024-05645-5 | - |
dc.identifier.bibliographicCitation | BMC Bioinformatics, v.25, no.1 | - |
dc.description.isOpenAccess | Y | - |
dc.identifier.wosid | 001169097400004 | - |
dc.identifier.scopusid | 2-s2.0-85185523971 | - |
dc.citation.number | 1 | - |
dc.citation.title | BMC Bioinformatics | - |
dc.citation.volume | 25 | - |
dc.type.docType | Article | - |
dc.publisher.location | United Kingdom | - |
dc.subject.keywordAuthor | Enhancer | - |
dc.subject.keywordAuthor | Expression prediction | - |
dc.subject.keywordAuthor | Macaron Transformer | - |
dc.subject.keywordAuthor | Massively Parallel Reporter Assay (MPRA) | - |
dc.subject.keywordAuthor | Sequence model | - |
dc.subject.keywordPlus | ENHANCER ACTIVITY MAPS | - |
dc.subject.keywordPlus | GENE-REGULATORY LOGIC | - |
dc.subject.keywordPlus | SYSTEMATIC DISSECTION | - |
dc.subject.keywordPlus | FUNCTIONAL DISSECTION | - |
dc.subject.keywordPlus | BINDING PROTEINS | - |
dc.subject.keywordPlus | GENOME | - |
dc.subject.keywordPlus | VARIANTS | - |
dc.relation.journalResearchArea | Biochemistry & Molecular Biology | - |
dc.relation.journalResearchArea | Biotechnology & Applied Microbiology | - |
dc.relation.journalResearchArea | Mathematical & Computational Biology | - |
dc.relation.journalWebOfScienceCategory | Biochemical Research Methods | - |
dc.relation.journalWebOfScienceCategory | Biotechnology & Applied Microbiology | - |
dc.relation.journalWebOfScienceCategory | Mathematical & Computational Biology | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |
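
The abstract above describes the sequence input as sliding k-mers combined with positional and strand embeddings. As a rough illustration only, the sketch below turns a promoter sequence into the integer indices such embedding lookups would need. The function names, the lexicographic k-mer vocabulary, and the strand ids (0 for forward, 1 for reverse complement) are assumptions made for this example; they are not taken from the Proformer code, and the record does not state the value of k the authors used.

```python
from itertools import product

# Watson-Crick complements for the four DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse-complemented strand of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def kmer_vocab(k):
    """Map every possible k-mer over ACGT to an integer id (hypothetical ordering)."""
    return {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}


def sliding_kmer_ids(seq, k, vocab):
    """Slide a window of width k with stride 1 and look up each k-mer's id."""
    return [vocab[seq[i:i + k]] for i in range(len(seq) - k + 1)]


def encode_both_strands(seq, k):
    """Build (k-mer ids, position ids, strand id) for both strands.

    In a model, each of the three index types would select rows from its own
    learned embedding table, and the resulting vectors would be combined.
    """
    vocab = kmer_vocab(k)
    forward = sliding_kmer_ids(seq, k, vocab)
    reverse = sliding_kmer_ids(reverse_complement(seq), k, vocab)
    positions = list(range(len(forward)))
    return {
        "forward": (forward, positions, 0),  # strand id 0: assumed convention
        "reverse": (reverse, positions, 1),  # strand id 1: assumed convention
    }


if __name__ == "__main__":
    encoded = encode_both_strands("ACGTA", k=2)
    print(encoded["forward"])
    print(encoded["reverse"])
```

A sequence of length n yields n - k + 1 tokens per strand, so both strands share the same position ids and differ only in their k-mer ids and strand id.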
Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.