Correlation Between Attention Heads of BERT
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Yang, Seungmo | - |
dc.contributor.author | Kang, Mincheal | - |
dc.contributor.author | Seo, Jiwon | - |
dc.contributor.author | Kim, Younghoon | - |
dc.date.accessioned | 2023-05-03T09:35:09Z | - |
dc.date.available | 2023-05-03T09:35:09Z | - |
dc.date.issued | 2022-04 | - |
dc.identifier.uri | https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/112579 | - |
dc.description.abstract | Recently, as deep learning has achieved tremendous success in a variety of application domains, natural language processing based on deep learning has also become widespread in research. The performance of representative models such as Transformer, BERT, and GPT is excellent and approaches human performance. However, due to the complicated structure of operations such as self-attention, the role of internal outputs between layers and the relationships between latent vectors have seldom been studied compared to CNNs. In this work, we calculate the correlation between the outputs of the multiple self-attention heads in each layer of a pre-trained BERT model and investigate whether any heads are redundantly trained; that is, we test whether the output latent vectors of one attention head can be linearly transformed into those of another head. Through experiments, we show that there are heads with high correlation, which implies that examining the correlation between heads may help optimize the structure of BERT. (A minimal code sketch of this linear-fit test appears below the record.) | - |
dc.format.extent | 3 | - |
dc.language | English | - |
dc.language.iso | ENG | - |
dc.publisher | IEEE | - |
dc.title | Correlation Between Attention Heads of BERT | - |
dc.type | Article | - |
dc.publisher.location | United States | - |
dc.identifier.doi | 10.1109/ICEIC54506.2022.9748643 | - |
dc.identifier.scopusid | 2-s2.0-85128836688 | - |
dc.identifier.wosid | 000942023400099 | - |
dc.identifier.bibliographicCitation | 2022 International Conference on Electronics, Information, and Communication (ICEIC), pp 1 - 3 | - |
dc.citation.title | 2022 International Conference on Electronics, Information, and Communication (ICEIC) | - |
dc.citation.startPage | 1 | - |
dc.citation.endPage | 3 | - |
dc.type.docType | Proceedings Paper | - |
dc.description.isOpenAccess | N | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |
dc.relation.journalResearchArea | Engineering | - |
dc.relation.journalResearchArea | Telecommunications | - |
dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | - |
dc.relation.journalWebOfScienceCategory | Telecommunications | - |
dc.subject.keywordPlus | Computer vision | - |
dc.subject.keywordPlus | Deep learning | - |
dc.subject.keywordPlus | Applications domains | - |
dc.subject.keywordPlus | BERT | - |
dc.subject.keywordPlus | Correlation | - |
dc.subject.keywordPlus | Human performance | - |
dc.subject.keywordPlus | Latent vectors | - |
dc.subject.keywordPlus | Performance | - |
dc.subject.keywordPlus | Self-attention head | - |
dc.subject.keywordPlus | Structure of operations | - |
dc.subject.keywordPlus | Natural language processing systems | - |
dc.subject.keywordAuthor | BERT | - |
dc.subject.keywordAuthor | self-attention head | - |
dc.subject.keywordAuthor | correlation | - |
dc.identifier.url | https://ieeexplore.ieee.org/document/9748643 | - |
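
The abstract describes testing whether one attention head's output latent vectors can be linearly mapped onto another's. The sketch below is one plausible way to run such a test, not the authors' code: it assumes the HuggingFace `transformers` library and `bert-base-uncased`, captures per-head context vectors with a forward hook on a self-attention module, and uses the R² of a least-squares linear fit as a rough redundancy score. The layer and head indices and the sentence list are illustrative choices, not values from the paper.

```python
# Minimal sketch: can head j's output vectors be linearly predicted
# from head i's? Assumes HuggingFace transformers + bert-base-uncased.
import numpy as np
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

captured = {}

def save_context(module, inputs, outputs):
    # outputs[0] is the concatenated per-head context, (batch, seq_len, hidden);
    # this holds for the BertSelfAttention module across transformers versions.
    captured["ctx"] = outputs[0].detach()

LAYER = 0  # illustrative layer index (bert-base has 12 layers)
model.encoder.layer[LAYER].attention.self.register_forward_hook(save_context)

# Illustrative probe sentences; pooling several keeps the fit well-posed.
sentences = [
    "Attention heads may learn redundant representations.",
    "Deep learning has achieved success in many application domains.",
    "BERT stacks twelve layers of multi-head self-attention.",
    "Each head produces a sequence of latent vectors.",
    "Two heads can be compared through a linear transformation.",
    "A high quality linear fit suggests the heads are correlated.",
    "Redundant heads are candidates for pruning or sharing.",
    "Pooling tokens from several sentences makes the fit well-posed.",
]

NUM_HEADS, HEAD_DIM = 12, 64  # bert-base-uncased configuration
head_outputs = []  # per sentence: (seq_len, NUM_HEADS, HEAD_DIM)
for text in sentences:
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        model(**enc)
    ctx = captured["ctx"][0]  # (seq_len, hidden)
    head_outputs.append(ctx.reshape(-1, NUM_HEADS, HEAD_DIM).numpy())
H = np.concatenate(head_outputs, axis=0)  # (total_tokens, NUM_HEADS, HEAD_DIM)

def linear_fit_r2(src, dst):
    """Least-squares map W with dst ~= src @ W; the R^2 of the fit serves
    as a rough redundancy score between the two heads."""
    W, *_ = np.linalg.lstsq(src, dst, rcond=None)
    resid = dst - src @ W
    ss_res = (resid ** 2).sum()
    ss_tot = ((dst - dst.mean(axis=0)) ** 2).sum()
    return 1.0 - ss_res / ss_tot

score = linear_fit_r2(H[:, 0, :], H[:, 1, :])
print(f"layer {LAYER}: head 0 -> head 1 linear-fit R^2 = {score:.3f}")
```

Note that the least-squares fit is only meaningful once the pooled token count exceeds the 64-dimensional head size (otherwise any target is fit exactly); a real experiment would pool token vectors from a large corpus and sweep all layer and head pairs.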