IterL2Norm: Fast Iterative L2-Normalization
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Ye, ChangMin | - |
| dc.contributor.author | Sim, Yonguk | - |
| dc.contributor.author | Kim, Youngchae | - |
| dc.contributor.author | Jin, SeongMin | - |
| dc.contributor.author | Jeong, Doo Seok | - |
| dc.date.accessioned | 2025-08-27T07:30:21Z | - |
| dc.date.available | 2025-08-27T07:30:21Z | - |
| dc.date.issued | 2025-03 | - |
| dc.identifier.issn | 1530-1591 | - |
| dc.identifier.issn | 1558-1101 | - |
| dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/208602 | - |
| dc.description.abstract | Transformer-based large language models are memory-bound: they operate on large amounts of data that are only marginally reused. Thus, data movement between the host and the accelerator likely dictates the total wall-clock time. Layer normalization is one of the key workloads in the transformer model, following each multi-head attention block and each feed-forward network block. To reduce data movement, layer normalization needs to be performed on the same chip as the matrix-matrix multiplication engine. To this end, we introduce an iterative L2-normalization method for 1D input (IterL2Norm) that converges quickly to the steady-state solution, within five iteration steps, and with high precision. It outperforms the fast inverse square root algorithm in six out of nine cases for FP32 and five out of nine for BFloat16 across the embedding lengths used in the OPT models. Implemented in 32/28 nm CMOS, the IterL2Norm macro normalizes d-dimensional vectors, where 64 <= d <= 1024, with a latency of 116-227 cycles at 100 MHz/1.05 V. | - |
| dc.format.extent | 7 | - |
| dc.language | English | - |
| dc.language.iso | ENG | - |
| dc.publisher | Institute of Electrical and Electronics Engineers Inc. | - |
| dc.title | IterL2Norm: Fast Iterative L2-Normalization | - |
| dc.type | Article | - |
| dc.publisher.location | United States | - |
| dc.identifier.doi | 10.23919/DATE64628.2025.10992867 | - |
| dc.identifier.scopusid | 2-s2.0-105006907396 | - |
| dc.identifier.wosid | 001506972600125 | - |
| dc.identifier.bibliographicCitation | Proceedings - Design, Automation and Test in Europe, DATE, pp. 1-7 | - |
| dc.citation.title | Proceedings - Design, Automation and Test in Europe, DATE | - |
| dc.citation.startPage | 1 | - |
| dc.citation.endPage | 7 | - |
| dc.type.docType | Proceedings Paper | - |
| dc.description.isOpenAccess | N | - |
| dc.description.journalRegisteredClass | scopus | - |
| dc.relation.journalResearchArea | Automation & Control Systems | - |
| dc.relation.journalResearchArea | Computer Science | - |
| dc.relation.journalResearchArea | Engineering | - |
| dc.relation.journalWebOfScienceCategory | Automation & Control Systems | - |
| dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence | - |
| dc.relation.journalWebOfScienceCategory | Engineering, Industrial | - |
| dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | - |
| dc.subject.keywordPlus | Digital arithmetic | - |
| dc.subject.keywordPlus | Inverse problems | - |
| dc.subject.keywordPlus | Inverse transforms | - |
| dc.subject.keywordAuthor | IterL2Norm | - |
| dc.subject.keywordAuthor | layer normalization | - |
| dc.subject.keywordAuthor | fast convergence | - |
| dc.subject.keywordAuthor | large language models | - |
| dc.identifier.url | https://ieeexplore.ieee.org/document/10992867 | - |
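
The abstract compares IterL2Norm against the fast inverse square root algorithm. For orientation, here is a minimal Python sketch of that baseline applied to L2-normalizing a 1D vector. It is not the IterL2Norm update rule, which this record does not give. The function names `fast_inv_sqrt` and `l2_normalize` and the `newton_steps` parameter are illustrative assumptions, and the Newton refinement runs in Python's double precision rather than in FP32 or BFloat16 hardware arithmetic.

```python
import math
import struct


def fast_inv_sqrt(x: float, newton_steps: int = 1) -> float:
    """Approximate 1/sqrt(x) for a positive, finite FP32-range value.

    Baseline technique only (bit-level initial guess plus Newton steps);
    this is NOT the IterL2Norm method described in the abstract.
    """
    # Reinterpret the float32 bit pattern of x as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Well-known magic-constant initial guess for the inverse square root.
    bits = 0x5F3759DF - (bits >> 1)
    y = struct.unpack("<f", struct.pack("<I", bits))[0]
    # Newton-Raphson refinement of y ~ 1/sqrt(x); each step roughly
    # squares the relative error. Computed here in double precision.
    for _ in range(newton_steps):
        y = y * (1.5 - 0.5 * x * y * y)
    return y


def l2_normalize(v, newton_steps: int = 1):
    """Scale v to (approximately) unit L2 norm using fast_inv_sqrt."""
    sum_sq = sum(a * a for a in v)
    if sum_sq == 0.0:
        # A zero vector has no direction; return it unchanged.
        return list(v)
    inv_norm = fast_inv_sqrt(sum_sq, newton_steps)
    return [a * inv_norm for a in v]


if __name__ == "__main__":
    out = l2_normalize([3.0, 4.0], newton_steps=3)
    print(out, math.sqrt(sum(a * a for a in out)))
```

With a single Newton step the resulting norm is typically within a fraction of a percent of 1, and additional steps tighten it further. The abstract's claim concerns how IterL2Norm compares with this kind of baseline in precision at the OPT embedding lengths.
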
