A 7-nm Four-Core Mixed-Precision AI Chip With 26.2-TFLOPS Hybrid-FP8 Training, 104.9-TOPS INT4 Inference, and Workload-Aware Throttling
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Lee, Sae Kyu | - |
| dc.contributor.author | Agrawal, Ankur | - |
| dc.contributor.author | Silberman, Joel | - |
| dc.contributor.author | Ziegler, Matthew | - |
| dc.contributor.author | Kang, Mingu | - |
| dc.contributor.author | Venkataramani, Swagath | - |
| dc.contributor.author | Cao, Nianzheng | - |
| dc.contributor.author | Fleischer, Bruce | - |
| dc.contributor.author | Guillorn, Michael | - |
| dc.contributor.author | Cohen, Matthew | - |
| dc.contributor.author | Mueller, Silvia M. | - |
| dc.contributor.author | Oh, Jinwook | - |
| dc.contributor.author | Lutz, Martin | - |
| dc.contributor.author | Jung, Jinwook | - |
| dc.contributor.author | Koswatta, Siyu | - |
| dc.contributor.author | Zhou, Ching | - |
| dc.contributor.author | Zalani, Vidhi | - |
| dc.contributor.author | Kar, Monodeep | - |
| dc.contributor.author | Bonanno, James | - |
| dc.contributor.author | Casatuta, Robert | - |
| dc.contributor.author | Chen, Chia-Yu | - |
| dc.contributor.author | Choi, Jungwook | - |
| dc.contributor.author | Haynie, Howard | - |
| dc.contributor.author | Herbert, Alyssa | - |
| dc.contributor.author | Jain, Radhika | - |
| dc.contributor.author | Kim, Kyu-Hyoun | - |
| dc.contributor.author | Li, Yulong | - |
| dc.contributor.author | Ren, Zhibin | - |
| dc.contributor.author | Rider, Scot | - |
| dc.contributor.author | Schaal, Marcel | - |
| dc.contributor.author | Schelm, Kerstin | - |
| dc.contributor.author | Scheuermann, Michael R. | - |
| dc.contributor.author | Sun, Xiao | - |
| dc.contributor.author | Tran, Hung | - |
| dc.contributor.author | Wang, Naigang | - |
| dc.contributor.author | Wang, Wei | - |
| dc.contributor.author | Zhang, Xin | - |
| dc.contributor.author | Shah, Vinay | - |
| dc.contributor.author | Curran, Brian | - |
| dc.contributor.author | Srinivasan, Vijayalakshmi | - |
| dc.contributor.author | Lu, Pong-Fei | - |
| dc.contributor.author | Shukla, Sunil | - |
| dc.contributor.author | Gopalakrishnan, Kailash | - |
| dc.contributor.author | Chang, Leland | - |
| dc.date.accessioned | 2022-07-06T10:44:52Z | - |
| dc.date.available | 2022-07-06T10:44:52Z | - |
| dc.date.issued | 2022-01 | - |
| dc.identifier.issn | 0018-9200 | - |
| dc.identifier.issn | 1558-173X | - |
| dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/139877 | - |
| dc.description.abstract | Reduced precision computation is a key enabling factor for energy-efficient acceleration of deep learning (DL) applications. This article presents a 7-nm four-core mixed-precision artificial intelligence (AI) chip that supports four compute precisions--FP16, Hybrid-FP8 (HFP8), INT4, and INT2--to support diverse application demands for training and inference. The chip leverages cutting-edge algorithmic advances to demonstrate leading-edge power efficiency for 8-bit floating-point (FP8) training and INT4 inference without model accuracy degradation. A new HFP8 format combined with separation of the floating- and fixed-point pipelines and aggressive circuit/architecture optimization enables performance improvements while maintaining high compute utilization. A high-bandwidth ring protocol enables efficient data communication, while power management using workload-aware clock throttling maximizes performance within a given power budget. The AI chip demonstrates 3.58-TFLOPS/W peak energy efficiency and 26.2-TFLOPS peak performance for HFP8 iso-accuracy training, and 16.9-TOPS/W peak energy efficiency and 104.9-TOPS peak performance for INT4 iso-accuracy inference. | - |
| dc.format.extent | 16 | - |
| dc.language | English | - |
| dc.language.iso | ENG | - |
| dc.publisher | Institute of Electrical and Electronics Engineers | - |
| dc.title | A 7-nm Four-Core Mixed-Precision AI Chip With 26.2-TFLOPS Hybrid-FP8 Training, 104.9-TOPS INT4 Inference, and Workload-Aware Throttling | - |
| dc.type | Article | - |
| dc.publisher.location | United States | - |
| dc.identifier.doi | 10.1109/JSSC.2021.3120113 | - |
| dc.identifier.scopusid | 2-s2.0-85122332491 | - |
| dc.identifier.wosid | 000732308600001 | - |
| dc.identifier.bibliographicCitation | IEEE Journal of Solid-State Circuits, v.57, no.1, pp. 182-197 | - |
| dc.citation.title | IEEE Journal of Solid-State Circuits | - |
| dc.citation.volume | 57 | - |
| dc.citation.number | 1 | - |
| dc.citation.startPage | 182 | - |
| dc.citation.endPage | 197 | - |
| dc.type.docType | Article; Early Access | - |
| dc.description.isOpenAccess | N | - |
| dc.description.journalRegisteredClass | scie | - |
| dc.description.journalRegisteredClass | scopus | - |
| dc.relation.journalResearchArea | Engineering | - |
| dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | - |
| dc.subject.keywordPlus | PROCESSOR | - |
| dc.subject.keywordAuthor | Training | - |
| dc.subject.keywordAuthor | Artificial intelligence | - |
| dc.subject.keywordAuthor | AI accelerators | - |
| dc.subject.keywordAuthor | Inference algorithms | - |
| dc.subject.keywordAuthor | Computer architecture | - |
| dc.subject.keywordAuthor | Bandwidth | - |
| dc.subject.keywordAuthor | System-on-chip | - |
| dc.subject.keywordAuthor | Approximate computing | - |
| dc.subject.keywordAuthor | artificial intelligence (AI) | - |
| dc.subject.keywordAuthor | deep neural networks (DNNs) | - |
| dc.subject.keywordAuthor | hardware accelerators | - |
| dc.subject.keywordAuthor | machine learning (ML) | - |
| dc.subject.keywordAuthor | reduced precision computation | - |
| dc.identifier.url | https://ieeexplore.ieee.org/document/9610618 | - |
