A 7-nm Four-Core Mixed-Precision AI Chip With 26.2-TFLOPS Hybrid-FP8 Training, 104.9-TOPS INT4 Inference, and Workload-Aware Throttling

Lee, Sae Kyu; Agrawal, Ankur; Silberman, Joel; Ziegler, Matthew; Kang, Mingu; Venkataramani, Swagath; Cao, Nianzheng; Fleischer, Bruce; Guillorn, Michael; Cohen, Matthew; Mueller, Silvia M.; Oh, Jinwook; Lutz, Martin; Jung, Jinwook; Koswatta, Siyu; Zhou, Ching; Zalani, Vidhi; Kar, Monodeep; Bonanno, James; Casatuta, Robert; Chen, Chia-Yu; Choi, Jungwook; Haynie, Howard; Herbert, Alyssa; Jain, Radhika; Kim, Kyu-Hyoun; Li, Yulong; Ren, Zhibin; Rider, Scot; Schaal, Marcel; Schelm, Kerstin; Scheuermann, Michael R.; Sun, Xiao; Tran, Hung; Wang, Naigang; Wang, Wei; Zhang, Xin; Shah, Vinay; Curran, Brian; Srinivasan, Vijayalakshmi; Lu, Pong-Fei; Shukla, Sunil; Gopalakrishnan, Kailash; Chang, Leland

doi:10.1109/JSSC.2021.3120113

Full metadata record

DC Field	Value	Language
dc.contributor.author	Lee, Sae Kyu	-
dc.contributor.author	Agrawal, Ankur	-
dc.contributor.author	Silberman, Joel	-
dc.contributor.author	Ziegler, Matthew	-
dc.contributor.author	Kang, Mingu	-
dc.contributor.author	Venkataramani, Swagath	-
dc.contributor.author	Cao, Nianzheng	-
dc.contributor.author	Fleischer, Bruce	-
dc.contributor.author	Guillorn, Michael	-
dc.contributor.author	Cohen, Matthew	-
dc.contributor.author	Mueller, Silvia M.	-
dc.contributor.author	Oh, Jinwook	-
dc.contributor.author	Lutz, Martin	-
dc.contributor.author	Jung, Jinwook	-
dc.contributor.author	Koswatta, Siyu	-
dc.contributor.author	Zhou, Ching	-
dc.contributor.author	Zalani, Vidhi	-
dc.contributor.author	Kar, Monodeep	-
dc.contributor.author	Bonanno, James	-
dc.contributor.author	Casatuta, Robert	-
dc.contributor.author	Chen, Chia-Yu	-
dc.contributor.author	Choi, Jungwook	-
dc.contributor.author	Haynie, Howard	-
dc.contributor.author	Herbert, Alyssa	-
dc.contributor.author	Jain, Radhika	-
dc.contributor.author	Kim, Kyu-Hyoun	-
dc.contributor.author	Li, Yulong	-
dc.contributor.author	Ren, Zhibin	-
dc.contributor.author	Rider, Scot	-
dc.contributor.author	Schaal, Marcel	-
dc.contributor.author	Schelm, Kerstin	-
dc.contributor.author	Scheuermann, Michael R.	-
dc.contributor.author	Sun, Xiao	-
dc.contributor.author	Tran, Hung	-
dc.contributor.author	Wang, Naigang	-
dc.contributor.author	Wang, Wei	-
dc.contributor.author	Zhang, Xin	-
dc.contributor.author	Shah, Vinay	-
dc.contributor.author	Curran, Brian	-
dc.contributor.author	Srinivasan, Vijayalakshmi	-
dc.contributor.author	Lu, Pong-Fei	-
dc.contributor.author	Shukla, Sunil	-
dc.contributor.author	Gopalakrishnan, Kailash	-
dc.contributor.author	Chang, Leland	-
dc.date.accessioned	2022-07-06T10:44:52Z	-
dc.date.available	2022-07-06T10:44:52Z	-
dc.date.issued	2022-01	-
dc.identifier.issn	0018-9200	-
dc.identifier.issn	1558-173X	-
dc.identifier.uri	https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/139877	-
dc.description.abstract	Reduced-precision computation is a key enabling factor for energy-efficient acceleration of deep learning (DL) applications. This article presents a 7-nm four-core mixed-precision artificial intelligence (AI) chip that supports four compute precisions, namely FP16, Hybrid-FP8 (HFP8), INT4, and INT2, to meet diverse application demands for training and inference. The chip leverages cutting-edge algorithmic advances to demonstrate leading-edge power efficiency for 8-bit floating-point (FP8) training and INT4 inference without model accuracy degradation. A new HFP8 format combined with separation of the floating- and fixed-point pipelines and aggressive circuit/architecture optimization enables performance improvements while maintaining high compute utilization. A high-bandwidth ring protocol enables efficient data communication, while power management using workload-aware clock throttling maximizes performance within a given power budget. The AI chip demonstrates 3.58-TFLOPS/W peak energy efficiency and 26.2-TFLOPS peak performance for HFP8 iso-accuracy training, and 16.9-TOPS/W peak energy efficiency and 104.9-TOPS peak performance for INT4 iso-accuracy inference.	-
dc.format.extent	16	-
dc.language	English	-
dc.language.iso	ENG	-
dc.publisher	Institute of Electrical and Electronics Engineers	-
dc.title	A 7-nm Four-Core Mixed-Precision AI Chip With 26.2-TFLOPS Hybrid-FP8 Training, 104.9-TOPS INT4 Inference, and Workload-Aware Throttling	-
dc.type	Article	-
dc.publisher.location	United States	-
dc.identifier.doi	10.1109/JSSC.2021.3120113	-
dc.identifier.scopusid	2-s2.0-85122332491	-
dc.identifier.wosid	000732308600001	-
dc.identifier.bibliographicCitation	IEEE Journal of Solid-State Circuits, v.57, no.1, pp 182 - 197	-
dc.citation.title	IEEE Journal of Solid-State Circuits	-
dc.citation.volume	57	-
dc.citation.number	1	-
dc.citation.startPage	182	-
dc.citation.endPage	197	-
dc.type.docType	Article; Early Access	-
dc.description.isOpenAccess	N	-
dc.description.journalRegisteredClass	scie	-
dc.description.journalRegisteredClass	scopus	-
dc.relation.journalResearchArea	Engineering	-
dc.relation.journalWebOfScienceCategory	Engineering, Electrical & Electronic	-
dc.subject.keywordPlus	PROCESSOR	-
dc.subject.keywordAuthor	Training	-
dc.subject.keywordAuthor	Artificial intelligence	-
dc.subject.keywordAuthor	AI accelerators	-
dc.subject.keywordAuthor	Inference algorithms	-
dc.subject.keywordAuthor	Computer architecture	-
dc.subject.keywordAuthor	Bandwidth	-
dc.subject.keywordAuthor	System-on-chip	-
dc.subject.keywordAuthor	Approximate computing	-
dc.subject.keywordAuthor	artificial intelligence (AI)	-
dc.subject.keywordAuthor	deep neural networks (DNNs)	-
dc.subject.keywordAuthor	hardware accelerators	-
dc.subject.keywordAuthor	machine learning (ML)	-
dc.subject.keywordAuthor	reduced precision computation	-
dc.identifier.url	https://ieeexplore.ieee.org/document/9610618	-

Appears in Collections: Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles

Related Researcher

Choi, Jung wook: College of Engineering (School of Electronic Engineering)
