Detailed Information


Not All Layers Are Equal: A Layer-Wise Adaptive Approach Toward Large-Scale DNN Training

Full metadata record
DC Field Value Language
dc.contributor.authorKo, Yunyong-
dc.contributor.authorLee, Dongwon-
dc.contributor.authorKim, Ssng Wook-
dc.date.accessioned2022-07-06T04:12:02Z-
dc.date.available2022-07-06T04:12:02Z-
dc.date.created2022-06-03-
dc.date.issued2022-04-
dc.identifier.issn0000-0000-
dc.identifier.urihttps://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/138794-
dc.description.abstractLarge-batch training with data parallelism is a widely adopted approach to efficiently train a large deep neural network (DNN) model. Large-batch training, however, often suffers from model quality degradation because it performs fewer iterations. To alleviate this problem, learning rate (lr) scaling methods are typically applied, which increase the learning rate to make each update larger. Unfortunately, we observe that large-batch training with state-of-the-art lr scaling methods still often degrades model quality once the batch size crosses a specific limit, rendering such lr methods less useful. To explain this phenomenon, we hypothesize that existing lr scaling methods overlook the subtle but important differences across layers in training, which results in the degradation of the overall model quality. From this hypothesis, we propose a novel approach (LENA) to learning rate scaling for large-scale DNN training, employing: (1) a layer-wise adaptive lr scaling to adjust the lr for each layer individually, and (2) a layer-wise state-aware warm-up to track the training state of each layer and finish its warm-up automatically. A comprehensive evaluation across a range of batch sizes demonstrates that LENA achieves the target accuracy (i.e., the accuracy of single-worker training): (1) within the fewest iterations across different batch sizes (up to 45.2% fewer iterations and 44.7% shorter time than the existing state-of-the-art method), and (2) for very large batch sizes, surpassing the limits of all baselines.-
dc.languageEnglish-
dc.language.isoen-
dc.publisherAssociation for Computing Machinery, Inc-
dc.titleNot All Layers Are Equal: A Layer-Wise Adaptive Approach Toward Large-Scale DNN Training-
dc.typeArticle-
dc.contributor.affiliatedAuthorKim, Sang Wook-
dc.identifier.doi10.1145/3485447.3511989-
dc.identifier.scopusid2-s2.0-85129795467-
dc.identifier.wosid000852713001087-
dc.identifier.bibliographicCitationWWW 2022 - Proceedings of the ACM Web Conference 2022, pp.1851 - 1859-
dc.relation.isPartOfWWW 2022 - Proceedings of the ACM Web Conference 2022-
dc.citation.titleWWW 2022 - Proceedings of the ACM Web Conference 2022-
dc.citation.startPage1851-
dc.citation.endPage1859-
dc.type.rimsART-
dc.type.docTypeProceedings Paper-
dc.description.journalClass1-
dc.description.isOpenAccessN-
dc.description.journalRegisteredClassscopus-
dc.relation.journalResearchAreaComputer Science-
dc.relation.journalWebOfScienceCategoryComputer Science, Cybernetics-
dc.relation.journalWebOfScienceCategoryComputer Science, Software Engineering-
dc.relation.journalWebOfScienceCategoryComputer Science, Theory & Methods-
dc.subject.keywordAuthorlarge batch training-
dc.subject.keywordAuthorlayer-wise approach-
dc.subject.keywordAuthorlearning rate scaling-
dc.identifier.urlhttps://dl.acm.org/doi/10.1145/3485447.3511989-
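The layer-wise adaptive lr scaling described in the abstract can be illustrated with a minimal sketch. The trust-ratio rule below (scaling each layer's lr by its weight norm over its gradient norm, as in LARS-style methods) is an illustrative assumption, not LENA's exact formula; the function name and signature are hypothetical.

```python
import math

def layerwise_lr(base_lr, weights, grads, eps=1e-8):
    """Sketch of layer-wise adaptive lr scaling (LARS-style trust ratio).

    Each layer's lr is base_lr * ||w|| / ||g||, so a layer whose gradient
    is large relative to its weights takes a proportionally smaller step,
    while a layer with a comparatively small gradient takes a larger one.
    """
    lrs = []
    for w, g in zip(weights, grads):
        w_norm = math.sqrt(sum(x * x for x in w))  # layer weight norm
        g_norm = math.sqrt(sum(x * x for x in g))  # layer gradient norm
        lrs.append(base_lr * w_norm / (g_norm + eps))
    return lrs

# Two hypothetical layers with very different weight/gradient ratios
# receive very different effective learning rates from the same base lr.
lrs = layerwise_lr(0.1, [[3.0, 4.0], [1.0, 0.0]], [[0.3, 0.4], [10.0, 0.0]])
```

With a single global lr, both layers above would take the same step size; the per-layer ratio instead gives the first layer a 100x larger lr than the second, which is the kind of per-layer differentiation the abstract argues global lr scaling misses.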
Appears in Collections
Seoul College of Engineering > Seoul School of Computer Software > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Kim, Sang-Wook
COLLEGE OF ENGINEERING (SCHOOL OF COMPUTER SCIENCE)