ALADDIN: Asymmetric Centralized Training for Distributed Deep Learning
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Ko, Yunyong | - |
dc.contributor.author | Choi, Kibong | - |
dc.contributor.author | Jei, Hyunseung | - |
dc.contributor.author | Lee, Dongwon | - |
dc.contributor.author | Kim, Sang-Wook | - |
dc.date.accessioned | 2022-07-06T11:57:16Z | - |
dc.date.available | 2022-07-06T11:57:16Z | - |
dc.date.created | 2021-12-08 | - |
dc.date.issued | 2021-10 | - |
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/140699 | - |
dc.description.abstract | To speed up the training of massive deep neural network (DNN) models, distributed training has been widely studied. In general, centralized training, a type of distributed training, suffers from the communication bottleneck between a parameter server (PS) and workers. On the other hand, decentralized training suffers from increased parameter variance among workers, which causes slower model convergence. Addressing this dilemma, in this work, we propose a novel centralized training algorithm, ALADDIN, employing asymmetric communication between the PS and workers to resolve the PS bottleneck problem, and novel updating strategies for both local and global parameters to mitigate the increased-variance problem. Through a convergence analysis, we show that the convergence rate of ALADDIN is O(1/√(nk)) on the non-convex problem, where n is the number of workers and k is the number of training iterations. The empirical evaluation using ResNet-50 and VGG-16 models demonstrates that (1) ALADDIN shows significantly better training throughput, with up to 191% and 34% improvement over a synchronous algorithm and the state-of-the-art decentralized algorithm, respectively, (2) models trained by ALADDIN converge to accuracies comparable to those of the synchronous algorithm within the shortest time, and (3) the convergence of ALADDIN is robust under various heterogeneous environments. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | Association for Computing Machinery | - |
dc.title | ALADDIN: Asymmetric Centralized Training for Distributed Deep Learning | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | Kim, Sang-Wook | - |
dc.identifier.doi | 10.1145/3459637.3482412 | - |
dc.identifier.scopusid | 2-s2.0-85119205605 | - |
dc.identifier.bibliographicCitation | International Conference on Information and Knowledge Management, Proceedings, pp.863 - 872 | - |
dc.relation.isPartOf | International Conference on Information and Knowledge Management, Proceedings | - |
dc.citation.title | International Conference on Information and Knowledge Management, Proceedings | - |
dc.citation.startPage | 863 | - |
dc.citation.endPage | 872 | - |
dc.type.rims | ART | - |
dc.type.docType | Conference Paper | - |
dc.description.journalClass | 1 | - |
dc.description.isOpenAccess | Y | - |
dc.description.journalRegisteredClass | scopus | - |
dc.subject.keywordAuthor | centralized training | - |
dc.subject.keywordAuthor | distributed deep learning | - |
dc.subject.keywordAuthor | heterogeneous systems | - |
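The abstract's core idea, asymmetric communication between a parameter server and workers with separate local/global update rules, can be caricatured as follows. This is a hypothetical toy sketch under assumed update rules (a simple blending weight on push and a mixing coefficient on pull), not the paper's actual ALADDIN algorithm; the class and parameter names (`ParameterServer`, `Worker`, `mix`) are illustrative inventions.

```python
import numpy as np

class ParameterServer:
    """Holds the global parameters; workers push to and pull from it."""

    def __init__(self, dim):
        self.global_params = np.zeros(dim)

    def push(self, local_params, weight=0.5):
        # Blend one worker's local parameters into the global model
        # (assumed update rule; the paper's rule may differ).
        self.global_params = (1 - weight) * self.global_params + weight * local_params

    def pull(self):
        return self.global_params.copy()

class Worker:
    """Keeps updating local parameters every step; communication with the
    PS only mixes the global model in, rather than overwriting the local one."""

    def __init__(self, dim, lr=0.1, mix=0.5):
        self.params = np.zeros(dim)
        self.lr = lr
        self.mix = mix  # how strongly to pull toward the global model

    def local_step(self, grad):
        self.params -= self.lr * grad

    def sync(self, ps):
        ps.push(self.params)
        # Asymmetric update: local params move only partway toward the
        # global ones, limiting variance without full synchronization.
        self.params = (1 - self.mix) * self.params + self.mix * ps.pull()

# Toy run: two workers jointly minimize ||w - target||^2.
target = np.ones(4)
ps = ParameterServer(4)
workers = [Worker(4), Worker(4)]
for step in range(200):
    for w in workers:
        w.local_step(2 * (w.params - target))  # gradient of the quadratic
        w.sync(ps)
final_dist = float(np.linalg.norm(ps.global_params - target))
```

Because each worker advances locally every iteration and only blends with the global model during communication, no worker blocks waiting for a global barrier, which is the intuition behind the throughput gains the abstract reports.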