ALADDIN: Asymmetric Centralized Training for Distributed Deep Learning
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Ko, Yunyong | - |
dc.contributor.author | Choi, Kibong | - |
dc.contributor.author | Jei, Hyunseung | - |
dc.contributor.author | Lee, Dongwon | - |
dc.contributor.author | Kim, Sang-Wook | - |
dc.date.accessioned | 2022-07-06T11:57:16Z | - |
dc.date.available | 2022-07-06T11:57:16Z | - |
dc.date.created | 2021-12-08 | - |
dc.date.issued | 2021-10 | - |
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/140699 | - |
dc.description.abstract | To speed up the training of massive deep neural network (DNN) models, distributed training has been widely studied. In general, centralized training, a type of distributed training, suffers from the communication bottleneck between a parameter server (PS) and workers. On the other hand, decentralized training suffers from increased parameter variance among workers, which causes slower model convergence. Addressing this dilemma, in this work, we propose a novel centralized training algorithm, ALADDIN, employing asymmetric communication between the PS and workers to resolve the PS bottleneck problem, and novel updating strategies for both local and global parameters to mitigate the increased-variance problem. Through a convergence analysis, we show that the convergence rate of ALADDIN is O(1/√(nk)) on the non-convex problem, where n is the number of workers and k is the number of training iterations. The empirical evaluation using ResNet-50 and VGG-16 models demonstrates that (1) ALADDIN shows significantly better training throughput, with up to 191% and 34% improvement over a synchronous algorithm and the state-of-the-art decentralized algorithm, respectively, (2) models trained by ALADDIN converge to accuracies comparable to those of the synchronous algorithm within the shortest time, and (3) the convergence of ALADDIN is robust under various heterogeneous environments. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | Association for Computing Machinery | - |
dc.title | ALADDIN: Asymmetric Centralized Training for Distributed Deep Learning | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | Kim, Sang-Wook | - |
dc.identifier.doi | 10.1145/3459637.3482412 | - |
dc.identifier.scopusid | 2-s2.0-85119205605 | - |
dc.identifier.bibliographicCitation | International Conference on Information and Knowledge Management, Proceedings, pp.863 - 872 | - |
dc.relation.isPartOf | International Conference on Information and Knowledge Management, Proceedings | - |
dc.citation.title | International Conference on Information and Knowledge Management, Proceedings | - |
dc.citation.startPage | 863 | - |
dc.citation.endPage | 872 | - |
dc.type.rims | ART | - |
dc.type.docType | Conference Paper | - |
dc.description.journalClass | 1 | - |
dc.description.isOpenAccess | Y | - |
dc.description.journalRegisteredClass | scopus | - |
dc.subject.keywordAuthor | centralized training | - |
dc.subject.keywordAuthor | distributed deep learning | - |
dc.subject.keywordAuthor | heterogeneous systems | - |
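The abstract's core idea, asymmetric communication between a parameter server and workers with separate local/global update rules, can be caricatured as follows. This is a hypothetical toy sketch under assumed update rules (a simple blending weight on push and a mixing coefficient on pull), not the paper's actual ALADDIN algorithm; the class and parameter names (`ParameterServer`, `Worker`, `mix`) are illustrative inventions.

```python
import numpy as np

class ParameterServer:
    """Holds the global parameters; workers push to and pull from it."""

    def __init__(self, dim):
        self.global_params = np.zeros(dim)

    def push(self, local_params, weight=0.5):
        # Blend one worker's local parameters into the global model
        # (assumed update rule; the paper's rule may differ).
        self.global_params = (1 - weight) * self.global_params + weight * local_params

    def pull(self):
        return self.global_params.copy()

class Worker:
    """Keeps updating local parameters every step; communication with the
    PS only mixes the global model in, rather than overwriting the local one."""

    def __init__(self, dim, lr=0.1, mix=0.5):
        self.params = np.zeros(dim)
        self.lr = lr
        self.mix = mix  # how strongly to pull toward the global model

    def local_step(self, grad):
        self.params -= self.lr * grad

    def sync(self, ps):
        ps.push(self.params)
        # Asymmetric update: local params move only partway toward the
        # global ones, limiting variance without full synchronization.
        self.params = (1 - self.mix) * self.params + self.mix * ps.pull()

# Toy run: two workers jointly minimize ||w - target||^2.
target = np.ones(4)
ps = ParameterServer(4)
workers = [Worker(4), Worker(4)]
for step in range(200):
    for w in workers:
        w.local_step(2 * (w.params - target))  # gradient of the quadratic
        w.sync(ps)
final_dist = float(np.linalg.norm(ps.global_params - target))
```

Because each worker advances locally every iteration and only blends with the global model during communication, no worker blocks waiting for a global barrier, which is the intuition behind the throughput gains the abstract reports.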