Detailed Information

Cited 0 times in Web of Science; cited 0 times in Scopus

An In-Depth Analysis of Distributed Training of Deep Neural Networks

Full metadata record
DC Field: Value
dc.contributor.author: Ko, Yunyong
dc.contributor.author: Choi, Kibong
dc.contributor.author: Seo, Jiwon
dc.contributor.author: Kim, Sangwook
dc.date.accessioned: 2022-07-06T17:45:27Z
dc.date.available: 2022-07-06T17:45:27Z
dc.date.created: 2021-07-15
dc.date.issued: 2021-05
dc.identifier.issn: 0000-0000
dc.identifier.uri: https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/141886
dc.description.abstract: As the popularity of deep learning in industry grows rapidly, efficient training of deep neural networks (DNNs) becomes important. To train a DNN with a large amount of data, distributed training with data parallelism has been widely adopted. However, communication overhead limits the scalability of distributed training. To reduce the overhead, a number of distributed training algorithms have been proposed. The model accuracy and training performance of these algorithms can differ depending on various factors such as cluster settings, training models/datasets, and the optimization techniques applied. To adopt a distributed training algorithm appropriate for a given situation, one must fully understand the model accuracy and training performance of these algorithms in various settings. Toward this end, this paper reviews and evaluates seven popular distributed training algorithms (BSP, ASP, SSP, EASGD, AR-SGD, GoSGD, and AD-PSGD) in terms of model accuracy and training performance in various settings. Specifically, we evaluate those algorithms for two CNN models, in different cluster settings, and with three well-known optimization techniques. Through extensive evaluation and analysis, we made several interesting discoveries. For example, we found that some distributed training algorithms (SSP, EASGD, and GoSGD) have a highly negative impact on model accuracy because they adopt intermittent and asymmetric communication to improve training performance; the communication overhead of some centralized algorithms (ASP and SSP) is much higher than expected in a cluster setting with limited network bandwidth because of the parameter server (PS) bottleneck problem. These findings, and many more in the paper, can guide the adoption of proper distributed training algorithms in industry; our findings can also be useful in academia for designing new distributed training algorithms. (A minimal sketch of the synchronous data-parallel update step that these algorithms build on follows this metadata record.)
dc.language: English
dc.language.iso: en
dc.publisher: IEEE
dc.title: An In-Depth Analysis of Distributed Training of Deep Neural Networks
dc.type: Article
dc.contributor.affiliatedAuthor: Seo, Jiwon
dc.identifier.doi: 10.1109/IPDPS49936.2021.00108
dc.identifier.scopusid: 2-s2.0-85113461901
dc.identifier.wosid: 000695273000100
dc.identifier.bibliographicCitation: Proceedings - 2021 IEEE 35th International Parallel and Distributed Processing Symposium, IPDPS 2021, pp. 994 - 1003
dc.relation.isPartOf: Proceedings - 2021 IEEE 35th International Parallel and Distributed Processing Symposium, IPDPS 2021
dc.citation.title: Proceedings - 2021 IEEE 35th International Parallel and Distributed Processing Symposium, IPDPS 2021
dc.citation.startPage: 994
dc.citation.endPage: 1003
dc.type.rims: ART
dc.type.docType: Proceeding
dc.description.journalClass: 1
dc.description.isOpenAccess: N
dc.description.journalRegisteredClass: scopus
dc.relation.journalResearchArea: Computer Science
dc.relation.journalWebOfScienceCategory: Computer Science, Hardware & Architecture
dc.relation.journalWebOfScienceCategory: Computer Science, Software Engineering
dc.relation.journalWebOfScienceCategory: Computer Science, Theory & Methods
dc.subject.keywordPlus: Deep learning
dc.subject.keywordPlus: Deep neural networks
dc.subject.keywordPlus: Neural networks
dc.subject.keywordPlus: Bottleneck problem
dc.subject.keywordPlus: Centralized algorithms
dc.subject.keywordPlus: Communication overheads
dc.subject.keywordPlus: Evaluation and analysis
dc.subject.keywordPlus: In-depth analysis
dc.subject.keywordPlus: Network bandwidth
dc.subject.keywordPlus: Optimization techniques
dc.subject.keywordPlus: Training algorithms
dc.subject.keywordPlus: Clustering algorithms
dc.subject.keywordAuthor: deep learning
dc.subject.keywordAuthor: distributed training algorithm
dc.identifier.url: https://ieeexplore.ieee.org/document/9460556
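
The abstract above contrasts synchronous, all-reduce-style algorithms (BSP, AR-SGD) with asynchronous or decentralized ones (ASP, SSP, EASGD, GoSGD, AD-PSGD) that relax the gradient-synchronization step to cut communication cost. The following is a minimal single-process sketch of the common baseline, synchronous (BSP-style) data-parallel SGD with gradient averaging. It is not the authors' evaluation code; the linear model, synthetic data, worker count, and learning rate are illustrative assumptions only.

```python
# Minimal single-process simulation of BSP-style data-parallel SGD
# (synchronous gradient averaging). Illustrative sketch only; the model,
# data, and hyperparameters are hypothetical, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data, sharded across "workers" (data parallelism).
n_workers, n_samples, n_features = 4, 1024, 16
true_w = rng.normal(size=n_features)
X = rng.normal(size=(n_samples, n_features))
y = X @ true_w + 0.01 * rng.normal(size=n_samples)
shards = np.array_split(np.arange(n_samples), n_workers)

w = np.zeros(n_features)  # model parameters, replicated on every worker
lr = 0.1

for step in range(100):
    # Local compute phase: each worker computes a gradient on its own shard.
    local_grads = []
    for shard in shards:
        Xi, yi = X[shard], y[shard]
        grad = 2.0 * Xi.T @ (Xi @ w - yi) / len(shard)
        local_grads.append(grad)

    # Synchronous "all-reduce": average all workers' gradients before the
    # update. In a real cluster this is the communication step whose cost
    # the surveyed algorithms try to reduce, relax, or overlap.
    avg_grad = np.mean(local_grads, axis=0)
    w -= lr * avg_grad

print("parameter error:", np.linalg.norm(w - true_w))
```

In this sketch the `np.mean` line stands in for the all-reduce; the asynchronous and decentralized algorithms evaluated in the paper replace this global, blocking synchronization with intermittent, partial, or peer-to-peer exchanges, which is exactly the trade-off between communication cost and model accuracy that the study measures.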
Appears in Collections:
College of Engineering (Seoul) > School of Computer Software (Seoul) > 1. Journal Articles


Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Seo, Jiwon
College of Engineering (School of Computer Science)
