Detailed Information

Cited 0 times in Web of Science · Cited 1 time in Scopus

SHAT: A Novel Asynchronous Training Algorithm That Provides Fast Model Convergence in Distributed Deep Learning (open access)

Authors
Ko, Yunyong; Kim, Sang-Wook
Issue Date
Jan-2022
Publisher
MDPI
Keywords
distributed deep learning; data parallelism; PS-based distributed training; heterogeneous environments
Citation
APPLIED SCIENCES-BASEL, v.12, no.1, pp.1 - 14
Indexed
SCIE
SCOPUS
Journal Title
APPLIED SCIENCES-BASEL
Volume
12
Number
1
Start Page
1
End Page
14
URI
https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/138459
DOI
10.3390/app12010292
Abstract
The recent unprecedented success of deep learning (DL) in various fields rests on its use of large-scale data and models. Training a large-scale deep neural network (DNN) model with large-scale data, however, is time-consuming. To speed up the training of massive DNN models, data-parallel distributed training based on the parameter server (PS) has been widely applied. In general, synchronous PS-based training suffers from synchronization overhead, especially in heterogeneous environments. To reduce this overhead, asynchronous PS-based training employs asynchronous communication between the PS and workers, so that the PS processes each worker's request independently without waiting. Despite its performance improvement, however, asynchronous training inevitably incurs differences among the local models of workers, and such differences may slow model convergence. To address this problem, we propose SHAT, a novel asynchronous PS-based training algorithm that considers (1) the scale of distributed training and (2) the heterogeneity among workers in order to reduce the differences among the workers' local models. An extensive empirical evaluation demonstrates that (1) the model trained by SHAT converges to accuracy up to 5.22% higher than that of state-of-the-art algorithms, and (2) the model convergence of SHAT is robust under various heterogeneous environments.
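
For readers unfamiliar with the setting, the sketch below illustrates only the generic asynchronous PS pattern referred to in the abstract: a parameter server that applies each worker's gradient as soon as it arrives, while workers of different speeds pull and push independently. The names (ParameterServer, worker), the toy least-squares task, and the per-step delays are illustrative assumptions and do not implement SHAT's actual aggregation rule, which is described in the paper.

# Toy asynchronous parameter-server (PS) loop on least-squares regression.
# Each worker pulls the latest global weights, computes a gradient on one
# sample, and pushes it back immediately; the PS applies each push as it
# arrives, with no synchronization barrier across workers. The sleep values
# are hypothetical stand-ins for heterogeneous worker speeds. This is NOT
# the SHAT update rule, only the generic asynchronous-PS setting.
import threading
import time

import numpy as np


class ParameterServer:
    def __init__(self, dim, lr=0.05):
        self.w = np.zeros(dim)
        self.lr = lr
        self.lock = threading.Lock()  # serializes concurrent pushes/pulls

    def push(self, grad):
        # Apply a worker's gradient as soon as it arrives (asynchronous).
        with self.lock:
            self.w -= self.lr * grad

    def pull(self):
        with self.lock:
            return self.w.copy()


def worker(ps, X, y, steps, delay, seed):
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        w = ps.pull()                    # fetch the current global model
        i = rng.integers(len(X))         # one sample from the local shard
        grad = (X[i] @ w - y[i]) * X[i]  # squared-error gradient
        ps.push(grad)                    # send the update without waiting
        time.sleep(delay)                # simulate a slower (straggler) worker


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_w = np.array([2.0, -3.0, 1.0])
    X = rng.normal(size=(1000, 3))
    y = X @ true_w

    ps = ParameterServer(dim=3)
    # Three workers with different per-step delays -> heterogeneous speeds.
    threads = [
        threading.Thread(target=worker, args=(ps, X, y, 800, d, s))
        for s, d in enumerate((0.0, 0.001, 0.002))
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("learned w:", np.round(ps.pull(), 3), "target:", true_w)
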
Appears in Collections
Seoul College of Engineering > Seoul School of Computer Software > 1. Journal Articles



Related Researcher

Kim, Sang-Wook
College of Engineering (School of Computer Science)
