Detailed Information


SkipReduce: (Interconnection) Network Sparsity to Accelerate Distributed Machine Learning

Full metadata record
DC Field | Value | Language
dc.contributor.author | Kasan, Hans | -
dc.contributor.author | Abts, Dennis | -
dc.contributor.author | Choi, Jungwook | -
dc.contributor.author | Kim, John | -
dc.date.accessioned | 2025-12-02T08:00:29Z | -
dc.date.available | 2025-12-02T08:00:29Z | -
dc.date.issued | 2025-10 | -
dc.identifier.issn | 1072-4451 | -
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/209447 | -
dc.description.abstract | The interconnection network is a critical component for building scalable systems, as its communication bandwidth directly impacts the collective communication performance of distributed training. In this work, we exploit interconnection network sparsity (or communication sparsity) to address challenges of communication performance and scalability. In particular, we identify how gradients (or packets) during communication can be randomly skipped with minimal impact on accuracy. However, skipping gradients in fine granularity (or individually) results in a loss of gradient information without improving communication performance, due to the synchronous nature of collective communication. Thus, we propose coarse-grained skipping where gradient slices are skipped, which enables skipping of some AllReduce steps to accelerate communication. In particular, we propose SkipReduce collective communication that intentionally skips random gradients during AllReduce. However, a naive implementation of SkipReduce can degrade accuracy by repeatedly skipping gradients from the same node, which introduces bias. To mitigate this accuracy loss, we show how randomizing the skipped gradient slices improves training accuracy with negligible additional runtime. We also observe that not all layers have similar communication sparsity and propose applying SkipReduce selectively where only the sparse layers (or gradients) are skipped to minimize the accuracy impact of SkipReduce. Compared to prior work on communication acceleration, SkipReduce can be seamlessly integrated into existing collective communication libraries with minimal overhead. We implement SkipReduce on top of NCCL's ring-based AllReduce algorithm. Our results show that this method accelerates collective communication while preserving final training accuracy. Compared to baseline AllReduce, SkipReduce provides up to a 1.58× speedup in time-to-accuracy.
Beyond this performance gain in data parallelism, this work also discusses the broader implications of SkipReduce, including its application to other parallelism strategies and logical topologies, as well as its benefits as a model regularizer. | -
dc.format.extent | 16 | -
dc.language | English | -
dc.language.iso | ENG | -
dc.title | SkipReduce: (Interconnection) Network Sparsity to Accelerate Distributed Machine Learning | -
dc.type | Article | -
dc.publisher.location | United States | -
dc.identifier.doi | 10.1145/3725843.3756092 | -
dc.identifier.scopusid | 2-s2.0-105021371376 | -
dc.identifier.bibliographicCitation | IEEE/ACM International Symposium on Microarchitecture (MICRO), v.Part of 213862, pp 643 - 658 | -
dc.citation.title | IEEE/ACM International Symposium on Microarchitecture (MICRO) | -
dc.citation.volume | Part of 213862 | -
dc.citation.startPage | 643 | -
dc.citation.endPage | 658 | -
dc.type.docType | Conference paper | -
dc.description.isOpenAccess | N | -
dc.description.journalRegisteredClass | scopus | -
dc.subject.keywordPlus | Artificial intelligence | -
dc.subject.keywordPlus | Distributed computer systems | -
dc.subject.keywordPlus | Integrated circuit interconnects | -
dc.subject.keywordPlus | Learning systems | -
dc.subject.keywordPlus | Personnel training | -
dc.subject.keywordPlus | Scalability | -
dc.subject.keywordPlus | Topology | -
dc.identifier.url | https://dl.acm.org/doi/10.1145/3725843.3756092 | -
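The abstract describes SkipReduce only at a high level. The following is a minimal single-process Python sketch, built on assumptions of our own, of the three ideas it names: coarse-grained skipping of whole gradient slices in a ring AllReduce, randomizing which slices are skipped, and skipping only sparse layers. It is not the authors' NCCL-based implementation. Assumptions not taken from the record: "skipping" is modeled as dropping the first k reduce-scatter hops of every slice; each reduced slice is averaged over the nodes that actually contributed; randomization is a per-iteration random ring offset; `layer_sparsity`, `selective_skip_steps`, and the 0.5 threshold are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Vector = List[float]


def split_into_slices(vec: Vector, n: int) -> List[Vector]:
    """Split a gradient into n equal slices (assumes len(vec) is divisible by n)."""
    size = len(vec) // n
    return [vec[i * size:(i + 1) * size] for i in range(n)]


def ring_allreduce_skip(
    grads: List[Vector], skip_steps: int, offset: int
) -> Tuple[Vector, List[List[int]], int]:
    """Single-process simulation of a ring AllReduce whose reduce-scatter
    phase skips its first `skip_steps` hops for every slice.

    Returns (averaged gradient every node ends up with,
             sorted contributing nodes per slice,
             number of ring steps performed).
    """
    n = len(grads)
    if not 0 <= skip_steps < n:
        raise ValueError("skip_steps must be in [0, n)")
    slices = [split_into_slices(g, n) for g in grads]  # slices[node][slice]
    reduced: List[Vector] = []
    contributors: List[List[int]] = []
    for c in range(n):
        # Unskipped, slice c would start at node (c + offset) and visit all n nodes.
        # Skipping k hops starts the accumulation k nodes later, so those first
        # k nodes never add their copy of slice c.
        holder = (c + offset + skip_steps) % n
        partial = list(slices[holder][c])
        used = [holder]
        for _ in range(n - 1 - skip_steps):  # remaining reduce-scatter hops
            holder = (holder + 1) % n
            partial = [a + b for a, b in zip(partial, slices[holder][c])]
            used.append(holder)
        # Assumption: average over the nodes that actually contributed.
        reduced.append([x / len(used) for x in partial])
        contributors.append(sorted(used))
    # All-gather phase is unchanged (n - 1 steps): every node gets every slice.
    result = [x for s in reduced for x in s]
    steps = (n - 1 - skip_steps) + (n - 1)
    return result, contributors, steps


def layer_sparsity(grad: Vector, eps: float = 1e-3) -> float:
    """Hypothetical sparsity metric: fraction of entries with |g| < eps."""
    return sum(abs(x) < eps for x in grad) / len(grad)


def selective_skip_steps(
    layer_grads: Dict[str, Vector], k: int, threshold: float = 0.5
) -> Dict[str, int]:
    """Hypothetical policy: skip k hops only for layers at or above `threshold` sparsity."""
    return {name: (k if layer_sparsity(g) >= threshold else 0)
            for name, g in layer_grads.items()}


if __name__ == "__main__":
    n = 4
    grads = [[float(node + 1)] * 8 for node in range(n)]  # node i holds all (i + 1)

    exact, _, full_steps = ring_allreduce_skip(grads, skip_steps=0, offset=0)
    print(exact[:2], full_steps)  # [2.5, 2.5] 6

    # Count how often each node's copy of slice 0 is dropped over 1000 iterations.
    rng = random.Random(0)
    drops_naive, drops_random = [0] * n, [0] * n
    for _ in range(1000):
        _, used, _ = ring_allreduce_skip(grads, skip_steps=1, offset=0)
        for node in set(range(n)) - set(used[0]):
            drops_naive[node] += 1
        _, used, _ = ring_allreduce_skip(grads, skip_steps=1, offset=rng.randrange(n))
        for node in set(range(n)) - set(used[0]):
            drops_random[node] += 1
    print(drops_naive)   # [1000, 0, 0, 0]: the same node is always dropped
    print(drops_random)  # spread across all nodes

    layers = {"embedding": [0.0] * 6 + [0.5, 0.2], "fc": [0.3] * 8}
    print(selective_skip_steps(layers, k=1))  # {'embedding': 1, 'fc': 0}
```

In this model, a fixed offset drops the same node's contribution to a given slice on every iteration, which mirrors the bias the abstract attributes to a naive implementation, while a random offset spreads the dropped contributions across nodes. Skipping k hops shortens the ring from 2(n-1) to 2(n-1)-k steps here; how the real implementation maps skipped slices onto NCCL's ring schedule is not stated in this record.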
Appears in Collections: Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles


Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher


Choi, Jung wook
COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)
