SkipReduce: (Interconnection) Network Sparsity to Accelerate Distributed Machine Learning

Kasan, Hans; Abts, Dennis; Choi, Jungwook; Kim, John

doi:10.1145/3725843.3756092

Authors: Kasan, Hans; Abts, Dennis; Choi, Jungwook; Kim, John

Issue Date: Oct-2025

Citation: IEEE/ACM International Symposium on Microarchitecture (MICRO), v.Part of 213862, pp. 643-658

Pages: 16

Indexed: SCOPUS

Journal Title: IEEE/ACM International Symposium on Microarchitecture (MICRO)

Volume: Part of 213862

Start Page: 643

End Page: 658

URI: https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/209447

DOI: 10.1145/3725843.3756092

ISSN: 1072-4451

Abstract: The interconnection network is a critical component for building scalable systems, as its communication bandwidth directly impacts the collective communication performance of distributed training. In this work, we exploit interconnection network sparsity (or communication sparsity) to address challenges of communication performance and scalability. In particular, we identify how gradients (or packets) during communication can be randomly skipped with minimal impact on accuracy. However, skipping gradients in fine granularity (or individually) results in a loss of gradient information without improving communication performance, due to the synchronous nature of collective communication. Thus, we propose coarse-grained skipping where gradient slices are skipped, which enables skipping of some AllReduce steps to accelerate communication. In particular, we propose SkipReduce collective communication that intentionally skips random gradients during AllReduce. However, a naive implementation of SkipReduce can degrade accuracy by repeatedly skipping gradients from the same node, which introduces bias. To mitigate this accuracy loss, we show how randomizing the skipped gradient slices improves training accuracy with negligible additional runtime. We also observe that not all layers have similar communication sparsity and propose applying SkipReduce selectively where only the sparse layers (or gradients) are skipped to minimize the accuracy impact of SkipReduce. Compared to prior work on communication acceleration, SkipReduce can be seamlessly integrated into existing collective communication libraries with minimal overhead. We implement SkipReduce on top of NCCL's ring-based AllReduce algorithm. Our results show that this method accelerates collective communication while preserving final training accuracy. Compared to baseline AllReduce, SkipReduce provides up to a 1.58× speedup in time-to-accuracy. Beyond this performance gain in data parallelism, this work also discusses the broader implications of SkipReduce, including its application to other parallelism strategies and logical topologies, as well as its benefits as a model regularizer.
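
Illustrative note: the coarse-grained skipping described in the abstract can be pictured with a toy simulation of ring-based AllReduce. The sketch below is one possible reading of the abstract, not the authors' NCCL implementation. The function name `skip_reduce`, the parameters `skip_steps` and `offset`, and the choice to skip the first steps of the reduce-scatter phase are all assumptions made for illustration. Each gradient slice is reduced to a single number to keep the example short.

```python
def skip_reduce(grads, skip_steps, offset=0):
    """Toy ring AllReduce (sum) that omits the first `skip_steps`
    reduce-scatter steps. Hypothetical sketch, not the paper's code.

    grads:      list of N per-node gradients, each a list of N slices
                (one number per slice for brevity).
    skip_steps: number of reduce-scatter steps to skip (0 .. N-1).
    offset:     rotation of the slice-to-node mapping; drawing a new
                value each iteration changes which nodes are dropped
                from which slice (an assumed way to randomize).
    Returns the reduced slices as every node would see them after
    the all-gather phase.
    """
    n = len(grads)
    if not 0 <= skip_steps <= n - 1:
        raise ValueError("skip_steps must be between 0 and N-1")
    buf = [list(g) for g in grads]  # per-node working buffers

    # Reduce-scatter: in step s, node i forwards slice (i - s + offset) mod N
    # to its ring neighbour, which accumulates it. Starting at
    # s = skip_steps means the earliest senders of each slice never
    # contribute, so each slice loses `skip_steps` contributions.
    for s in range(skip_steps, n - 1):
        # Snapshot all sends first so the step behaves synchronously.
        sends = []
        for i in range(n):
            c = (i - s + offset) % n
            sends.append((c, buf[i][c]))
        for i, (c, val) in enumerate(sends):
            buf[(i + 1) % n][c] += val

    # All-gather, modelled as reading each slice from the node that
    # holds its reduced value after the last reduce-scatter step.
    return [buf[(c - 1 - offset) % n][c] for c in range(n)]
```

In this toy model, `skip_reduce(grads, 0)` is an ordinary AllReduce, while `skip_steps=k` shortens the reduce-scatter phase from N-1 to N-1-k steps at the cost of dropping k contributions per slice. With a fixed `offset`, the same nodes are dropped from the same slices every iteration, which matches the bias the abstract warns about; drawing `offset = random.randrange(n)` per iteration is one assumed way to spread the loss across nodes.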

Appears in Collections: Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles
