Detailed Information


SWAD: Domain Generalization by Seeking Flat Minima

Authors
Cha, Junbum; Chun, Sanghyuk; Lee, Kyungjae; Cho, Han-Cheol; Park, Seunghyun; Lee, Yunsung; Park, Sungrae
Issue Date
2021
Publisher
Neural information processing systems foundation
Citation
Advances in Neural Information Processing Systems, v.27, pp 22405 - 22418
Pages
14
Journal Title
Advances in Neural Information Processing Systems
Volume
27
Start Page
22405
End Page
22418
URI
https://scholarworks.bwise.kr/cau/handle/2019.sw.cau/62643
ISSN
1049-5258
Abstract
Domain generalization (DG) methods aim to achieve generalizability to an unseen target domain by using only training data from the source domains. Although a variety of DG methods have been proposed, a recent study shows that under a fair evaluation protocol, called DomainBed, the simple empirical risk minimization (ERM) approach performs comparably to or even outperforms previous methods. Unfortunately, simply solving ERM on a complex, non-convex loss function can easily lead to sub-optimal generalizability by seeking sharp minima. In this paper, we theoretically show that finding flat minima results in a smaller domain generalization gap. We also propose a simple yet effective method, named Stochastic Weight Averaging Densely (SWAD), to find flat minima. Thanks to a dense and overfit-aware stochastic weight sampling strategy, SWAD finds flatter minima and suffers less from overfitting than vanilla SWA. SWAD achieves state-of-the-art performance on five DG benchmarks, namely PACS, VLCS, OfficeHome, TerraIncognita, and DomainNet, with consistent and large margins, +1.6% on average in out-of-domain accuracy. We also compare SWAD with conventional generalization methods, such as data augmentation and consistency regularization methods, to verify that the remarkable performance improvements originate from seeking flat minima, not from better in-domain generalizability. Last but not least, SWAD is readily adaptable to existing DG methods without modification; the combination of SWAD and an existing DG method further improves DG performance. Source code is available at https://github.com/khanrc/swad. © 2021 Neural information processing systems foundation. All rights reserved.
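The abstract describes SWAD only as a "dense" weight sampling strategy and does not spell out the procedure. The following is a minimal sketch of the general idea of dense weight averaging, not the authors' implementation: it assumes that a weight snapshot is stored at every training iteration and that the averaging window is already known. The function name `dense_weight_average`, the `start` and `end` indices, and the toy snapshots are all hypothetical; the overfit-aware choice of the window is not modeled here.

```python
from typing import Dict, List, Sequence

# A snapshot maps a parameter name to a flat list of values (toy representation).
Weights = Dict[str, List[float]]


def dense_weight_average(snapshots: Sequence[Weights], start: int, end: int) -> Weights:
    """Average every per-iteration snapshot whose index lies in [start, end].

    Assumption: `start` and `end` are supplied by the caller. How to pick them
    (for example from validation loss) is outside this sketch.
    """
    if not 0 <= start <= end < len(snapshots):
        raise ValueError("window must lie inside the snapshot sequence")

    averaged = {name: [0.0] * len(values) for name, values in snapshots[start].items()}
    count = 0
    for step in range(start, end + 1):
        count += 1
        for name, values in snapshots[step].items():
            running = averaged[name]
            for i, value in enumerate(values):
                # Incremental mean: avoids summing all snapshots before dividing.
                running[i] += (value - running[i]) / count
    return averaged


# Toy usage: four per-iteration snapshots, averaged over iterations 1 and 2 only.
snapshots = [
    {"w": [0.0, 4.0]},
    {"w": [1.0, 2.0]},
    {"w": [3.0, 0.0]},
    {"w": [9.0, 9.0]},
]
avg = dense_weight_average(snapshots, start=1, end=2)
```

Averaging at every iteration, rather than once per epoch, is what "dense" is taken to mean in this sketch; snapshots outside the window (such as the last one above) do not influence the result.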
Appears in
Collections
College of Software > School of Computer Science and Engineering > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.
