
MITIGATING PARAMETER INTERFERENCE IN MODEL MERGING VIA SHARPNESS-AWARE FINE-TUNING

Authors
Lee, Yeoreum; Jung, Jinwook; Baik, Sungyong
Issue Date
Apr-2025
Publisher
International Conference on Learning Representations, ICLR
Citation
13th International Conference on Learning Representations, ICLR 2025, pp 31270 - 31292
Pages
23
Indexed
SCOPUS
Journal Title
13th International Conference on Learning Representations, ICLR 2025
Start Page
31270
End Page
31292
URI
https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/208489
DOI
10.48550/arXiv.2504.14662
Abstract
Large-scale deep learning models with a pretraining-finetuning paradigm have led to a surge of task-specific models fine-tuned from a common pre-trained model. Recently, several research efforts have been made on merging these large models into a single multi-task model, particularly with simple arithmetic on parameters. Such merging methodology faces a central challenge: interference between model parameters fine-tuned on different tasks. A few recent works have focused on designing new fine-tuning schemes that reduce parameter interference, but at the cost of the performance of each task-specific fine-tuned model, which in turn limits that of the merged model. To improve the performance of a merged model, we note that a fine-tuning scheme should aim for (1) smaller parameter interference and (2) better performance of each fine-tuned model on the corresponding task. In this work, we aim to design a new fine-tuning objective function that works towards these two goals. In the course of this process, we find this objective function to be strikingly similar to the sharpness-aware minimization (SAM) objective function, which aims to achieve generalization by finding flat minima. Drawing upon this observation, we propose to fine-tune pre-trained models via sharpness-aware minimization. The experimental and theoretical results showcase the effectiveness and orthogonality of our proposed approach, improving the performance of various merging and fine-tuning methods. Our code is available at https://github.com/baiklab/SAFT-Merge.
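To illustrate the two ingredients the abstract names, merging by simple arithmetic on parameters and sharpness-aware minimization, the following is a minimal NumPy sketch. It is not the authors' implementation, which is in the repository linked above. The function names `task_arithmetic_merge`, `sam_step`, and `finetune_with_sam`, the scaling coefficient `scale`, the neighborhood radius `rho`, and the toy quadratic losses are all assumptions made for illustration. The merge shown is the common task-vector formulation, and the SAM step is the standard two-gradient approximation.

```python
import numpy as np


def task_arithmetic_merge(pretrained, finetuned_list, scale=1.0):
    """Merge models by adding the scaled sum of task vectors to the pre-trained weights.

    A task vector is (fine-tuned weights - pre-trained weights).
    `scale` is a hypothetical merging coefficient.
    """
    task_vectors = [ft - pretrained for ft in finetuned_list]
    return pretrained + scale * np.sum(task_vectors, axis=0)


def sam_step(params, grad_fn, lr=0.1, rho=0.05):
    """One SAM update using the usual first-order approximation.

    1. Compute the gradient at the current parameters.
    2. Move a distance `rho` along the normalized gradient (approximate worst case nearby).
    3. Descend using the gradient evaluated at that perturbed point.
    """
    g = grad_fn(params)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # small constant avoids division by zero
    g_perturbed = grad_fn(params + eps)
    return params - lr * g_perturbed


def finetune_with_sam(pretrained, grad_fn, steps=100, lr=0.1, rho=0.05):
    """Fine-tune from the pre-trained weights by repeating SAM updates."""
    params = pretrained.copy()
    for _ in range(steps):
        params = sam_step(params, grad_fn, lr=lr, rho=rho)
    return params


if __name__ == "__main__":
    # Toy setup (assumed): each "task" has loss 0.5 * ||w - target||^2,
    # so its gradient is simply w - target.
    pretrained = np.zeros(2)
    target_a = np.array([1.0, 0.0])
    target_b = np.array([0.0, 2.0])

    ft_a = finetune_with_sam(pretrained, lambda w: w - target_a)
    ft_b = finetune_with_sam(pretrained, lambda w: w - target_b)

    merged = task_arithmetic_merge(pretrained, [ft_a, ft_b], scale=0.5)
    print("merged parameters:", merged)
```

In this sketch each task is fine-tuned independently from the same starting point, and the merge only touches the resulting parameter differences, which mirrors the setting the abstract describes. The toy losses do not exhibit real parameter interference; they only show how the pieces fit together.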
Appears in Collections
Seoul College of Engineering > ETC > 1. Journal Articles


Related Researcher

Baik, Sungyong
COLLEGE OF ENGINEERING (DEPARTMENT OF INTELLIGENCE COMPUTING)