Projected variable three-term conjugate gradient algorithm for enhancing generalization performance in deep neural network training

Kim, Sanghyuk; Kim, Hansu; Kang, Namwoo; Lee, Tae Hee

doi:10.1016/j.neucom.2025.131568

Full metadata record

DC Field	Value	Language
dc.contributor.author	Kim, Sanghyuk	-
dc.contributor.author	Kim, Hansu	-
dc.contributor.author	Kang, Namwoo	-
dc.contributor.author	Lee, Tae Hee	-
dc.date.accessioned	2025-10-22T01:00:08Z	-
dc.date.available	2025-10-22T01:00:08Z	-
dc.date.issued	2025-12	-
dc.identifier.issn	0925-2312	-
dc.identifier.issn	1872-8286	-
dc.identifier.uri	https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/208933	-
dc.description.abstract	Deep learning optimization faces a fundamental trade-off between convergence efficiency and generalization. First-order methods such as stochastic gradient descent (SGD) and adaptive moment estimation (Adam) tend to find flatter minima but converge slowly, while higher-order methods converge rapidly but are often drawn to sharp minima that generalize poorly. To address this, we introduce the projected variable three-term conjugate gradient (PVTTCG) algorithm. Motivated by the geometric instabilities in modern networks that use techniques such as batch normalization (BN), PVTTCG integrates an orthogonal projection into the higher-order optimization framework. This mechanism eliminates radial components from the search direction, inherently guiding the optimization toward flatter regions without requiring additional regularization terms or hyperparameters. The effectiveness of PVTTCG is validated across diverse tasks, including language modeling, large-scale image classification, and a real-world engineering application. In complex scenarios, PVTTCG consistently improves upon its higher-order baseline, achieving up to a 3.92 percentage point gain on CIFAR-100 while remaining competitive with leading first-order methods. A systematic analysis reveals that PVTTCG demonstrates superior robustness to batch size variations, particularly excelling at larger batch sizes. This robustness enables the algorithm to process batch sizes up to 2,048 in engineering applications, achieving a 35.9% test loss reduction compared to Adam. These findings establish PVTTCG as an effective solution for bridging the convergence-generalization trade-off.	-
dc.format.extent	19	-
dc.language	English	-
dc.language.iso	ENG	-
dc.publisher	Elsevier BV	-
dc.title	Projected variable three-term conjugate gradient algorithm for enhancing generalization performance in deep neural network training	-
dc.type	Article	-
dc.publisher.location	Netherlands	-
dc.identifier.doi	10.1016/j.neucom.2025.131568	-
dc.identifier.scopusid	2-s2.0-105016527430	-
dc.identifier.wosid	001578610400001	-
dc.identifier.bibliographicCitation	Neurocomputing, v.657, pp 1 - 19	-
dc.citation.title	Neurocomputing	-
dc.citation.volume	657	-
dc.citation.startPage	1	-
dc.citation.endPage	19	-
dc.type.docType	Article	-
dc.description.isOpenAccess	N	-
dc.description.journalRegisteredClass	scie	-
dc.description.journalRegisteredClass	scopus	-
dc.relation.journalResearchArea	Computer Science	-
dc.relation.journalWebOfScienceCategory	Computer Science, Artificial Intelligence	-
dc.subject.keywordPlus	Conjugate gradient method	-
dc.subject.keywordPlus	Deep neural networks	-
dc.subject.keywordPlus	Image classification	-
dc.subject.keywordPlus	Modeling languages	-
dc.subject.keywordPlus	Optimization	-
dc.subject.keywordPlus	Stochastic systems	-
dc.subject.keywordAuthor	Optimization algorithm	-
dc.subject.keywordAuthor	Generalization performance	-
dc.subject.keywordAuthor	Conjugate gradient method	-
dc.subject.keywordAuthor	Vehicle crashworthiness	-
dc.subject.keywordAuthor	Image classification	-
dc.subject.keywordAuthor	Language modeling	-
dc.identifier.url	https://www.sciencedirect.com/science/article/pii/S0925231225022404?via%3Dihub	-
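
The abstract describes an orthogonal projection that removes radial components from the search direction. The following is a minimal sketch of that one idea, not the PVTTCG algorithm itself: it assumes "radial" means the component parallel to the weight vector and omits the three-term conjugate gradient update entirely. The function name `project_out_radial` and the example values are hypothetical.

```python
def project_out_radial(direction, weights, eps=1e-12):
    """Return `direction` with its component along `weights` removed.

    Assumption: the radial direction is the weight vector itself, so the
    projection is d - ((d . w) / (w . w)) * w.
    """
    w_norm_sq = sum(w * w for w in weights)
    if w_norm_sq < eps:
        # Near-zero weights define no radial direction; leave d unchanged.
        return list(direction)
    coeff = sum(d * w for d, w in zip(direction, weights)) / w_norm_sq
    return [d - coeff * w for d, w in zip(direction, weights)]


# Hypothetical two-dimensional example.
weights = [3.0, 4.0]
direction = [1.0, 2.0]
projected = project_out_radial(direction, weights)

# The projected direction should be orthogonal to the weights.
radial_part = sum(p * w for p, w in zip(projected, weights))
print(projected, radial_part)
```

A step along the projected direction is orthogonal to the weight vector, so to first order it leaves the weight norm unchanged. This is one plausible reading of why such a projection matters for scale-invariant layers such as those followed by batch normalization.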

Appears in Collections: College of Engineering (Seoul) > Department of Automotive Engineering (Seoul) > 1. Journal Articles