Detailed Information


Projected variable three-term conjugate gradient algorithm for enhancing generalization performance in deep neural network training

Full metadata record
DC Field: Value
dc.contributor.author: Kim, Sanghyuk
dc.contributor.author: Kim, Hansu
dc.contributor.author: Kang, Namwoo
dc.contributor.author: Lee, Tae Hee
dc.date.accessioned: 2025-10-22T01:00:08Z
dc.date.available: 2025-10-22T01:00:08Z
dc.date.issued: 2025-12
dc.identifier.issn: 0925-2312
dc.identifier.issn: 1872-8286
dc.identifier.uri: https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/208933
dc.description.abstract: Deep learning optimization faces a fundamental trade-off between convergence efficiency and generalization. First-order methods such as stochastic gradient descent (SGD) and adaptive moment estimation (Adam) tend to find flatter minima but converge slowly, while higher-order methods converge rapidly but are often drawn to sharp minima that generalize poorly. To address this, we introduce the projected variable three-term conjugate gradient (PVTTCG) algorithm. Motivated by the geometric instabilities in modern networks that use techniques such as batch normalization (BN), PVTTCG integrates an orthogonal projection into the higher-order optimization framework. This mechanism eliminates radial components from the search direction, inherently guiding the optimization toward flatter regions without requiring additional regularization terms or hyperparameters. The effectiveness of PVTTCG is validated across diverse tasks, including language modeling, large-scale image classification, and a real-world engineering application. In complex scenarios, PVTTCG consistently improves upon its higher-order baseline, achieving up to a 3.92 percentage point gain on CIFAR-100 while remaining competitive with leading first-order methods. A systematic analysis reveals that PVTTCG demonstrates superior robustness to batch size variations, particularly excelling at larger batch sizes. This robustness enables the algorithm to process batch sizes up to 2,048 in engineering applications, achieving a 35.9% test loss reduction compared to Adam. These findings establish PVTTCG as an effective solution for bridging the convergence-generalization trade-off.
dc.format.extent: 19
dc.language: English
dc.language.iso: ENG
dc.publisher: Elsevier BV
dc.title: Projected variable three-term conjugate gradient algorithm for enhancing generalization performance in deep neural network training
dc.type: Article
dc.publisher.location: Netherlands
dc.identifier.doi: 10.1016/j.neucom.2025.131568
dc.identifier.scopusid: 2-s2.0-105016527430
dc.identifier.wosid: 001578610400001
dc.identifier.bibliographicCitation: Neurocomputing, v.657, pp 1 - 19
dc.citation.title: Neurocomputing
dc.citation.volume: 657
dc.citation.startPage: 1
dc.citation.endPage: 19
dc.type.docType: Article
dc.description.isOpenAccess: N
dc.description.journalRegisteredClass: scie
dc.description.journalRegisteredClass: scopus
dc.relation.journalResearchArea: Computer Science
dc.relation.journalWebOfScienceCategory: Computer Science, Artificial Intelligence
dc.subject.keywordPlus: Conjugate gradient method
dc.subject.keywordPlus: Deep neural networks
dc.subject.keywordPlus: Image classification
dc.subject.keywordPlus: Modeling languages
dc.subject.keywordPlus: Optimization
dc.subject.keywordPlus: Stochastic systems
dc.subject.keywordAuthor: Optimization algorithm
dc.subject.keywordAuthor: Generalization performance
dc.subject.keywordAuthor: Conjugate gradient method
dc.subject.keywordAuthor: Vehicle crashworthiness
dc.subject.keywordAuthor: Image classification
dc.subject.keywordAuthor: Language modeling
dc.identifier.url: https://www.sciencedirect.com/science/article/pii/S0925231225022404?via%3Dihub
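
Illustrative sketch (not taken from the article): the abstract says PVTTCG uses an orthogonal projection to remove radial components from the search direction, but this record does not give the formula. The Python snippet below shows only the generic textbook projection, d - (<d, w> / <w, w>) w, which removes the part of a direction d that points along a weight vector w. The function name `remove_radial_component` and the sample vectors are hypothetical, and the actual PVTTCG update may differ.

```python
from typing import List, Sequence


def remove_radial_component(direction: Sequence[float],
                            weights: Sequence[float]) -> List[float]:
    """Return `direction` with its component along `weights` removed.

    Assumption: "radial" means parallel to the weight vector itself,
    so the result is orthogonal to `weights`.
    """
    w_norm_sq = sum(w * w for w in weights)
    if w_norm_sq == 0.0:
        # No radial axis is defined for a zero vector; leave unchanged.
        return list(direction)
    # Scalar coefficient of the projection of `direction` onto `weights`.
    coeff = sum(d * w for d, w in zip(direction, weights)) / w_norm_sq
    return [d - coeff * w for d, w in zip(direction, weights)]


# Hypothetical example values, chosen only for illustration.
weights = [3.0, 4.0]
direction = [1.0, 2.0]
projected = remove_radial_component(direction, weights)

# Inner product with the weights is zero up to floating-point rounding.
radial_part = sum(p * w for p, w in zip(projected, weights))
```

A step along `projected` changes the direction of `weights` but, to first order, not its norm, which is the geometric effect that "eliminating radial components" refers to under this assumption.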
Appears in Collections
Seoul College of Engineering > Seoul Department of Automotive Engineering > 1. Journal Articles


