Weight vector selection methods by hypervolume maximization in the Pareto front for single policy multi-objective reinforcement learning

Full metadata record
DC Field | Value | Language
dc.contributor.author | Lee, Seonjae | -
dc.contributor.author | Lee, Myoung Hoon | -
dc.contributor.author | Moon, Jun | -
dc.date.accessioned | 2025-12-26T05:00:42Z | -
dc.date.available | 2025-12-26T05:00:42Z | -
dc.date.issued | 2026-01 | -
dc.identifier.issn | 0957-4174 | -
dc.identifier.issn | 1873-6793 | -
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/210100 | -
dc.description.abstract | Effectively solving Multi-Objective Reinforcement Learning (MORL) problems is crucial in real-world applications, such as robotics and autonomous systems, where multiple conflicting objectives must be optimized. Single-Policy MORL (SPMORL), which relies on a single policy network, struggles to learn a diverse set of weight vectors that represent the importance of each objective. In contrast, Multi-Policy MORL (MPMORL) trains multiple policy networks to handle different weight vectors. However, this approach is computationally expensive and can be inefficient, as the networks may not share experiences during training, leading to redundant learning and slower convergence. In this paper, we propose a Weight Vector Selection (WVS) algorithm for SPMORL, enhancing its ability to explore the weight vector space efficiently using the polar coordinate system, the Jacobian matrix, and the ellipsoid function. The key idea of WVS is to provide a weight vector selection criterion that enables SPMORL to approximate the Pareto front more effectively while maintaining computational efficiency. We introduce three novel WVS algorithms: WVS-Polar, WVS-Jacob, and WVS-Ellip. Specifically, WVS-Polar employs the polar coordinate system to estimate the Pareto front, WVS-Jacob utilizes the Jacobian matrix and the derivative of the Pareto front, and WVS-Ellip determines the center of an ellipsoid based on nearby Pareto-optimal points. Integrated with SPMORL, these WVS algorithms iteratively perform two learning stages. First, the agent selects a target weight vector using its respective approximation method. Then, the agent optimizes the selected weight vector, collects the corresponding Pareto-optimal point, and updates the Pareto front. This approach enables SPMORL to efficiently learn multiple weight vectors while maintaining the benefits of using a single policy network. To evaluate the effectiveness of WVS-SPMORL, we conduct extensive experiments in both simulated and real-world environments. In OpenAI-MuJoCo simulations, WVS-SPMORL outperforms baseline SPMORL algorithms in Pareto front approximation. In real-world robot arm tasks, we integrate an offline Reinforcement Learning (RL) mechanism, demonstrating that WVS-SPMORL successfully learns and manages the entire Pareto front. As shown in the attached experimental video, our approach allows the agent to continuously improve its performance in both online RL simulations and offline RL experiments. These results confirm that WVS-SPMORL significantly enhances learning efficiency and performance in MORL. | -
dc.format.extent | 19 | -
dc.language | English | -
dc.language.iso | ENG | -
dc.publisher | Elsevier | -
dc.title | Weight vector selection methods by hypervolume maximization in the Pareto front for single policy multi-objective reinforcement learning | -
dc.type | Article | -
dc.publisher.location | United Kingdom | -
dc.identifier.doi | 10.1016/j.eswa.2025.129070 | -
dc.identifier.scopusid | 2-s2.0-105011500225 | -
dc.identifier.wosid | 001540737900001 | -
dc.identifier.bibliographicCitation | Expert Systems with Applications, v.296, pp 1 - 19 | -
dc.citation.title | Expert Systems with Applications | -
dc.citation.volume | 296 | -
dc.citation.startPage | 1 | -
dc.citation.endPage | 19 | -
dc.type.docType | Article | -
dc.description.isOpenAccess | N | -
dc.description.journalRegisteredClass | scie | -
dc.description.journalRegisteredClass | scopus | -
dc.relation.journalResearchArea | Computer Science | -
dc.relation.journalResearchArea | Engineering | -
dc.relation.journalResearchArea | Operations Research & Management Science | -
dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence | -
dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | -
dc.relation.journalWebOfScienceCategory | Operations Research & Management Science | -
dc.subject.keywordPlus | Approximation algorithms | -
dc.subject.keywordPlus | Iterative methods | -
dc.subject.keywordPlus | Jacobian matrices | -
dc.subject.keywordPlus | Learning algorithms | -
dc.subject.keywordPlus | Machine learning | -
dc.subject.keywordPlus | Multiobjective optimization | -
dc.subject.keywordPlus | Pareto principle | -
dc.subject.keywordPlus | Robots | -
dc.subject.keywordPlus | Vector spaces | -
dc.subject.keywordPlus | Vectors | -
dc.subject.keywordAuthor | Multi-objective reinforcement learning | -
dc.subject.keywordAuthor | Hypervolume maximization | -
dc.subject.keywordAuthor | Pareto optimality | -
dc.subject.keywordAuthor | Weight vector selection | -
dc.identifier.url | https://www.sciencedirect.com/science/article/pii/S0957417425026879?via%3Dihub | -
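
The abstract describes a two-stage loop: select a target weight vector, then add the resulting Pareto-optimal point to the front. As a rough illustration only, the sketch below computes a two-objective hypervolume and picks the next weight vector from the largest polar-angle gap in the current front. This gap rule is a simplified stand-in chosen for illustration; it is not the paper's WVS-Polar, WVS-Jacob, or WVS-Ellip method. The function names (`pareto_front`, `hypervolume_2d`, `next_weight`), the reference point `ref`, and the assumption that both objectives are maximized and lie at or above `ref` are all hypothetical.

```python
import math

def pareto_front(points):
    """Return the non-dominated points (both objectives maximized), sorted by f1."""
    front = []
    # Sweep from the largest f1 downward; keep a point only if it improves f2.
    for p in sorted(points, key=lambda q: (-q[0], -q[1])):
        if not front or p[1] > front[-1][1]:
            front.append(p)
    return sorted(front)

def hypervolume_2d(points, ref=(0.0, 0.0)):
    """Area dominated by the front and bounded below by the reference point."""
    area, prev_f1 = 0.0, ref[0]
    # With f1 ascending, f2 is descending, so each strip has height f2 - ref[1].
    for f1, f2 in pareto_front(points):
        area += (f1 - prev_f1) * (f2 - ref[1])
        prev_f1 = f1
    return area

def next_weight(points, ref=(0.0, 0.0)):
    """Illustrative heuristic: aim at the middle of the widest angular gap."""
    angles = sorted(math.atan2(f2 - ref[1], f1 - ref[0])
                    for f1, f2 in pareto_front(points))
    # Add both axes so unexplored extremes of the front can be selected too.
    angles = [0.0] + angles + [math.pi / 2]
    _, theta = max((b - a, (a + b) / 2) for a, b in zip(angles, angles[1:]))
    w1, w2 = math.cos(theta), math.sin(theta)
    return (w1 / (w1 + w2), w2 / (w1 + w2))  # normalize so the weights sum to 1

observed = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (1.0, 1.0)]
print(pareto_front(observed))    # (1.0, 1.0) is dominated and dropped
print(hypervolume_2d(observed))
print(next_weight([(3.0, 1.0), (1.0, 3.0)]))
```

In an actual training loop, the returned weights would scalarize the reward for the next round of policy optimization, and the hypervolume would be recomputed after each new point to track progress.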
Appears in Collections
Seoul College of Engineering > Seoul Major in Electrical Engineering > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Moon, Jun
COLLEGE OF ENGINEERING (MAJOR IN ELECTRICAL ENGINEERING)