Weight vector selection methods by hypervolume maximization in the Pareto front for single policy multi-objective reinforcement learning

Lee, Seonjae; Lee, Myoung Hoon; Moon, Jun

doi:10.1016/j.eswa.2025.129070

Full metadata record

DC Field	Value	Language
dc.contributor.author	Lee, Seonjae	-
dc.contributor.author	Lee, Myoung Hoon	-
dc.contributor.author	Moon, Jun	-
dc.date.accessioned	2025-12-26T05:00:42Z	-
dc.date.available	2025-12-26T05:00:42Z	-
dc.date.issued	2026-01	-
dc.identifier.issn	0957-4174	-
dc.identifier.issn	1873-6793	-
dc.identifier.uri	https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/210100	-
dc.description.abstract	Effectively solving Multi-Objective Reinforcement Learning (MORL) problems is crucial in real-world applications, such as robotics and autonomous systems, where multiple conflicting objectives must be optimized. Single-Policy MORL (SPMORL), which relies on a single policy network, struggles to learn a diverse set of weight vectors that represent the importance of each objective. In contrast, Multi-Policy MORL (MPMORL) trains multiple policy networks to handle different weight vectors. However, this approach is computationally expensive and can be inefficient, as the networks may not share experiences during training, leading to redundant learning and slower convergence. In this paper, we propose a Weight Vector Selection (WVS) algorithm for SPMORL, enhancing its ability to explore the weight vector space efficiently using the polar coordinate system, the Jacobian matrix, and the ellipsoid function. The key idea of WVS is to provide a weight vector selection criterion that enables SPMORL to approximate the Pareto front more effectively while maintaining computational efficiency. We introduce three novel WVS algorithms: WVS-Polar, WVS-Jacob, and WVS-Ellip. Specifically, WVS-Polar employs the polar coordinate system to estimate the Pareto front, WVS-Jacob utilizes the Jacobian matrix and the derivative of the Pareto front, and WVS-Ellip determines the center of an ellipsoid based on nearby Pareto-optimal points. Integrated with SPMORL, these WVS algorithms iteratively perform two learning stages. First, the agent selects a target weight vector using its respective approximation method. Then, the agent optimizes the selected weight vector, collects the corresponding Pareto-optimal point, and updates the Pareto front. This approach enables SPMORL to efficiently learn multiple weight vectors while maintaining the benefits of using a single policy network. 
To evaluate the effectiveness of WVS-SPMORL, we conduct extensive experiments in both simulated and real-world environments. In OpenAI-MuJoCo simulations, WVS-SPMORL outperforms baseline SPMORL algorithms in Pareto front approximation. In real-world robot arm tasks, we integrate an offline Reinforcement Learning (RL) mechanism, demonstrating that WVS-SPMORL successfully learns and manages the entire Pareto front. As shown in the attached experimental video, our approach allows the agent to continuously improve its performance in both online RL simulations and offline RL experiments. These results confirm that WVS-SPMORL significantly enhances learning efficiency and performance in MORL.	-
dc.format.extent	19	-
dc.language	English	-
dc.language.iso	ENG	-
dc.publisher	Elsevier	-
dc.title	Weight vector selection methods by hypervolume maximization in the Pareto front for single policy multi-objective reinforcement learning	-
dc.type	Article	-
dc.publisher.location	United Kingdom	-
dc.identifier.doi	10.1016/j.eswa.2025.129070	-
dc.identifier.scopusid	2-s2.0-105011500225	-
dc.identifier.wosid	001540737900001	-
dc.identifier.bibliographicCitation	Expert Systems with Applications, v.296, pp 1 - 19	-
dc.citation.title	Expert Systems with Applications	-
dc.citation.volume	296	-
dc.citation.startPage	1	-
dc.citation.endPage	19	-
dc.type.docType	Article	-
dc.description.isOpenAccess	N	-
dc.description.journalRegisteredClass	scie	-
dc.description.journalRegisteredClass	scopus	-
dc.relation.journalResearchArea	Computer Science	-
dc.relation.journalResearchArea	Engineering	-
dc.relation.journalResearchArea	Operations Research & Management Science	-
dc.relation.journalWebOfScienceCategory	Computer Science, Artificial Intelligence	-
dc.relation.journalWebOfScienceCategory	Engineering, Electrical & Electronic	-
dc.relation.journalWebOfScienceCategory	Operations Research & Management Science	-
dc.subject.keywordPlus	Approximation algorithms	-
dc.subject.keywordPlus	Iterative methods	-
dc.subject.keywordPlus	Jacobian matrices	-
dc.subject.keywordPlus	Learning algorithms	-
dc.subject.keywordPlus	Machine learning	-
dc.subject.keywordPlus	Multiobjective optimization	-
dc.subject.keywordPlus	Pareto principle	-
dc.subject.keywordPlus	Robots	-
dc.subject.keywordPlus	Vector spaces	-
dc.subject.keywordPlus	Vectors	-
dc.subject.keywordAuthor	Multi-objective reinforcement learning	-
dc.subject.keywordAuthor	Hypervolume maximization	-
dc.subject.keywordAuthor	Pareto optimality	-
dc.subject.keywordAuthor	Weight vector selection	-
dc.identifier.url	https://www.sciencedirect.com/science/article/pii/S0957417425026879?via%3Dihub	-
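
The abstract describes a two-stage loop (select a target weight vector, then optimize it and update the Pareto front) guided by hypervolume maximization. The sketch below only illustrates those two notions for two objectives under maximization. It is not the paper's WVS-Polar, WVS-Jacob, or WVS-Ellip method: `select_weight` is a hypothetical stand-in rule that aims at the widest gap between adjacent front points, and all function names and the reference point are assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (objective 1 return, objective 2 return), both maximized


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by objective 1 in descending order."""
    front: List[Point] = []
    best_second = float("-inf")
    # Scan from the largest objective-1 value; a point survives only if it
    # improves on the best objective-2 value seen so far.
    for p in sorted(set(points), key=lambda q: (-q[0], -q[1])):
        if p[1] > best_second:
            front.append(p)
            best_second = p[1]
    return front


def hypervolume_2d(points: List[Point], ref: Point) -> float:
    """Area dominated by the points and bounded below by the reference point."""
    front = pareto_front([p for p in points if p[0] > ref[0] and p[1] > ref[1]])
    area = 0.0
    prev_second = ref[1]
    # Along the front, objective 1 decreases while objective 2 increases,
    # so the area splits into horizontal strips.
    for x, y in front:
        area += (x - ref[0]) * (y - prev_second)
        prev_second = y
    return area


def select_weight(front: List[Point]) -> Tuple[float, float]:
    """Hypothetical stand-in rule (not from the paper): return the weight
    vector normal to the widest gap between adjacent front points.

    Expects the sorted output of pareto_front().
    """
    if len(front) < 2:
        return (0.5, 0.5)
    a, b = max(
        zip(front, front[1:]),
        key=lambda ab: (ab[0][0] - ab[1][0]) ** 2 + (ab[0][1] - ab[1][1]) ** 2,
    )
    w1, w2 = b[1] - a[1], a[0] - b[0]  # both positive on a sorted front
    total = w1 + w2
    return (w1 / total, w2 / total)
```

In a full agent, the weight returned by the selection step would parameterize the scalarized reward for the next training phase, and the resulting return vector would be appended to the point set before recomputing the front. That training step is omitted here because the record does not specify it.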

Appears in Collections: Seoul College of Engineering > Seoul Major in Electrical Engineering > 1. Journal Articles

Related Researcher

Moon, Jun: College of Engineering (Major in Electrical Engineering)
