Weight vector selection methods by hypervolume maximization in the Pareto front for single policy multi-objective reinforcement learning

Authors
Lee, Seonjae; Lee, Myoung Hoon; Moon, Jun
Issue Date
Jan-2026
Publisher
Elsevier
Keywords
Multi-objective reinforcement learning; Hypervolume maximization; Pareto optimality; Weight vector selection
Citation
Expert Systems with Applications, v.296, pp 1 - 19
Pages
19
Indexed
SCIE
SCOPUS
Journal Title
Expert Systems with Applications
Volume
296
Start Page
1
End Page
19
URI
https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/210100
DOI
10.1016/j.eswa.2025.129070
ISSN
0957-4174
1873-6793
Abstract
Effectively solving Multi-Objective Reinforcement Learning (MORL) problems is crucial in real-world applications, such as robotics and autonomous systems, where multiple conflicting objectives must be optimized. Single-Policy MORL (SPMORL), which relies on a single policy network, struggles to learn a diverse set of weight vectors that represent the importance of each objective. In contrast, Multi-Policy MORL (MPMORL) trains multiple policy networks to handle different weight vectors. However, this approach is computationally expensive and can be inefficient, as the networks may not share experiences during training, leading to redundant learning and slower convergence. In this paper, we propose a Weight Vector Selection (WVS) algorithm for SPMORL, enhancing its ability to explore the weight vector space efficiently using the polar coordinate system, the Jacobian matrix, and the ellipsoid function. The key idea of WVS is to provide a weight vector selection criterion that enables SPMORL to approximate the Pareto front more effectively while maintaining computational efficiency. We introduce three novel WVS algorithms: WVS-Polar, WVS-Jacob, and WVS-Ellip. Specifically, WVS-Polar employs the polar coordinate system to estimate the Pareto front, WVS-Jacob utilizes the Jacobian matrix and the derivative of the Pareto front, and WVS-Ellip determines the center of an ellipsoid based on nearby Pareto-optimal points. Integrated with SPMORL, these WVS algorithms iteratively perform two learning stages. First, the agent selects a target weight vector using its respective approximation method. Then, the agent optimizes the selected weight vector, collects the corresponding Pareto-optimal point, and updates the Pareto front. This approach enables SPMORL to efficiently learn multiple weight vectors while maintaining the benefits of using a single policy network. 
To evaluate the effectiveness of WVS-SPMORL, we conduct extensive experiments in both simulated and real-world environments. In OpenAI-MuJoCo simulations, WVS-SPMORL outperforms baseline SPMORL algorithms in Pareto front approximation. In real-world robot arm tasks, we integrate an offline Reinforcement Learning (RL) mechanism, demonstrating that WVS-SPMORL successfully learns and manages the entire Pareto front. As shown in the attached experimental video, our approach allows the agent to continuously improve its performance in both online RL simulations and offline RL experiments. These results confirm that WVS-SPMORL significantly enhances learning efficiency and performance in MORL.
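The two quantities the abstract relies on, the Pareto front and its hypervolume, can be illustrated with a minimal two-objective sketch. This is not the paper's WVS-Polar, WVS-Jacob, or WVS-Ellip method. The function names, the reference point, and the "largest gap" selection rule below are all assumptions chosen for illustration: the rule picks the pair of neighbouring Pareto points whose gap bounds the largest possible hypervolume gain and returns a weight vector normal to the segment joining them.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (objective 1 return, objective 2 return), both maximized


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by the first objective (ascending)."""
    front: List[Point] = []
    # Scan from the largest first objective down; a point survives only if it
    # strictly improves on the best second objective seen so far.
    for p in sorted(set(points), key=lambda q: (-q[0], -q[1])):
        if not front or p[1] > front[-1][1]:
            front.append(p)
    return sorted(front)


def hypervolume_2d(points: List[Point], ref: Point = (0.0, 0.0)) -> float:
    """Area dominated by the points and bounded below by the reference point."""
    front = pareto_front([p for p in points if p[0] > ref[0] and p[1] > ref[1]])
    hv, prev_x = 0.0, ref[0]
    for x, y in front:  # x ascending, so y is descending
        hv += (x - prev_x) * (y - ref[1])
        prev_x = x
    return hv


def select_weight_largest_gap(points: List[Point]) -> Point:
    """Hypothetical selection rule (NOT from the paper): target the widest gap.

    For neighbours a and b on the front, the rectangle between them bounds the
    hypervolume a new point in that gap could add. The returned weight vector is
    normal to the segment a-b, so a linear scalarization scores a and b equally.
    """
    front = pareto_front(points)
    if len(front) < 2:
        raise ValueError("need at least two Pareto-optimal points")
    a, b = max(
        zip(front, front[1:]),
        key=lambda ab: (ab[1][0] - ab[0][0]) * (ab[0][1] - ab[1][1]),
    )
    w1, w2 = a[1] - b[1], b[0] - a[0]  # both positive on a strict front
    total = w1 + w2
    return (w1 / total, w2 / total)


if __name__ == "__main__":
    collected = [(1.0, 3.0), (2.0, 2.0), (4.0, 1.0), (1.0, 1.0)]
    print(pareto_front(collected))
    print(hypervolume_2d(collected))
    print(select_weight_largest_gap(collected))
```

In the two-stage loop the abstract describes, a rule of this kind would take the place of the selection step: pick a target weight vector, optimize the policy for it, add the resulting point to the collected set, and repeat.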
Appears in
Collections
Seoul College of Engineering > Seoul Major in Electrical Engineering > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.