Weight vector selection methods by hypervolume maximization in the Pareto front for single policy multi-objective reinforcement learning

Lee, Seonjae; Lee, Myoung Hoon; Moon, Jun

doi:10.1016/j.eswa.2025.129070

Issue Date: Jan-2026

Publisher: Elsevier

Keywords: Multi-objective reinforcement learning; Hypervolume maximization; Pareto optimality; Weight vector selection

Citation: Expert Systems with Applications, v.296, pp 1 - 19

Pages: 19

Indexed: SCIE; SCOPUS

Journal Title: Expert Systems with Applications

Volume: 296

Start Page: 1

End Page: 19

URI: https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/210100

DOI: 10.1016/j.eswa.2025.129070

ISSN: 0957-4174; 1873-6793

Abstract: Effectively solving Multi-Objective Reinforcement Learning (MORL) problems is crucial in real-world applications, such as robotics and autonomous systems, where multiple conflicting objectives must be optimized. Single-Policy MORL (SPMORL), which relies on a single policy network, struggles to learn a diverse set of weight vectors that represent the importance of each objective. In contrast, Multi-Policy MORL (MPMORL) trains multiple policy networks to handle different weight vectors. However, this approach is computationally expensive and can be inefficient, as the networks may not share experiences during training, leading to redundant learning and slower convergence.
In this paper, we propose a Weight Vector Selection (WVS) algorithm for SPMORL, enhancing its ability to explore the weight vector space efficiently using the polar coordinate system, the Jacobian matrix, and the ellipsoid function. The key idea of WVS is to provide a weight vector selection criterion that enables SPMORL to approximate the Pareto front more effectively while maintaining computational efficiency. We introduce three novel WVS algorithms: WVS-Polar, WVS-Jacob, and WVS-Ellip. Specifically, WVS-Polar employs the polar coordinate system to estimate the Pareto front, WVS-Jacob utilizes the Jacobian matrix and the derivative of the Pareto front, and WVS-Ellip determines the center of an ellipsoid based on nearby Pareto-optimal points.
Integrated with SPMORL, these WVS algorithms iteratively perform two learning stages. First, the agent selects a target weight vector using its respective approximation method. Then, the agent optimizes the selected weight vector, collects the corresponding Pareto-optimal point, and updates the Pareto front. This approach enables SPMORL to efficiently learn multiple weight vectors while maintaining the benefits of using a single policy network.
To evaluate the effectiveness of WVS-SPMORL, we conduct extensive experiments in both simulated and real-world environments. In OpenAI-MuJoCo simulations, WVS-SPMORL outperforms baseline SPMORL algorithms in Pareto front approximation. In real-world robot arm tasks, we integrate an offline Reinforcement Learning (RL) mechanism, demonstrating that WVS-SPMORL successfully learns and manages the entire Pareto front. As shown in the attached experimental video, our approach allows the agent to continuously improve its performance in both online RL simulations and offline RL experiments. These results confirm that WVS-SPMORL significantly enhances learning efficiency and performance in MORL.
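
Illustrative sketch (not from the paper): the abstract describes a two-stage loop, select a target weight vector, then optimize it and add the resulting point to the Pareto front, with hypervolume as the quantity being maximized. The Python below is a minimal two-objective toy of that loop under loud assumptions. The selection rule is a simple chord-normal heuristic standing in for WVS-Polar, WVS-Jacob, and WVS-Ellip, whose details this record does not give. The "optimization" stage is a closed-form stand-in (the true front is assumed to be the unit quarter circle) rather than a trained policy. All function names are hypothetical.

```python
import math

def pareto_filter(points):
    """Keep the non-dominated points (both objectives are maximized)."""
    front = []
    for p in points:
        dominated = any(q[0] >= p[0] and q[1] >= p[1] and q != p for q in points)
        if not dominated:
            front.append(p)
    return sorted(set(front))

def hypervolume_2d(points, ref=(0.0, 0.0)):
    """Area dominated by the front and bounded below by the reference point."""
    front = pareto_filter(points)
    front.sort(key=lambda p: p[0], reverse=True)  # x descending, so y ascending
    hv, prev_y = 0.0, ref[1]
    for x, y in front:
        if x > ref[0] and y > prev_y:
            hv += (x - ref[0]) * (y - prev_y)
            prev_y = y
    return hv

def toy_optimize(w):
    """Stage 2 stand-in (assumption): the best point on the unit quarter
    circle for the weighted sum w[0]*x + w[1]*y is w / ||w||."""
    norm = math.hypot(w[0], w[1])
    return (w[0] / norm, w[1] / norm)

def select_weight(front):
    """Stage 1 stand-in (assumption, not the paper's WVS rule): pick the
    adjacent pair enclosing the largest unexplored rectangle and return
    the normal of the chord between them as the target weight vector."""
    pts = sorted(front)  # x ascending, so y descending on a Pareto front
    (x1, y1), (x2, y2) = max(
        zip(pts, pts[1:]),
        key=lambda pair: (pair[1][0] - pair[0][0]) * (pair[0][1] - pair[1][1]),
    )
    w = (y1 - y2, x2 - x1)
    total = w[0] + w[1]
    return (w[0] / total, w[1] / total)

def run(iterations=5):
    """Alternate weight selection and optimization; track the hypervolume."""
    front = [toy_optimize((1.0, 0.0)), toy_optimize((0.0, 1.0))]
    history = [hypervolume_2d(front)]
    for _ in range(iterations):
        w = select_weight(front)
        front = pareto_filter(front + [toy_optimize(w)])
        history.append(hypervolume_2d(front))
    return front, history
```

Calling `run()` returns the collected front and the hypervolume after each iteration. In this toy the hypervolume never decreases and stays below the quarter-circle area of π/4, which is the behaviour a hypervolume-driven selection criterion aims for.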

Appears in Collections: Seoul College of Engineering > Seoul Major in Electrical Engineering > 1. Journal Articles

Related Researcher: Moon, Jun, College of Engineering (Major in Electrical Engineering)
