
An enhanced deep reinforcement learning approach for efficient, effective, and equitable disaster relief distribution

Authors
Ahmad, Moiz; Tayyab, Muhammad; Habib, Muhammad Salman
Issue Date
Mar-2025
Publisher
PERGAMON-ELSEVIER SCIENCE LTD
Keywords
Disaster response; Relief distribution; Proximal policy optimization; Q-learning; Reinforcement learning; Solution quality
Citation
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, v.143, pp. 1-28
Pages
28
Indexed
SCIE
SCOPUS
Journal Title
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE
Volume
143
Start Page
1
End Page
28
URI
https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/125111
DOI
10.1016/j.engappai.2025.110002
ISSN
0952-1976
1873-6769
Abstract
Efficient disaster response, especially within the critical initial 72 h, is crucial for saving lives. However, allocating relief goods effectively to affected areas remains a complex challenge due to uncertainty, limited resources, and dynamic needs. This study addresses this challenge by proposing a multi-period integer nonlinear programming model for the efficient, effective, and equitable distribution of relief goods during the disaster response phase. To optimize relief allocation over the entire 72-h horizon, a novel decision-making approach is proposed that leverages the proximal policy optimization (PPO) algorithm. It uses deep residual neural networks for state-value and optimal-action prediction, with 5 value and 4 policy residual layers. Additionally, an algorithm-agnostic termination criterion based on episodic reward stall ensures effective convergence detection without requiring prior knowledge of the optimal solution. The proposed model and solution methods are validated on 30 hypothetical problem instances and a realistic earthquake response case study. The results demonstrate the superiority of the proposed approach over traditional methods such as dynamic programming, state-action-reward-state-action (SARSA), and Q-learning, in terms of both solution quality and sample efficiency. Notably, the deep residual networks and the proposed termination criterion enable the PPO algorithm to achieve an average optimality gap of less than 10% for the majority of instances with consistent hyperparameters, while exhibiting significant sample-efficiency gains, particularly for large-scale problems. This research equips disaster managers with an efficient and timely relief delivery plan, ultimately contributing to saving lives in the face of disaster. Moreover, the proposed termination criterion may improve the performance of reinforcement learning in other application areas.
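The abstract's algorithm-agnostic termination criterion, which detects convergence from a stall in episodic reward rather than from knowledge of the optimal value, can be illustrated with a minimal sketch. This is an assumption-laden reconstruction, not the paper's implementation: the function name, the moving-average window, the patience threshold, and the minimum-improvement delta are all hypothetical parameters chosen for illustration.

```python
from collections import deque


def reward_stall_terminator(window=50, patience=200, min_delta=1e-3):
    """Return a callable that flags convergence when the moving-average
    episodic reward stops improving by at least `min_delta` for
    `patience` consecutive episodes (a reward 'stall')."""
    recent = deque(maxlen=window)          # last `window` episodic rewards
    best_avg = float("-inf")               # best moving average seen so far
    stalled_for = 0                        # episodes without improvement

    def should_stop(episode_reward):
        nonlocal best_avg, stalled_for
        recent.append(episode_reward)
        if len(recent) < window:           # not enough data yet
            return False
        avg = sum(recent) / window
        if avg > best_avg + min_delta:     # the average improved: reset
            best_avg = avg
            stalled_for = 0
        else:                              # no meaningful improvement
            stalled_for += 1
        return stalled_for >= patience

    return should_stop
```

In a training loop, `should_stop(reward)` would be called once per episode after any RL update (PPO, SARSA, Q-learning alike, which is what makes the criterion algorithm-agnostic), and training halts the first time it returns `True`.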
Files in This Item
There are no files associated with this item.
Appears in Collections
ETC > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

HABIB, MUHAMMAD SALMAN
Office of the ERICA Vice President, Hanyang Institute for Human Resource Development (ERICA Center for Creative Convergence Education)
