
An enhanced deep reinforcement learning approach for efficient, effective, and equitable disaster relief distribution

Authors
Ahmad, Moiz; Tayyab, Muhammad; Habib, Muhammad Salman
Issue Date
Mar-2025
Publisher
PERGAMON-ELSEVIER SCIENCE LTD
Keywords
Disaster response; Relief distribution; Proximal policy optimization; Q-learning; Reinforcement learning; Solution quality
Citation
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, v.143, pp. 1-28
Pages
28
Indexed
SCIE
SCOPUS
Journal Title
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE
Volume
143
Start Page
1
End Page
28
URI
https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/125111
DOI
10.1016/j.engappai.2025.110002
ISSN
0952-1976
1873-6769
Abstract
Efficient disaster response, especially within the critical initial 72 h, is crucial for saving lives. However, allocating relief goods effectively to affected areas remains a complex challenge due to uncertainty, limited resources, and dynamic needs. This study addresses this challenge by proposing a multi-period integer nonlinear programming model for the efficient, effective, and equitable distribution of relief goods during the disaster response phase. To optimize relief allocation over the entire 72-h horizon, a novel decision-making approach is proposed that leverages the proximal policy optimization (PPO) algorithm. It uses deep residual neural networks for state-value and optimal-action prediction, with 5 value and 4 policy residual layers. Additionally, an algorithm-agnostic termination criterion based on episodic reward stall ensures effective convergence detection without requiring prior knowledge of the optimal solution. The proposed model and solution methods are validated on 30 hypothetical problem instances and a realistic earthquake response case study. The results demonstrate the superiority of the proposed approach over traditional methods such as dynamic programming, state-action-reward-state-action (SARSA), and Q-learning, in terms of both solution quality and sample efficiency. Notably, the deep residual networks and the proposed termination criterion enable the PPO algorithm to achieve an average optimality gap of less than 10% for the majority of instances with consistent hyperparameters, while exhibiting significant sample-efficiency gains, particularly for large-scale problems. This research equips disaster managers with an efficient and timely relief delivery plan, ultimately contributing to saving lives in the face of disaster. Moreover, the proposed termination criterion may improve the performance of reinforcement learning in other application areas.
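The abstract's algorithm-agnostic termination criterion, which detects convergence from a stall in episodic reward rather than from knowledge of the optimal value, can be illustrated with a minimal sketch. This is an assumption-laden reconstruction, not the paper's implementation: the function name, the moving-average window, the patience threshold, and the minimum-improvement delta are all hypothetical parameters chosen for illustration.

```python
from collections import deque


def reward_stall_terminator(window=50, patience=200, min_delta=1e-3):
    """Return a callable that flags convergence when the moving-average
    episodic reward stops improving by at least `min_delta` for
    `patience` consecutive episodes (a reward 'stall')."""
    recent = deque(maxlen=window)          # last `window` episodic rewards
    best_avg = float("-inf")               # best moving average seen so far
    stalled_for = 0                        # episodes without improvement

    def should_stop(episode_reward):
        nonlocal best_avg, stalled_for
        recent.append(episode_reward)
        if len(recent) < window:           # not enough data yet
            return False
        avg = sum(recent) / window
        if avg > best_avg + min_delta:     # the average improved: reset
            best_avg = avg
            stalled_for = 0
        else:                              # no meaningful improvement
            stalled_for += 1
        return stalled_for >= patience

    return should_stop
```

In a training loop, `should_stop(reward)` would be called once per episode after any RL update (PPO, SARSA, Q-learning alike, which is what makes the criterion algorithm-agnostic), and training halts the first time it returns `True`.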
Files in This Item
There are no files associated with this item.
Appears in Collections
ETC > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

HABIB, MUHAMMAD SALMAN
Office of the ERICA Vice President, Hanyang Institute for Human Resource Development (ERICA Center for Creative Convergence Education)
