Deep Reinforcement Learning Multi-UAV Trajectory Control for Target Tracking

Moon, Jiseon; Papaioannou, Savvas; Laoudias, Christos; Kolios, Panayiotis; Kim, Sunwoo

doi:10.1109/JIOT.2021.3073973

Cited 5 times in Web of Science

Cited 5 times in Scopus

Authors: Moon, Jiseon; Papaioannou, Savvas; Laoudias, Christos; Kolios, Panayiotis; Kim, Sunwoo

Issue Date: Oct-2021

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Keywords: Target tracking; Unmanned aerial vehicles; Reinforcement learning; Navigation; Location awareness; Time measurement; State estimation; Multiagent deep reinforcement learning (DRL); multitarget tracking; unmanned aerial vehicle (UAV)

Citation: IEEE Internet of Things Journal, vol. 8, no. 20, pp. 15441-15455

Indexed: SCIE; SCOPUS

Journal Title: IEEE INTERNET OF THINGS JOURNAL

Volume: 8

Number: 20

Start Page: 15441

End Page: 15455

URI: https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/140623

DOI: 10.1109/JIOT.2021.3073973

ISSN: 2327-4662

Abstract: In this article, we propose a novel deep reinforcement learning (DRL) approach for controlling multiple unmanned aerial vehicles (UAVs) to track multiple first responders (FRs) in challenging 3-D environments with obstacles and occlusions. We assume that the UAVs receive noisy distance measurements from the FRs. These measurements are of two types, Line of Sight (LoS) and non-LoS (NLoS), and the UAV agents use them to estimate the state (i.e., position) of the FRs. The proposed DRL-based controller then selects the optimal joint control actions according to the Cramér-Rao lower bound (CRLB) of the joint measurement likelihood function to achieve high tracking performance. Specifically, the optimal UAV control actions are quantified by the proposed reward function, which considers both the CRLB of the entire system (the global reward) and each UAV's individual contribution to the system (the difference reward). Since the UAVs take actions that reduce the CRLB of the entire system, tracking accuracy is improved by ensuring that high-quality LoS measurements are received with high probability. Our simulation results show that the proposed DRL-based UAV controller provides a highly accurate target tracking solution with a very low runtime cost.
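
The abstract names a global reward based on the system CRLB and a difference reward that captures each UAV's individual contribution, but it gives no formulas. The sketch below is only an illustration of that idea under stated assumptions, not the authors' formulation: it assumes range-only measurements with independent Gaussian noise, models LoS and NLoS links simply as small and large noise standard deviations, and adds a hypothetical `prior_info` term to keep the information matrix invertible. All function names and numeric values are made up for the example.

```python
import numpy as np

def range_fim(target, uav_positions, sigmas):
    """Fisher information for a 3-D target position from range-only
    measurements with independent Gaussian noise (assumed model)."""
    target = np.asarray(target, dtype=float)
    fim = np.zeros((3, 3))
    for pos, sigma in zip(uav_positions, sigmas):
        diff = target - np.asarray(pos, dtype=float)
        u = diff / np.linalg.norm(diff)      # unit vector from UAV to target
        fim += np.outer(u, u) / sigma**2     # each range adds rank-1 information
    return fim

def crlb_trace(target, uav_positions, sigmas, prior_info=1e-2):
    """Trace of the CRLB (inverse Fisher information).
    prior_info is a hypothetical regulariser standing in for prior
    knowledge of the target; it keeps the matrix invertible."""
    fim = range_fim(target, uav_positions, sigmas) + prior_info * np.eye(3)
    return float(np.trace(np.linalg.inv(fim)))

def global_reward(target, uav_positions, sigmas):
    """Assumed global reward: a lower system CRLB gives a higher reward."""
    return -crlb_trace(target, uav_positions, sigmas)

def difference_reward(i, target, uav_positions, sigmas):
    """Assumed difference reward: global reward with all UAVs minus the
    global reward obtained when UAV i is removed from the system."""
    keep = [k for k in range(len(uav_positions)) if k != i]
    without_i = global_reward(
        target,
        [uav_positions[k] for k in keep],
        [sigmas[k] for k in keep],
    )
    return global_reward(target, uav_positions, sigmas) - without_i

if __name__ == "__main__":
    target = [0.0, 0.0, 0.0]
    uavs = [[10.0, 0.0, 5.0], [0.0, 10.0, 5.0], [-10.0, 0.0, 5.0]]
    sigma_los, sigma_nlos = 0.5, 5.0         # hypothetical noise levels (m)
    print(difference_reward(0, target, uavs, [sigma_los] * 3))
    print(difference_reward(0, target, uavs, [sigma_nlos, sigma_los, sigma_los]))
```

Under these assumptions, the same UAV earns a larger difference reward when its link is LoS (low noise) than when it is NLoS (high noise), which is consistent with the abstract's statement that reducing the system CRLB favours high-quality LoS measurements.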

Appears in Collections: College of Engineering (Seoul) > School of Electronic Engineering (Seoul) > 1. Journal Articles

Related Researcher

Kim, Sunwoo: COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)
