Detailed Information


Language and vision based person re-identification for surveillance systems using deep learning with LIP layers

Authors
Bukhari, M.; Yasmin, S.; Naz, S.; Maqsood, M.; Rew, J.; Rho, Seungmin
Issue Date
Apr-2023
Publisher
Elsevier Ltd
Keywords
Deep learning; Language and vision based Re-ID; Person re-identification; Surveillance
Citation
Image and Vision Computing, v.132
Journal Title
Image and Vision Computing
Volume
132
URI
https://scholarworks.bwise.kr/cau/handle/2019.sw.cau/69976
DOI
10.1016/j.imavis.2023.104658
ISSN
0262-8856
1872-8138
Abstract
Real-time surveillance systems have become a necessity of modern life owing to their relevance for security, ensuring a safe and secure environment. Person re-identification (Re-ID)-based surveillance systems are becoming increasingly prevalent and sophisticated, since they require no human intervention and are more reliable to deploy in public spaces leveraging multi-camera networks. However, one of the major problems in person Re-ID is visual appearance, i.e., the appearance of a person in an image is greatly affected by different camera views. As a result, a discriminative set of features must be learned by a deep learning model in order to re-identify persons across opposing camera viewpoints. To address this challenge, we propose an image/text-retrieval-based person Re-ID method in which both visual and text-based features are exploited to carry out person re-identification. More precisely, textual descriptions of the images are encoded as text features with GloVe word embeddings followed by a 1D-MAPCNN, and fused with image-level features extracted using the GoogLeNet model. In addition, feature discriminability is enhanced using local importance-based pooling (LIP) layers, in which adaptive significance weights are learned during downsampling. Moreover, features from the two modalities are refined during training with the help of attention mechanisms, using the Convolutional Block Attention Module (CBAM) and the proposed shared attention neural network. We observe that the LIP layers, together with the combined visual and textual features, play a key role in acquiring discriminative features even when the visual appearance of the same person is greatly affected by camera pose conditions. The proposed method is validated on the CUHK-PEDES dataset and achieves rank-1 improvements of 15.34% and 24.39% in text- and image-based retrieval, respectively.
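The local importance-based pooling (LIP) mentioned in the abstract replaces plain average or max pooling with a softmax-weighted downsampling, where each position's weight comes from a learned importance map. The sketch below is a minimal NumPy illustration of that weighting scheme only, assuming a hypothetical `lip_pool` helper that takes the importance logits as an input; in the actual model the logits would be produced by a small learned sub-network, and this is not the authors' implementation.

```python
import numpy as np

def lip_pool(x, logits, k=2):
    """Local importance-based pooling (sketch).

    x      : 2D feature map (H, W)
    logits : learned importance logits, same shape as x
    k      : pooling window / stride

    Each k*k window is reduced to a weighted mean, with weights
    exp(logits) normalized within the window (a local softmax),
    so high-importance activations dominate the downsampled output.
    """
    H, W = x.shape
    w = np.exp(logits)  # positive importance weights
    out = np.zeros((H // k, W // k))
    for i in range(0, H - k + 1, k):
        for j in range(0, W - k + 1, k):
            win_x = x[i:i + k, j:j + k]
            win_w = w[i:i + k, j:j + k]
            out[i // k, j // k] = (win_x * win_w).sum() / win_w.sum()
    return out
```

With all-zero logits every weight is equal and `lip_pool` reduces to average pooling; a large logit at one position makes the output of that window approach the corresponding activation, which is the adaptive behavior the abstract attributes to the LIP layers.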
Appears in
Collections
College of Business & Economics > Department of Industrial Security > 1. Journal Articles


Related Researcher

Rho, Seungmin
College of Business & Economics (Department of Industrial Security)
