Detailed Information


Language and vision based person re-identification for surveillance systems using deep learning with LIP layers

Authors
Bukhari, M.; Yasmin, S.; Naz, S.; Maqsood, M.; Rew, J.; Rho, Seungmin
Issue Date
Apr-2023
Publisher
Elsevier Ltd
Keywords
Deep learning; Language and vision based Re-ID; Person re-identification; Surveillance
Citation
Image and Vision Computing, v.132
Journal Title
Image and Vision Computing
Volume
132
URI
https://scholarworks.bwise.kr/cau/handle/2019.sw.cau/69976
DOI
10.1016/j.imavis.2023.104658
ISSN
0262-8856
1872-8138
Abstract
Real-time surveillance systems have become a necessity of modern life owing to their relevance for security, ensuring a safe and secure environment. Person re-identification (Re-ID)-based surveillance systems are becoming increasingly prevalent and sophisticated, since they require no human intervention and are more reliable to deploy in public spaces leveraging multi-camera networks. However, one of the major problems in person Re-ID is visual appearance, i.e., the appearance of a person in an image is greatly affected by different camera views. As a result, a discriminative set of features must be learned by a deep learning model in order to re-identify persons across opposing camera viewpoints. To address this challenge, we propose an image/text-retrieval-based person Re-ID method in which both visual and text-based features are exploited to carry out person re-identification. More precisely, textual descriptions of the images are encoded as text features with GloVe word embeddings followed by a 1D-MAPCNN, and fused with image-level features extracted using the GoogLeNet model. In addition, feature discriminability is enhanced using local importance-based pooling (LIP) layers, in which adaptive significance weights are learned during downsampling. Moreover, features from the two modalities are refined during training with the help of attention mechanisms, using the Convolutional Block Attention Module (CBAM) and the proposed shared attention neural network. We observe that the LIP layers, together with the combined visual and textual features, play a key role in acquiring discriminative features even when the visual appearance of the same person is greatly affected by camera pose conditions. The proposed method is validated on the CUHK-PEDES dataset and achieves rank-1 improvements of 15.34% and 24.39% in text- and image-based retrieval, respectively.
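The local importance-based pooling (LIP) mentioned in the abstract replaces plain average or max pooling with a softmax-weighted downsampling, where each position's weight comes from a learned importance map. The sketch below is a minimal NumPy illustration of that weighting scheme only, assuming a hypothetical `lip_pool` helper that takes the importance logits as an input; in the actual model the logits would be produced by a small learned sub-network, and this is not the authors' implementation.

```python
import numpy as np

def lip_pool(x, logits, k=2):
    """Local importance-based pooling (sketch).

    x      : 2D feature map (H, W)
    logits : learned importance logits, same shape as x
    k      : pooling window / stride

    Each k*k window is reduced to a weighted mean, with weights
    exp(logits) normalized within the window (a local softmax),
    so high-importance activations dominate the downsampled output.
    """
    H, W = x.shape
    w = np.exp(logits)  # positive importance weights
    out = np.zeros((H // k, W // k))
    for i in range(0, H - k + 1, k):
        for j in range(0, W - k + 1, k):
            win_x = x[i:i + k, j:j + k]
            win_w = w[i:i + k, j:j + k]
            out[i // k, j // k] = (win_x * win_w).sum() / win_w.sum()
    return out
```

With all-zero logits every weight is equal and `lip_pool` reduces to average pooling; a large logit at one position makes the output of that window approach the corresponding activation, which is the adaptive behavior the abstract attributes to the LIP layers.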
Appears in
Collections
College of Business & Economics > Department of Industrial Security > 1. Journal Articles


Related Researcher

Rho, Seungmin
College of Business & Economics (Department of Industrial Security)
