Action Recognition Network Using Stacked Short-Term Deep Features and Bidirectional Moving Average (open access)
- Authors
- Ha, Jinsol; Shin, Joongchol; Park, Hasil; Paik, Joonki
- Issue Date
- Jun-2021
- Publisher
- MDPI
- Keywords
- action recognition; three-dimensional convolution (C3D); short-term pixel-difference; bidirectional moving average
- Citation
- APPLIED SCIENCES-BASEL, v.11, no.12
- Journal Title
- APPLIED SCIENCES-BASEL
- Volume
- 11
- Number
- 12
- URI
- https://scholarworks.bwise.kr/cau/handle/2019.sw.cau/48329
- DOI
- 10.3390/app11125563
- ISSN
- 2076-3417
- Abstract
- Action recognition requires accurate analysis of the action elements in a video clip and of the order in which those elements occur. Solving these two sub-problems requires learning both spatio-temporal information and the temporal relationships between different action elements. Existing convolutional neural network (CNN)-based action recognition methods have focused on learning only spatial or temporal information, without considering the temporal relations between action elements. In this paper, we create short-term pixel-difference images from the input video and feed them to a bidirectional exponential moving average sub-network to analyze the action elements and their temporal relations. The proposed method consists of: (i) generation of RGB and difference images, (ii) extraction of deep feature maps using an image classification sub-network, (iii) weight assignment to the extracted feature maps using a bidirectional exponential moving average sub-network, and (iv) late fusion with a three-dimensional convolutional (C3D) sub-network to improve the accuracy of action recognition. Experimental results show that the proposed method outperforms existing baseline methods. In addition, the proposed action recognition network takes only 0.075 seconds per action class, which makes it suitable for high-speed or real-time applications such as abnormal action classification, human-computer interaction, and intelligent visual surveillance.
- Appears in Collections
- Graduate School of Advanced Imaging Sciences, Multimedia and Film > Department of Imaging Science and Arts > 1. Journal Articles
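
Three steps named in the abstract (short-term pixel-difference images, a bidirectional exponential moving average over per-frame features, and late fusion of two score vectors) can be illustrated with a minimal NumPy sketch. The abstract gives no formulas, so everything here is an assumption made for illustration: the function names, the smoothing factor `alpha`, the equal-weight average of the forward and backward passes, and the fusion `weight`. The sub-network in the paper assigns weights to feature maps inside a trained model; this sketch only applies fixed formulas and is not the authors' implementation.

```python
import numpy as np


def pixel_difference(frames):
    """Absolute difference between consecutive frames.

    frames: array of shape (T, H, W, C). Returns shape (T - 1, H, W, C).
    Casting to float32 first avoids uint8 wrap-around on subtraction.
    """
    frames = np.asarray(frames).astype(np.float32)
    return np.abs(frames[1:] - frames[:-1])


def ema(features, alpha):
    """Exponential moving average along the first (time) axis."""
    features = np.asarray(features, dtype=np.float32)
    out = np.empty_like(features)
    out[0] = features[0]
    for t in range(1, len(features)):
        out[t] = alpha * features[t] + (1.0 - alpha) * out[t - 1]
    return out


def bidirectional_ema(features, alpha=0.5):
    """Average of a forward-in-time and a backward-in-time EMA.

    Assumption: the two passes are combined with equal weight.
    """
    features = np.asarray(features, dtype=np.float32)
    forward = ema(features, alpha)
    backward = ema(features[::-1], alpha)[::-1]
    return 0.5 * (forward + backward)


def late_fusion(scores_a, scores_b, weight=0.5):
    """Weighted average of two class-score vectors (assumed fusion rule)."""
    scores_a = np.asarray(scores_a, dtype=np.float32)
    scores_b = np.asarray(scores_b, dtype=np.float32)
    return weight * scores_a + (1.0 - weight) * scores_b
```

In this sketch, `pixel_difference` would be applied to the raw clip, `bidirectional_ema` to the sequence of per-frame feature vectors produced by an image classification network, and `late_fusion` to the class scores of that branch and of a separate 3D-convolution branch.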