Excavator activity recognition under occlusion via multi-camera deep learning (open access)
- Authors
- Sharafat, Abubakar; Latif, Kamran; Deng, Tao; Seo, Jongwon
- Issue Date
- Mar-2026
- Publisher
- Elsevier B.V.
- Keywords
- Activity recognition; Deep learning; Excavator; Multi-camera input; Occlusion; Two-stream convolutional neural networks
- Citation
- Results in Engineering, v.29, pp 1 - 17
- Pages
- 17
- Indexed
- SCOPUS; ESCI
- Journal Title
- Results in Engineering
- Volume
- 29
- Start Page
- 1
- End Page
- 17
- URI
- https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/211568
- DOI
- 10.1016/j.rineng.2025.108611
- ISSN
- 2590-1230
- Abstract
- Accurate recognition of excavator activities is essential for automating construction processes. However, existing single-camera, vision-based recognition methods lose effectiveness under occlusion. Occlusions are inherent in earthwork operations: they arise from obstructions on site or from the nature of the work itself, and they disrupt critical visual and motion cues. To address this challenge, this study presents a novel deep learning-based methodology built on a multi-camera, two-stream Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) architecture for excavator activity recognition under occlusion. The proposed approach uses two synchronized cameras, one external and one in-cabin, each providing RGB and optical flow inputs. Each input stream is processed by a dedicated CNN to extract spatial and motion features, which are fused and passed to an LSTM to capture temporal dependencies. A fully connected layer then classifies five excavator activities. To evaluate performance, three datasets representing no occlusion, partial occlusion, and full occlusion were curated, and the proposed method was compared with an existing single-camera CNN-LSTM approach under identical settings. The proposed method achieved recognition accuracies of 92.38 % and 91.43 % on the partial and full occlusion datasets, decreases of only 2.97 % and 3.92 % relative to the no-occlusion dataset. In comparison, the single-camera approach showed notable accuracy reductions of approximately 6.0 % and 10.0 % for partial and full occlusion, respectively. These findings indicate that the multi-camera approach is substantially more robust and reliable for excavator activity recognition in occluded real-world construction environments.
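The data flow described in the abstract (four inputs, one CNN per input, feature fusion, an LSTM, then a fully connected classifier over five activities) might be sketched in PyTorch roughly as follows. This is an illustration only, not the authors' implementation: the class names `StreamCNN` and `MultiCamTwoStream`, the tiny convolutional backbone, the feature and hidden sizes, the two-channel optical flow representation, and fusion by concatenation are all assumptions, since the abstract does not specify them.

```python
import torch
import torch.nn as nn


class StreamCNN(nn.Module):
    """Per-frame CNN encoder (hypothetical stand-in for the paper's backbone)."""

    def __init__(self, in_channels: int, feat_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global average pool -> (N, 32, 1, 1)
            nn.Flatten(),             # -> (N, 32)
            nn.Linear(32, feat_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels, height, width)
        b, t, c, h, w = x.shape
        feats = self.net(x.reshape(b * t, c, h, w))  # encode every frame
        return feats.reshape(b, t, -1)               # (batch, time, feat_dim)


class MultiCamTwoStream(nn.Module):
    """Two cameras x two streams -> fused features -> LSTM -> classifier."""

    def __init__(self, feat_dim: int = 64, hidden: int = 128, num_classes: int = 5):
        super().__init__()
        # One dedicated CNN per input. RGB has 3 channels; optical flow is
        # assumed to be stored as 2 channels (horizontal, vertical).
        self.encoders = nn.ModuleDict({
            "ext_rgb": StreamCNN(3, feat_dim),
            "ext_flow": StreamCNN(2, feat_dim),
            "cab_rgb": StreamCNN(3, feat_dim),
            "cab_flow": StreamCNN(2, feat_dim),
        })
        self.lstm = nn.LSTM(4 * feat_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, inputs: dict) -> torch.Tensor:
        # Fusion by concatenation along the feature axis (an assumption).
        fused = torch.cat(
            [self.encoders[name](inputs[name]) for name in self.encoders], dim=-1
        )
        out, _ = self.lstm(fused)   # (batch, time, hidden)
        return self.fc(out[:, -1])  # class logits from the last time step
```

Calling the model with a dictionary of four tensors shaped `(batch, time, channels, height, width)` returns one row of five activity logits per clip. Because each camera has its own encoders, features from the unoccluded view remain available to the LSTM when the other view is blocked, which is the intuition behind the robustness reported above.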
- Appears in Collections
- Seoul College of Engineering > Seoul Department of Civil and Environmental Engineering > 1. Journal Articles
