Multi-Modal Excavator Activity Recognition Using Two-Stream CNN-LSTM with RGB and Point Cloud Inputs
Open access
- Authors
- Cho, Hyuk Soo; Latif, Kamran; Sharafat, Abubakar; Seo, Jongwon
- Issue Date
- Jul-2025
- Publisher
- MDPI
- Keywords
- excavator; activity recognition; deep learning; multi-modal; two-stream CNN-LSTM; point cloud; data fusion
- Citation
- Applied Sciences-basel, v.15, no.15, pp 1 - 28
- Pages
- 28
- Indexed
- SCIE
SCOPUS
- Journal Title
- Applied Sciences-basel
- Volume
- 15
- Number
- 15
- Start Page
- 1
- End Page
- 28
- URI
- https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/208627
- DOI
- 10.3390/app15158505
- ISSN
- 2076-3417
- Abstract
- Deep learning algorithms are increasingly applied to activity recognition in construction, particularly for excavators, to automate processes and to improve safety and productivity through continuous monitoring of earthmoving activities. These algorithms analyze construction videos to classify excavator activities. However, previous studies have focused solely on single-source external videos, which limits the recognition capability of the algorithms. This paper introduces a multi-modal deep learning methodology for recognizing excavator activities from two input streams. It processes point clouds and RGB images with a two-stream convolutional neural network and long short-term memory (CNN-LSTM) architecture to extract spatiotemporal features. A dataset of 495,000 video frames of synchronized RGB and point cloud data was collected across multiple construction sites under varying conditions. The dataset covers five excavator activities: Approach, Digging, Dumping, Idle, and Leveling. To assess the proposed method, the two-stream CNN-LSTM architecture is compared with single-stream CNN-LSTM models evaluated separately on the RGB data and the point cloud data from the same dataset. The two-stream approach achieved an accuracy of 94.67%, outperforming the single-stream models, which achieved 90.67% (RGB) and 92.00% (point cloud). These findings indicate that the proposed method is well suited to automatic real-time monitoring of excavator activities and lay the groundwork for future integration into digital twin systems for proactive maintenance and intelligent equipment management.
- Appears in
Collections - Seoul College of Engineering > Seoul Department of Civil and Environmental Engineering > 1. Journal Articles
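
The abstract describes combining an RGB stream and a point cloud stream to classify five excavator activities, but it does not say how the two streams are merged. The sketch below is only an illustration of one common option, score-level (late) fusion, in which each stream's class scores are converted to probabilities and averaged. The function names, the `rgb_weight` parameter, and the example scores are all hypothetical and are not taken from the paper; the authors' actual fusion mechanism may differ.

```python
import math

# Activity labels as listed in the abstract.
ACTIVITIES = ["Approach", "Digging", "Dumping", "Idle", "Leveling"]


def softmax(logits):
    """Convert raw class scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(rgb_logits, cloud_logits, rgb_weight=0.5):
    """Hypothetical late fusion: weighted average of per-stream probabilities.

    This is an assumed scheme for illustration, not the paper's method.
    """
    if len(rgb_logits) != len(ACTIVITIES) or len(cloud_logits) != len(ACTIVITIES):
        raise ValueError("each stream must score all five activities")
    rgb_probs = softmax(rgb_logits)
    cloud_probs = softmax(cloud_logits)
    fused = [
        rgb_weight * r + (1.0 - rgb_weight) * c
        for r, c in zip(rgb_probs, cloud_probs)
    ]
    best = max(range(len(fused)), key=fused.__getitem__)
    return ACTIVITIES[best], fused


# Made-up scores for a single clip: the RGB stream slightly prefers Digging,
# while the point cloud stream clearly prefers Dumping.
rgb_logits = [0.2, 2.0, 1.8, -1.0, 0.1]
cloud_logits = [0.0, 1.0, 3.0, -0.5, 0.3]

label, fused = fuse_streams(rgb_logits, cloud_logits)
print(label, [round(p, 3) for p in fused])
```

In this made-up case the RGB stream alone would pick Digging, while the fused prediction follows the more confident point cloud stream. That is the kind of complementary behavior a two-stream model is intended to exploit, although the abstract reports only the aggregate accuracies.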

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.