Multi-Modal Excavator Activity Recognition Using Two-Stream CNN-LSTM with RGB and Point Cloud Inputs

Cho, Hyuk Soo; Latif, Kamran; Sharafat, Abubakar; Seo, Jongwon

doi:10.3390/app15158505

Multi-Modal Excavator Activity Recognition Using Two-Stream CNN-LSTM with RGB and Point Cloud Inputs (open access)

Authors: Cho, Hyuk Soo; Latif, Kamran; Sharafat, Abubakar; Seo, Jongwon

Issue Date: Jul-2025

Publisher: MDPI

Keywords: excavator; activity recognition; deep learning; multi-modal; two-stream CNN-LSTM; point cloud; data fusion

Citation: Applied Sciences-Basel, v.15, no.15, pp. 1-28

Pages: 28

Indexed: SCIE; SCOPUS

Journal Title: Applied Sciences-Basel

Volume: 15

Number: 15

Start Page: 1

End Page: 28

URI: https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/208627

DOI: 10.3390/app15158505

ISSN: 2076-3417

Abstract: Deep learning algorithms are increasingly applied to activity recognition in construction, particularly for excavators, to automate processes and to improve safety and productivity through continuous monitoring of earthmoving activities. These algorithms analyze construction videos to classify excavator activities. However, previous studies have focused solely on single-source external videos, which limits the recognition capability of the algorithms. This paper introduces a novel multi-modal, deep learning-based methodology for recognizing excavator activities from multi-stream input data. It processes point clouds and RGB images with a two-stream convolutional neural network and long short-term memory (CNN-LSTM) model to extract the spatiotemporal features used to recognize excavator activities. A comprehensive dataset of 495,000 video frames of synchronized RGB and point cloud data was collected across multiple construction sites under varying conditions. The dataset covers five key excavator activities: Approach, Digging, Dumping, Idle, and Leveling. To assess the proposed method, the two-stream CNN-LSTM architecture is compared with single-stream CNN-LSTM models evaluated separately on the same RGB and point cloud datasets. The proposed multi-stream approach achieved an accuracy of 94.67%, outperforming existing state-of-the-art single-stream models, which achieved 90.67% for the RGB-based model and 92.00% for the point cloud-based model. These findings underscore the potential of the proposed method for automatic real-time monitoring of excavator activities and lay the groundwork for future integration into digital twin systems for proactive maintenance and intelligent equipment management.

Appears in Collections: Seoul College of Engineering > Seoul Department of Civil and Environmental Engineering > 1. Journal Articles

Related Researcher

Seo, Jongwon: College of Engineering (Department of Civil and Environmental Engineering)