Detailed Information

Depth cue fusion for event-based stereo depth estimation

Authors
Ghosh, Dipon Kumar; Jung, Yong Ju
Issue Date
May-2025
Publisher
ELSEVIER
Keywords
Event camera; Depth cue fusion; Stereo matching; Monocular depth; Structure from motion
Citation
INFORMATION FUSION, v.117
Journal Title
INFORMATION FUSION
Volume
117
URI
https://scholarworks.bwise.kr/gachon/handle/2020.sw.gachon/94332
DOI
10.1016/j.inffus.2024.102891
ISSN
1566-2535
1872-6305
Abstract
Inspired by the biological retina, event cameras utilize dynamic vision sensors to capture pixel intensity changes asynchronously. Event cameras offer numerous advantages, such as high dynamic range, high temporal resolution, reduced motion blur, and low power consumption. These features make event cameras particularly well-suited for depth estimation, especially in challenging scenarios involving rapid motion and high dynamic range imaging conditions. The human visual system perceives scene depth by combining multiple depth cues, such as monocular pictorial depth, stereo depth, and motion parallax. However, most existing event-based depth estimation algorithms utilize only a single depth cue, such as either stereo depth or monocular depth. While it is feasible to estimate depth from a single cue, estimating dense disparity in challenging scenes and lighting conditions remains a difficult problem. Motivated by this, we conduct extensive experiments to explore various methods for depth cue fusion. Inspired by the experimental results, in this study we propose a fusion architecture that systematically incorporates multiple depth cues for event-based stereo depth estimation. To this end, we propose a depth cue fusion (DCF) network that fuses multiple depth cues using a novel fusion method called SpadeFormer. The proposed SpadeFormer is a fully context-aware fusion mechanism that incorporates two modulation techniques, spatially adaptive denormalization (Spade) and cross-attention, within a transformer block. The adaptive denormalization modulates both input features by adjusting their global feature statistics in a cross-wise manner, and the modulated features are further fused by the cross-attention mechanism. Experiments conducted on a real-world dataset show that our method reduces the one-pixel error rate by at least 47.63% (3.708 for the best existing method vs. 1.942 for ours) and the mean absolute error by 40.07% (0.302 for the best existing method vs. 0.181 for ours). The results reveal that the depth cue fusion method outperforms the state-of-the-art methods by significant margins and produces better disparity maps.
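Illustrative sketch
The following is a minimal, hypothetical PyTorch sketch of a SpadeFormer-style fusion block as described in the abstract, not the authors' implementation: each depth-cue feature map is modulated by spatially adaptive denormalization driven by the other cue, and the modulated features are then combined with multi-head cross-attention inside a residual block. All class names, layer choices (instance normalization, 3x3 convolutions for the modulation parameters, which cue supplies queries versus keys/values), and hyperparameters are assumptions made for illustration only.

# Hypothetical sketch of SPADE-modulated cross-attention fusion of two depth cues.
# Not the published DCF/SpadeFormer code; layer choices are assumptions.
import torch
import torch.nn as nn


class Spade(nn.Module):
    """Spatially adaptive denormalization: normalize x, then re-scale and
    re-shift it per pixel with gamma/beta predicted from a guidance feature."""
    def __init__(self, channels):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.gamma = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.beta = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x, guide):
        return self.norm(x) * (1 + self.gamma(guide)) + self.beta(guide)


class SpadeFormerBlock(nn.Module):
    """Cross-modulate two depth-cue features with SPADE, then fuse them with
    multi-head cross-attention (queries from the stereo cue, keys/values from
    the monocular cue). Names and roles are illustrative assumptions."""
    def __init__(self, channels, heads=4):
        super().__init__()
        self.spade_stereo = Spade(channels)  # stereo cue modulated by monocular cue
        self.spade_mono = Spade(channels)    # monocular cue modulated by stereo cue
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, f_stereo, f_mono):
        b, c, h, w = f_stereo.shape
        # Cross-wise adaptive denormalization: each cue supplies the statistics
        # that modulate the other cue.
        s = self.spade_stereo(f_stereo, f_mono)
        m = self.spade_mono(f_mono, f_stereo)
        # Flatten spatial dimensions into token sequences for attention.
        q = s.flatten(2).transpose(1, 2)           # (B, H*W, C)
        kv = m.flatten(2).transpose(1, 2)          # (B, H*W, C)
        fused, _ = self.attn(q, kv, kv)            # cross-attention fusion
        fused = fused.transpose(1, 2).reshape(b, c, h, w)
        return self.proj(fused) + f_stereo         # residual connection

# Example usage with dummy stereo-matching and monocular depth-cue features:
#   block = SpadeFormerBlock(channels=64)
#   out = block(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
#   print(out.shape)  # torch.Size([1, 64, 32, 32])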
Files in This Item
There are no files associated with this item.
Appears in
Collections
ETC > 1. Journal Articles


Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher


Jung, Yong Ju
College of IT Convergence (Department of Artificial Intelligence)
