TrackIME: Enhanced Video Point Tracking via Instance Motion Estimation
- Authors
- Park, Seong Hyeon; Jang, Huiwon; Jeon, Byungwoo; Yun, Sukmin; Seo, Paul Hongsuck; Shin, Jinwoo
- Issue Date
- Sep-2024
- Publisher
- Neural Information Processing Systems Foundation
- Citation
- Advances in Neural Information Processing Systems, v.37
- Indexed
- SCOPUS
- Journal Title
- Advances in Neural Information Processing Systems
- Volume
- 37
- URI
- https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/123728
- ISSN
- 1049-5258
- Abstract
- Tracking points in video frames is essential for understanding video content. However, the task is fundamentally hindered by the computational demands of brute-force correspondence matching across frames. Because current models down-sample frame resolutions to mitigate this challenge, they fall short in accurately representing point trajectories due to information truncation. Instead, we address the challenge by pruning the search space for point tracking, letting the model process only the important regions of the frames without down-sampling. Our first key idea is to identify the object instance and its trajectory over the frames, then prune the regions of each frame that do not contain the instance. Concretely, to estimate the instance's trajectory, we track a group of points on the instance and aggregate their motion trajectories. Furthermore, to handle occlusions in complex scenes, we propose to compensate for occluded points while tracking. To this end, we introduce a unified framework that jointly performs point tracking and segmentation, producing synergistic effects between the two tasks. For example, the segmentation results enable the tracking model to avoid occluded points by referring to the instance mask, and conversely, the improved tracking results help produce more accurate segmentation masks. Our framework can be easily incorporated into various tracking models, and we demonstrate its efficacy for enhanced point tracking through extensive experiments. For example, on the recent TAP-Vid benchmark, our framework consistently improves all baselines, e.g., by up to 13.5% on the average Jaccard metric. The project URL is https://trackime.github.io/. © 2024 Neural information processing systems foundation. All rights reserved.
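- The aggregate-and-prune idea from the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the median aggregation, the occlusion fallback, and the fixed crop size are all assumptions introduced here for clarity.

```python
import numpy as np

def estimate_instance_trajectory(tracks, visible):
    """Aggregate per-point tracks into a single instance trajectory.

    tracks:  (T, N, 2) array of (x, y) positions for N points over T frames.
    visible: (T, N) boolean mask; False marks occluded points.
    Returns a (T, 2) array of per-frame instance centers, taken here as the
    median of the visible points (an illustrative choice of aggregator).
    """
    T = tracks.shape[0]
    centers = np.empty((T, 2))
    for t in range(T):
        pts = tracks[t][visible[t]]
        if len(pts) == 0:
            # All points occluded: fall back to the previous center
            # (or the raw mean on the first frame).
            centers[t] = centers[t - 1] if t > 0 else tracks[t].mean(axis=0)
        else:
            centers[t] = np.median(pts, axis=0)
    return centers

def prune_regions(centers, frame_hw, crop_hw):
    """Crop boxes (y0, x0, y1, x1) centered on the trajectory, clipped to the frame.

    Only these regions would be processed at full resolution; the rest of
    each frame is pruned from the search space.
    """
    H, W = frame_hw
    h, w = crop_hw
    boxes = []
    for cx, cy in centers:
        y0 = int(np.clip(cy - h // 2, 0, H - h))
        x0 = int(np.clip(cx - w // 2, 0, W - w))
        boxes.append((y0, x0, y0 + h, x0 + w))
    return boxes
```

Aggregating many point tracks makes the instance trajectory robust to individual points drifting or disappearing, which is what allows the crop to stay on the object even when some tracked points are occluded.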
- Files in This Item
- Go to Link
- Appears in
Collections - COLLEGE OF COMPUTING > DEPARTMENT OF ARTIFICIAL INTELLIGENCE > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.