MVTamperBench: Evaluating Robustness of Vision-Language Models (open access)
- Authors
- Agarwal, Amit; Panda, Srikant; Charles, Angeline; Patel, Hitesh Laxmichand; Kumar, Bhargava; Pattnayak, Priyaranjan; Rafi, Taki Hasan; Kumar, Tejaswini; Meghwani, Hansa; Gupta, Karan; Chae, Dong-kyu
- Issue Date
- Jul-2025
- Publisher
- Association for Computational Linguistics
- Citation
- Findings of the Association for Computational Linguistics: ACL 2025, pp 18804 - 18828
- Pages
- 25
- Indexed
- SCOPUS
- Journal Title
- Findings of the Association for Computational Linguistics: ACL 2025
- Start Page
- 18804
- End Page
- 18828
- URI
- https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/210871
- DOI
- 10.18653/v1/2025.findings-acl.963
- ISSN
- 0736-587X
- Abstract
- Multimodal Large Language Models (MLLMs) are a recent advancement of Vision-Language Models (VLMs) that have driven major advances in video understanding. However, their vulnerability to adversarial tampering and manipulation remains underexplored. To address this gap, we introduce MVTamperBench, a benchmark that systematically evaluates MLLM robustness against five prevalent tampering techniques: rotation, masking, substitution, repetition, and dropping. These techniques are based on real-world visual tampering scenarios such as surveillance interference, social media content edits, and misinformation injection. MVTamperBench comprises ~3.4K original videos, expanded into ~17K tampered clips covering 19 distinct video manipulation tasks. This benchmark challenges models to detect manipulations in spatial and temporal coherence. We evaluate 45 recent MLLMs from 15+ model families. We reveal substantial variability in resilience across tampering types and show that larger parameter counts do not necessarily guarantee robustness. MVTamperBench sets a new benchmark for developing tamper-resilient MLLMs in safety-critical applications, including detecting clickbait, preventing harmful content distribution, and enforcing policies on media platforms. We release all code, data, and benchmark to foster open research in trustworthy video understanding.
- Appears in
Collections - Seoul College of Engineering > Seoul School of Computer Software > 1. Journal Articles
