V-PRUNE: Semantic-Aware Patch Pruning Before Tokenization in Vision–Language Model Inference (open access)
- Authors
- Seo, Hyein; Choi, Yong Suk
- Issue Date
- Aug-2025
- Publisher
- MDPI
- Keywords
- vision-language models; efficient vision transformers; feature pruning; visual question answering
- Citation
- Applied Sciences-Basel, v.15, no.17, pp. 1-15
- Pages
- 15
- Indexed
- SCIE; SCOPUS
- Journal Title
- Applied Sciences-Basel
- Volume
- 15
- Number
- 17
- Start Page
- 1
- End Page
- 15
- URI
- https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/208860
- DOI
- 10.3390/app15179463
- ISSN
- 2076-3417
- Abstract
- Recent vision-language models (VLMs) achieve strong performance across multimodal benchmarks but suffer from high inference costs due to the large number of visual tokens. Prior studies have shown that many image tokens receive consistently low attention scores during inference, indicating that a substantial portion of visual content contributes little to final predictions. These observations raise questions about the efficiency of conventional token pruning strategies, which are typically applied after all attention operations and depend on late-emerging attention scores. To address this, we propose V-PRUNE, a semantic-aware patch-level pruning framework for vision-language models that removes redundant content before tokenization. By evaluating local similarity via color and histogram statistics, our method enables lightweight and interpretable pruning without architectural changes. Applied to CLIP-based models, our approach reduces FLOPs and inference time across vision-language understanding tasks, while maintaining or improving accuracy. Qualitative results further confirm that essential regions are preserved and the pruning behavior is human-aligned, making our method a practical solution for efficient VLM inference.
- Appears in Collections
- Seoul College of Engineering > Seoul Division of Computer Software > 1. Journal Articles
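
Illustrative sketch (not taken from the paper): the abstract says that patches are pruned before tokenization by "evaluating local similarity via color and histogram statistics", but this record gives no further detail. The Python below is one possible reading of that idea, under stated assumptions. A patch is dropped when its mean color and luminance histogram closely match those of its left or upper neighbor. The function names, the neighbor rule, the histogram-intersection measure, and the thresholds `color_tol` and `hist_min` are all hypothetical choices made for illustration and should not be read as the authors' method.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]          # (R, G, B), each in 0..255
Image = List[List[Pixel]]             # row-major grid of pixels
Stats = Tuple[List[float], List[float]]


def patch_stats(img: Image, top: int, left: int, size: int, bins: int = 8) -> Stats:
    """Return (mean RGB, normalized luminance histogram) for one square patch."""
    sums = [0, 0, 0]
    hist = [0] * bins
    for y in range(top, top + size):
        for x in range(left, left + size):
            r, g, b = img[y][x]
            sums[0] += r
            sums[1] += g
            sums[2] += b
            lum = (r + g + b) // 3            # crude luminance in 0..255
            hist[lum * bins // 256] += 1      # map luminance to a bin index
    n = size * size
    return [s / n for s in sums], [h / n for h in hist]


def similar(a: Stats, b: Stats, color_tol: float, hist_min: float) -> bool:
    """Assumed similarity test: close mean color AND high histogram overlap."""
    mean_a, hist_a = a
    mean_b, hist_b = b
    color_dist = max(abs(p - q) for p, q in zip(mean_a, mean_b))
    overlap = sum(min(p, q) for p, q in zip(hist_a, hist_b))  # intersection in 0..1
    return color_dist <= color_tol and overlap >= hist_min


def select_patches(img: Image, size: int,
                   color_tol: float = 10.0,
                   hist_min: float = 0.9) -> List[Tuple[int, int]]:
    """Return (row, col) indices of patches to keep for tokenization.

    A patch is treated as redundant if it is similar to its left or
    upper neighbor. This neighbor rule is an assumption of the sketch.
    """
    rows, cols = len(img) // size, len(img[0]) // size
    stats = [[patch_stats(img, r * size, c * size, size) for c in range(cols)]
             for r in range(rows)]
    kept = []
    for r in range(rows):
        for c in range(cols):
            neighbors = []
            if c > 0:
                neighbors.append(stats[r][c - 1])
            if r > 0:
                neighbors.append(stats[r - 1][c])
            if not any(similar(stats[r][c], n, color_tol, hist_min) for n in neighbors):
                kept.append((r, c))
    return kept
```

In a pipeline of this kind, only the patches returned by `select_patches` would be passed on to the vision encoder's patch embedding, which is where the reduction in visual tokens would come from. Because the sketch compares each patch only with its immediate neighbors, a slow color gradient could be pruned more aggressively than intended; a real implementation would need a more careful criterion.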
