Detailed Information

V-PRUNE: Semantic-Aware Patch Pruning Before Tokenization in Vision-Language Model Inference (open access)

Other Titles
V-PRUNE: Semantic-Aware Patch Pruning Before Tokenization in Vision–Language Model Inference
Authors
Seo, Hyein; Choi, Yong Suk
Issue Date
Aug-2025
Publisher
MDPI
Keywords
vision-language models; efficient vision transformers; feature pruning; visual question answering
Citation
Applied Sciences-Basel, v.15, no.17, pp. 1-15
Pages
15
Indexed
SCIE
SCOPUS
Journal Title
Applied Sciences-Basel
Volume
15
Number
17
Start Page
1
End Page
15
URI
https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/208860
DOI
10.3390/app15179463
ISSN
2076-3417
Abstract
Recent vision-language models (VLMs) achieve strong performance across multimodal benchmarks but suffer from high inference costs due to the large number of visual tokens. Prior studies have shown that many image tokens receive consistently low attention scores during inference, indicating that a substantial portion of visual content contributes little to final predictions. These observations raise questions about the efficiency of conventional token pruning strategies, which are typically applied after all attention operations and depend on late-emerging attention scores. To address this, we propose V-PRUNE, a semantic-aware patch-level pruning framework for vision-language models that removes redundant content before tokenization. By evaluating local similarity via color and histogram statistics, our method enables lightweight and interpretable pruning without architectural changes. Applied to CLIP-based models, our approach reduces FLOPs and inference time across vision-language understanding tasks, while maintaining or improving accuracy. Qualitative results further confirm that essential regions are preserved and the pruning behavior is human-aligned, making our method a practical solution for efficient VLM inference.
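The abstract describes the pruning step only at a high level, so the following is a minimal sketch of the general idea rather than the authors' implementation. It assumes a patch counts as redundant when both its mean color and its intensity histogram closely match an already-visited left or upper neighbor. The function names (`patch_stats`, `keep_mask`), the patch size, the bin count, and both thresholds are hypothetical choices made for illustration.

```python
import numpy as np


def patch_stats(image, patch=16, bins=8):
    """Return per-patch mean colors and normalized intensity histograms.

    `image` is an HxWx3 uint8 array; rows/columns that do not fill a whole
    patch are cropped. All parameters are illustrative assumptions.
    """
    h, w, _ = image.shape
    gh, gw = h // patch, w // patch
    img = image[: gh * patch, : gw * patch].astype(np.float32)
    # (gh, gw, patch, patch, 3): one block of pixels per grid cell
    patches = img.reshape(gh, patch, gw, patch, 3).transpose(0, 2, 1, 3, 4)
    means = patches.mean(axis=(2, 3))                # (gh, gw, 3)
    gray = patches.mean(axis=4).reshape(gh, gw, -1)  # (gh, gw, patch*patch)
    hists = np.zeros((gh, gw, bins), dtype=np.float32)
    for i in range(gh):
        for j in range(gw):
            counts, _ = np.histogram(gray[i, j], bins=bins, range=(0, 256))
            hists[i, j] = counts / counts.sum()
    return means, hists


def keep_mask(image, patch=16, bins=8, color_tol=8.0, hist_tol=0.9):
    """Boolean (gh, gw) mask: True for patches to keep, False for pruned ones.

    A patch is pruned when it is similar to its left or upper neighbor,
    where "similar" means a small mean-color distance AND a high histogram
    intersection. Both thresholds are assumed values, not from the paper.
    """
    means, hists = patch_stats(image, patch, bins)
    gh, gw = means.shape[:2]
    keep = np.ones((gh, gw), dtype=bool)
    for i in range(gh):
        for j in range(gw):
            for ni, nj in ((i, j - 1), (i - 1, j)):  # left, then upper
                if ni < 0 or nj < 0:
                    continue
                color_dist = np.linalg.norm(means[i, j] - means[ni, nj])
                overlap = np.minimum(hists[i, j], hists[ni, nj]).sum()
                if color_dist < color_tol and overlap >= hist_tol:
                    keep[i, j] = False
                    break
    return keep


if __name__ == "__main__":
    # Toy input: left half black, right half white.
    img = np.zeros((64, 64, 3), dtype=np.uint8)
    img[:, 32:] = 255
    mask = keep_mask(img, patch=16)
    print(mask.sum(), "of", mask.size, "patches kept")
```

On this toy image only one patch per uniform region survives, so 2 of the 16 patches are kept. In a real pipeline the mask would select which patches are passed on to the vision encoder's tokenizer; how the surviving patches and their positions are handled there is not specified in the abstract and is left out of this sketch.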
Appears in Collections
Seoul College of Engineering > Seoul School of Computer Science > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Choi, Yong Suk
COLLEGE OF ENGINEERING (SCHOOL OF COMPUTER SCIENCE)