Centralized Position Embeddings for Vision Transformers

Shin, Chanyong; Yun, Ilwi; Lee, Hyunku; Rhee, Chae Eun

doi:10.1109/ACCESS.2025.3629376


Cited 0 times in Web of Science

Cited 0 times in Scopus


Full metadata record

DC Field	Value	Language
dc.contributor.author	Shin, Chanyong	-
dc.contributor.author	Yun, Ilwi	-
dc.contributor.author	Lee, Hyunku	-
dc.contributor.author	Rhee, Chae Eun	-
dc.date.accessioned	2026-04-07T05:00:27Z	-
dc.date.available	2026-04-07T05:00:27Z	-
dc.date.issued	2025-11	-
dc.identifier.issn	2169-3536	-
dc.identifier.uri	https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/212077	-
dc.description.abstract	Vision Transformers (ViTs) have achieved remarkable success across various vision tasks. However, ViTs inherently lack spatial inductive biases, necessitating explicit position embedding (PE) schemes. Recently, many studies have adopted non-fixed-length position embeddings (nFPEs) over traditional absolute or relative PEs. These nFPEs, typically implemented with inductive modules like convolutional layers, offer advantages such as adaptability to varying token sequence lengths and potential translation equivariance. However, our analysis reveals that prevalent nFPE methods often yield positional information that is significantly skewed by feature content, an issue that has not yet been discussed. In this paper, we argue that the nFPEs in prior works share two limitations. First, nFPEs exhibit a significant semantic bias: they are strongly affected and distorted by the semantic content of the input feature maps, leading to indistinct positional information. Second, although the intrinsic token order remains constant throughout the network, nFPEs redundantly recompute positional information within each transformer block, leading to inefficiency and potentially inconsistent PE application. To overcome these drawbacks, we propose Centralized Position Embedding (CPE). The core idea of CPE is to replace the scattered PE module in each transformer block with a unified PE network per stage, whose output is broadcast to all transformer blocks within that stage. This centralized design allows a significantly larger receptive field for the PE network at negligible computational overhead, facilitating the extraction of less biased and more consistent positional information and thus addressing the aforementioned limitations of nFPEs.
By applying the proposed CPE to various ViTs for several vision tasks, we show that CPE yields more precise positional information, leading to consistent performance improvements over existing PE strategies, which supports our arguments.	-
dc.format.extent	14	-
dc.language	English	-
dc.language.iso	ENG	-
dc.publisher	IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC	-
dc.title	Centralized Position Embeddings for Vision Transformers	-
dc.type	Article	-
dc.publisher.location	United States	-
dc.identifier.doi	10.1109/ACCESS.2025.3629376	-
dc.identifier.scopusid	2-s2.0-105020870870	-
dc.identifier.wosid	001615892300007	-
dc.identifier.bibliographicCitation	IEEE ACCESS, v.13, pp 190122 - 190135	-
dc.citation.title	IEEE ACCESS	-
dc.citation.volume	13	-
dc.citation.startPage	190122	-
dc.citation.endPage	190135	-
dc.type.docType	Article	-
dc.description.isOpenAccess	Y	-
dc.description.journalRegisteredClass	scie	-
dc.description.journalRegisteredClass	scopus	-
dc.relation.journalResearchArea	Computer Science	-
dc.relation.journalResearchArea	Engineering	-
dc.relation.journalResearchArea	Telecommunications	-
dc.relation.journalWebOfScienceCategory	Computer Science, Information Systems	-
dc.relation.journalWebOfScienceCategory	Engineering, Electrical & Electronic	-
dc.relation.journalWebOfScienceCategory	Telecommunications	-
dc.subject.keywordPlus	Embeddings	-
dc.subject.keywordPlus	Machine vision	-
dc.subject.keywordPlus	Semantics	-
dc.subject.keywordAuthor	Transformers	-
dc.subject.keywordAuthor	Semantics	-
dc.subject.keywordAuthor	Computer vision	-
dc.subject.keywordAuthor	Feature extraction	-
dc.subject.keywordAuthor	Encoding	-
dc.subject.keywordAuthor	Convolution	-
dc.subject.keywordAuthor	Visualization	-
dc.subject.keywordAuthor	Data mining	-
dc.subject.keywordAuthor	Computer architecture	-
dc.subject.keywordAuthor	Attention mechanisms	-
dc.subject.keywordAuthor	position embedding	-
dc.subject.keywordAuthor	vision transformer	-
dc.identifier.url	https://ieeexplore.ieee.org/document/11227107	-
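
The abstract's central architectural contrast, a PE module inside every transformer block versus one PE network per stage whose output is broadcast to all of that stage's blocks, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation. The following are all hypothetical choices made for illustration: the 3x3 depthwise convolution as the per-block nFPE (a common convolutional nFPE design), the stacked-convolution PE network, feeding that network the stage input, adding its output to each block's input, and the `toy_block` stand-in for attention and MLP. The function names `depthwise_conv3x3`, `toy_block`, `stage_scattered`, `centralized_pe`, and `stage_centralized` are likewise hypothetical.

```python
import numpy as np


def depthwise_conv3x3(x, k):
    """Zero-padded 3x3 depthwise convolution (cross-correlation, as in most DL
    frameworks). x: (H, W, C) token feature map; k: (3, 3, C) per-channel kernel."""
    h, w, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))  # zero padding keeps the H x W grid
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += xp[dy:dy + h, dx:dx + w, :] * k[dy, dx, :]
    return out


def toy_block(x, proj):
    """Hypothetical stand-in for a transformer block (attention + MLP):
    a residual token-wise linear map followed by tanh."""
    return x + np.tanh(x @ proj)


def stage_scattered(x, block_projs, pe_kernels):
    """Prior-style nFPE: every block computes its own 3x3 conv PE from its
    current input, so the PE is recomputed per block and depends on features."""
    for proj, k in zip(block_projs, pe_kernels):
        x = x + depthwise_conv3x3(x, k)
        x = toy_block(x, proj)
    return x


def centralized_pe(x, pe_net_kernels):
    """Hypothetical per-stage PE network: K stacked 3x3 depthwise convs with
    tanh in between, giving a (2K + 1) x (2K + 1) receptive field."""
    pe = x
    for k in pe_net_kernels:
        pe = np.tanh(depthwise_conv3x3(pe, k))
    return pe


def stage_centralized(x, block_projs, pe_net_kernels):
    """CPE-style stage (as assumed here): compute the PE once from the stage
    input and add the same tensor to the input of every block in the stage."""
    pe = centralized_pe(x, pe_net_kernels)
    for proj in block_projs:
        x = toy_block(x + pe, proj)
    return x


# Usage: one stage of depth 3 on an 8 x 8 grid of 4-channel tokens.
rng = np.random.default_rng(0)
H, W, C, depth, K = 8, 8, 4, 3, 3
x = rng.standard_normal((H, W, C))
projs = [0.1 * rng.standard_normal((C, C)) for _ in range(depth)]
block_pe_kernels = [0.1 * rng.standard_normal((3, 3, C)) for _ in range(depth)]
pe_net_kernels = [0.1 * rng.standard_normal((3, 3, C)) for _ in range(K)]

y_scattered = stage_scattered(x, projs, block_pe_kernels)
y_centralized = stage_centralized(x, projs, pe_net_kernels)

# Receptive-field check: a single impulse spreads to a (2K + 1) x (2K + 1) patch.
impulse = np.zeros((9, 9, 1))
impulse[4, 4, 0] = 1.0
pe_impulse = centralized_pe(impulse, [np.ones((3, 3, 1))] * K)
print(np.count_nonzero(pe_impulse))  # 49 for K = 3
```

In this sketch the centralized variant runs its PE network once per stage, so widening its receptive field (raising K) does not multiply with the number of blocks. This is one reading of the abstract's "negligible computational overhead" claim. The actual CPE network design, and how its output is injected into each block, should be taken from the paper.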


Appears in Collections: Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles


Related Researcher


Rhee, Chae Eun: College of Engineering (School of Electronic Engineering)



