One-stage Detection Model based on Swin Transformer
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Kim, Tae Yang | - |
dc.contributor.author | Niaz, Asim | - |
dc.contributor.author | Choi, Jung Sik | - |
dc.contributor.author | Choi, Kwang Nam | - |
dc.date.accessioned | 2024-05-20T08:04:35Z | - |
dc.date.available | 2024-05-20T08:04:35Z | - |
dc.date.issued | 2024 | - |
dc.identifier.issn | 2169-3536 | - |
dc.identifier.uri | https://scholarworks.bwise.kr/cau/handle/2019.sw.cau/73785 | - |
dc.description.abstract | Object detection using vision transformers (ViTs) has recently garnered considerable research interest. Vision Transformers split an image into patches and perform classification through a multi-head attention-based MLP head. However, conventional models prioritize object classification over predicting the bounding boxes crucial for precise object detection. To address this gap, two-stage Transformer-based detectors have been devised that first extract feature maps via a pre-trained CNN model. In contrast, our research introduces a one-stage object detector founded on the Swin-Transformer architecture. This one-stage detector performs simultaneous object classification and bounding box prediction using a pure Swin-Transformer encoder block, obviating the need for a pre-trained CNN model. Our proposed model is trained, validated, and evaluated on the COCO dataset, comprising 82,783 training images, 40,504 validation images, and 40,775 test images. The proposed model achieved an average precision (AP) of 30.2%, a 5.59% improvement over the existing ViT-based one-stage detector. | - |
dc.format.extent | 13 | - |
dc.language | English | - |
dc.language.iso | ENG | - |
dc.publisher | Institute of Electrical and Electronics Engineers Inc. | - |
dc.title | One-stage Detection Model based on Swin Transformer | - |
dc.type | Article | - |
dc.identifier.doi | 10.1109/ACCESS.2024.3393152 | - |
dc.identifier.bibliographicCitation | IEEE Access, v.12, pp 60960 - 60972 | - |
dc.description.isOpenAccess | Y | - |
dc.identifier.wosid | 001214302300001 | - |
dc.identifier.scopusid | 2-s2.0-85191548289 | - |
dc.citation.endPage | 60972 | - |
dc.citation.startPage | 60960 | - |
dc.citation.title | IEEE Access | - |
dc.citation.volume | 12 | - |
dc.type.docType | Article | - |
dc.publisher.location | United States | - |
dc.subject.keywordAuthor | Attention | - |
dc.subject.keywordAuthor | Computational modeling | - |
dc.subject.keywordAuthor | Computer-vision | - |
dc.subject.keywordAuthor | Detectors | - |
dc.subject.keywordAuthor | Feature extraction | - |
dc.subject.keywordAuthor | Object Detection | - |
dc.subject.keywordAuthor | Predictive models | - |
dc.subject.keywordAuthor | single-stage detection | - |
dc.subject.keywordAuthor | Task analysis | - |
dc.subject.keywordAuthor | Transformer Network | - |
dc.subject.keywordAuthor | Transformers | - |
dc.subject.keywordAuthor | YOLO | - |
dc.relation.journalResearchArea | Computer Science | - |
dc.relation.journalResearchArea | Engineering | - |
dc.relation.journalResearchArea | Telecommunications | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Information Systems | - |
dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | - |
dc.relation.journalWebOfScienceCategory | Telecommunications | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |
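The abstract describes a one-stage detector in which a Transformer encoder over image patches feeds two parallel prediction heads, one for class scores and one for bounding boxes. The following is a minimal NumPy sketch of that idea only, not the paper's architecture: it uses single-head attention over a flat patch grid and omits Swin-specific components such as shifted windows, patch merging, and multi-head attention. All names and dimensions here are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    # Scaled dot-product self-attention over patch tokens (single head).
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores) @ v

rng = np.random.default_rng(0)
num_patches, dim, num_classes = 16, 32, 80  # e.g. a 4x4 patch grid; COCO has 80 classes

x = rng.normal(size=(num_patches, dim))  # patch embeddings (hypothetical)
Wq, Wk, Wv = (rng.normal(size=(dim, dim)) * 0.1 for _ in range(3))
W_cls = rng.normal(size=(dim, num_classes)) * 0.1  # classification head
W_box = rng.normal(size=(dim, 4)) * 0.1            # box head: (cx, cy, w, h)

feat = self_attention(x, Wq, Wk, Wv)  # encoder output per patch

# One-stage: both predictions come from the same features, in one pass.
cls_logits = feat @ W_cls                      # per-patch class scores, (16, 80)
boxes = 1.0 / (1.0 + np.exp(-(feat @ W_box)))  # normalized boxes in [0, 1], (16, 4)

print(cls_logits.shape, boxes.shape)
```

The key property sketched here is that classification and box regression share one encoder forward pass, rather than a region-proposal stage followed by per-region classification as in two-stage detectors.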