MSTR: Multi-Scale Transformer for End-to-End Human-Object Interaction Detection
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Kim, Bumsoo | - |
| dc.contributor.author | Mun, Jonghwan | - |
| dc.contributor.author | On, Kyoung-Woon | - |
| dc.contributor.author | Shin, Minchul | - |
| dc.contributor.author | Lee, Junhyun | - |
| dc.contributor.author | Kim, Eun Sol | - |
| dc.date.accessioned | 2022-12-20T10:37:13Z | - |
| dc.date.available | 2022-12-20T10:37:13Z | - |
| dc.date.created | 2022-12-07 | - |
| dc.date.issued | 2022-06 | - |
| dc.identifier.issn | 1063-6919 | - |
| dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/173242 | - |
| dc.description.abstract | Human-Object Interaction (HOI) detection is the task of identifying a set of (human, object, interaction) triplets from an image. Recent work proposed transformer encoder-decoder architectures that successfully eliminated the need for many hand-designed components in HOI detection through end-to-end training. However, they are limited to single-scale feature resolution, providing suboptimal performance in scenes containing humans, objects, and their interactions with vastly different scales and distances. To tackle this problem, we propose a Multi-Scale TRansformer (MSTR) for HOI detection powered by two novel HOI-aware deformable attention modules called Dual-Entity attention and Entity-conditioned Context attention. While existing deformable attention comes at a huge cost in HOI detection performance, the proposed attention modules of MSTR learn to effectively attend to sampling points that are essential to identify interactions. In experiments, we achieve new state-of-the-art performance on two HOI detection benchmarks. | - |
| dc.language | English | - |
| dc.language.iso | en | - |
| dc.publisher | IEEE Computer Society | - |
| dc.title | MSTR: Multi-Scale Transformer for End-to-End Human-Object Interaction Detection | - |
| dc.type | Article | - |
| dc.contributor.affiliatedAuthor | Kim, Eun Sol | - |
| dc.identifier.doi | 10.1109/CVPR52688.2022.01897 | - |
| dc.identifier.scopusid | 2-s2.0-85141778735 | - |
| dc.identifier.wosid | 000870783005038 | - |
| dc.identifier.bibliographicCitation | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, v.2022-June, pp.19556 - 19565 | - |
| dc.relation.isPartOf | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition | - |
| dc.citation.title | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition | - |
| dc.citation.volume | 2022-June | - |
| dc.citation.startPage | 19556 | - |
| dc.citation.endPage | 19565 | - |
| dc.type.rims | ART | - |
| dc.type.docType | Proceedings Paper | - |
| dc.description.journalClass | 1 | - |
| dc.description.isOpenAccess | N | - |
| dc.description.journalRegisteredClass | scopus | - |
| dc.relation.journalResearchArea | Computer Science | - |
| dc.relation.journalResearchArea | Imaging Science & Photographic Technology | - |
| dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence | - |
| dc.relation.journalWebOfScienceCategory | Imaging Science & Photographic Technology | - |
| dc.subject.keywordAuthor | Scene analysis and understanding | - |
| dc.identifier.url | https://ieeexplore.ieee.org/document/9878434 | - |
