Detailed Information


VerbDiff: Text-Only Diffusion Models with Enhanced Interaction Awareness

Full metadata record
DC Field | Value | Language
dc.contributor.author | Cha, Seungju | -
dc.contributor.author | Lee, Kwanyoung | -
dc.contributor.author | Kim, Ye-Chan | -
dc.contributor.author | Oh, Hyunwoo | -
dc.contributor.author | Kim, Dong-Jin | -
dc.date.accessioned | 2025-11-13T00:30:26Z | -
dc.date.available | 2025-11-13T00:30:26Z | -
dc.date.issued | 2025-08 | -
dc.identifier.issn | 1063-6919 | -
dc.identifier.issn | 2575-7075 | -
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/209114 | -
dc.description.abstract | Recent large-scale text-to-image diffusion models generate photorealistic images but often struggle to accurately depict interactions between humans and objects due to their limited ability to differentiate various interaction words. In this work, we propose VerbDiff to address the challenge of capturing nuanced interactions within text-to-image diffusion models. VerbDiff is a novel text-to-image generation model that weakens the bias between interaction words and objects, enhancing the understanding of interactions. Specifically, we disentangle various interaction words from frequency-based anchor words and leverage localized interaction regions from generated images to help the model better capture semantics in distinctive words without extra conditions. Our approach enables the model to accurately understand the intended interaction between humans and objects, producing high-quality images with accurate interactions aligned with specified verbs. Extensive experiments on the HICO-DET dataset demonstrate the effectiveness of our method compared to previous approaches. | -
dc.format.extent | 10 | -
dc.language | English | -
dc.language.iso | ENG | -
dc.publisher | IEEE | -
dc.title | VerbDiff: Text-Only Diffusion Models with Enhanced Interaction Awareness | -
dc.type | Article | -
dc.identifier.doi | 10.1109/CVPR52734.2025.00753 | -
dc.identifier.scopusid | 2-s2.0-105017084617 | -
dc.identifier.bibliographicCitation | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 8041-8050 | -
dc.citation.title | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition | -
dc.citation.startPage | 8041 | -
dc.citation.endPage | 8050 | -
dc.type.docType | Conference paper | -
dc.description.isOpenAccess | N | -
dc.description.journalRegisteredClass | scopus | -
dc.subject.keywordAuthor | diffusion | -
dc.subject.keywordAuthor | text to image generation | -
dc.identifier.url | https://ieeexplore.ieee.org/document/11092639 | -
Appears in Collections: Seoul College of Engineering > ETC > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Kim, Dong Jin
College of Engineering (Department of Intelligence Computing)