VerbDiff: Text-Only Diffusion Models with Enhanced Interaction Awareness

Cha, Seungju; Lee, Kwanyoung; Kim, Ye-Chan; Oh, Hyunwoo; Kim, Dong-Jin

doi:10.1109/CVPR52734.2025.00753

Full metadata record

DC Field	Value	Language
dc.contributor.author	Cha, Seungju	-
dc.contributor.author	Lee, Kwanyoung	-
dc.contributor.author	Kim, Ye-Chan	-
dc.contributor.author	Oh, Hyunwoo	-
dc.contributor.author	Kim, Dong-Jin	-
dc.date.accessioned	2025-11-13T00:30:26Z	-
dc.date.available	2025-11-13T00:30:26Z	-
dc.date.issued	2025-08	-
dc.identifier.issn	1063-6919	-
dc.identifier.issn	2575-7075	-
dc.identifier.uri	https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/209114	-
dc.description.abstract	Recent large-scale text-to-image diffusion models generate photorealistic images but often struggle to accurately depict interactions between humans and objects due to their limited ability to differentiate various interaction words. In this work, we propose VerbDiff to address the challenge of capturing nuanced interactions within text-to-image diffusion models. VerbDiff is a novel text-to-image generation model that weakens the bias between interaction words and objects, enhancing the understanding of interactions. Specifically, we disentangle various interaction words from frequency-based anchor words and leverage localized interaction regions from generated images to help the model better capture semantics in distinctive words without extra conditions. Our approach enables the model to accurately understand the intended interaction between humans and objects, producing high-quality images with accurate interactions aligned with specified verbs. Extensive experiments on the HICO-DET dataset demonstrate the effectiveness of our method compared to previous approaches. © 2025 Elsevier B.V., All rights reserved.	-
dc.format.extent	10	-
dc.language	English	-
dc.language.iso	ENG	-
dc.publisher	IEEE	-
dc.title	VerbDiff: Text-Only Diffusion Models with Enhanced Interaction Awareness	-
dc.type	Article	-
dc.identifier.doi	10.1109/CVPR52734.2025.00753	-
dc.identifier.scopusid	2-s2.0-105017084617	-
dc.identifier.bibliographicCitation	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp 8041 - 8050	-
dc.citation.title	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition	-
dc.citation.startPage	8041	-
dc.citation.endPage	8050	-
dc.type.docType	Conference paper	-
dc.description.isOpenAccess	N	-
dc.description.journalRegisteredClass	scopus	-
dc.subject.keywordAuthor	diffusion	-
dc.subject.keywordAuthor	text to image generation	-
dc.identifier.url	https://ieeexplore.ieee.org/document/11092639	-

Appears in Collections: Seoul College of Engineering > ETC > 1. Journal Articles

Related Researcher

Kim, Dong Jin: COLLEGE OF ENGINEERING (DEPARTMENT OF INTELLIGENCE COMPUTING)
