Detailed Information


SeeDiff: Off-the-Shelf Seeded Mask Generation from Diffusion Models

Full metadata record
DC Field | Value | Language
dc.contributor.author | Park, Joon Hyun | -
dc.contributor.author | Jo, Kumju | -
dc.contributor.author | Baik, Sungyong | -
dc.date.accessioned | 2025-05-26T08:30:24Z | -
dc.date.available | 2025-05-26T08:30:24Z | -
dc.date.issued | 2025-04 | -
dc.identifier.issn | 2159-5399 | -
dc.identifier.issn | 2374-3468 | -
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/207442 | -
dc.description.abstract | Entrusted with the goal of pixel-level object classification, semantic segmentation networks entail the laborious preparation of pixel-level annotation masks. To obtain pixel-level annotation masks for a given class without human effort, a few recent works have proposed generating pairs of images and annotation masks by exploiting the image-text relationships modeled by text-to-image generative models, especially Stable Diffusion. However, these works do not fully exploit the capability of text-guided diffusion models and thus require a pre-trained segmentation network, careful text prompt tuning, or the training of a segmentation network to generate the final annotation masks. In this work, we take a closer look at the attention mechanisms of Stable Diffusion, from which we draw connections with classical seeded segmentation approaches. In particular, we show that cross-attention alone provides only very coarse object localization, which can nevertheless serve as initial seeds. Then, akin to region expansion in seeded segmentation, we utilize the semantic-correspondence-modeling capability of self-attention to iteratively spread the attention from the seeds to the whole class region using multi-scale self-attention maps. We also observe that a synthetic image generated from a simple text prompt often has a uniform background, for which correspondences are easier to find than for complex-structured objects. Thus, we further refine the mask using a more accurate background mask. Our proposed method, dubbed SeeDiff, generates high-quality masks off-the-shelf from Stable Diffusion, without any additional training procedure, prompt tuning, or pre-trained segmentation network. | -
dc.format.extent | 10 | -
dc.language | English | -
dc.language.iso | ENG | -
dc.publisher | Association for the Advancement of Artificial Intelligence | -
dc.title | SeeDiff: Off-the-Shelf Seeded Mask Generation from Diffusion Models | -
dc.type | Article | -
dc.publisher.location | United Kingdom | -
dc.identifier.doi | 10.1609/aaai.v39i6.32686 | -
dc.identifier.scopusid | 2-s2.0-105003903951 | -
dc.identifier.bibliographicCitation | Proceedings of the AAAI Conference on Artificial Intelligence, v.39, no.6, pp 6406 - 6415 | -
dc.citation.title | Proceedings of the AAAI Conference on Artificial Intelligence | -
dc.citation.volume | 39 | -
dc.citation.number | 6 | -
dc.citation.startPage | 6406 | -
dc.citation.endPage | 6415 | -
dc.type.docType | Conference paper | -
dc.description.isOpenAccess | N | -
dc.description.journalRegisteredClass | scopus | -
dc.subject.keywordPlus | Attention mechanisms | -
dc.subject.keywordPlus | Diffusion model | -
dc.subject.keywordPlus | Generative model | -
dc.subject.keywordPlus | Modelling capabilities | -
dc.subject.keywordPlus | Multi-scales | -
dc.subject.keywordPlus | Object classification | -
dc.subject.keywordPlus | Object localization | -
dc.subject.keywordPlus | Pixel level | -
dc.subject.keywordPlus | Semantic correspondence | -
dc.subject.keywordPlus | Semantic segmentation | -
dc.identifier.url | https://ojs.aaai.org/index.php/AAAI/article/view/32686 | -
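
The seeded-expansion idea summarized in the abstract (coarse seeds from cross-attention, then iterative spreading along self-attention) can be illustrated with a toy sketch. This is not the SeeDiff implementation: the function names `pick_seeds` and `expand`, the thresholds, and the hand-made attention values are all hypothetical, the image is flattened to five locations, and the multi-scale maps and background refinement mentioned in the abstract are omitted.

```python
# Toy illustration of seeded mask expansion. NOT the paper's algorithm:
# every name, threshold, and attention value below is an assumption.

def pick_seeds(cross_attn, seed_thresh=0.8):
    """Mark locations whose cross-attention score is close to the peak."""
    peak = max(cross_attn)
    return [1.0 if a >= seed_thresh * peak else 0.0 for a in cross_attn]


def expand(seeds, self_attn, steps=3, grow_thresh=0.5):
    """Iteratively grow the seed set along self-attention affinities."""
    mask = list(seeds)
    n = len(mask)
    for _ in range(steps):
        # Each location sums the attention it pays to current mask members.
        spread = [sum(self_attn[i][j] * mask[j] for j in range(n))
                  for i in range(n)]
        # Keep existing members; add locations strongly tied to them.
        mask = [1.0 if (mask[i] == 1.0 or spread[i] >= grow_thresh) else 0.0
                for i in range(n)]
    return mask


# Hypothetical data: locations 0-2 form the object, 3-4 the background.
cross_attn = [0.9, 0.3, 0.2, 0.1, 0.05]
self_attn = [
    [0.5, 0.5, 0.0, 0.0, 0.0],
    [0.6, 0.2, 0.2, 0.0, 0.0],
    [0.0, 0.6, 0.4, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.5, 0.4],
    [0.0, 0.0, 0.0, 0.5, 0.5],
]

seeds = pick_seeds(cross_attn)        # only location 0 becomes a seed
mask = expand(seeds, self_attn)       # grows to locations 0, 1, and 2
print(seeds, mask)
```

In this toy setup, location 2 attends to location 1 but not to the seed itself, so it joins the mask only on the second iteration; that is the sense in which repeated spreading reaches regions a single pass would miss.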
Appears in Collections: Seoul College of Engineering > ETC > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Baik, Sungyong
COLLEGE OF ENGINEERING (DEPARTMENT OF INTELLIGENCE COMPUTING)