SeeDiff: Off-the-Shelf Seeded Mask Generation from Diffusion Models

Authors
Park, Joon Hyun; Jo, Kumju; Baik, Sungyong
Issue Date
Apr-2025
Publisher
Association for the Advancement of Artificial Intelligence
Citation
Proceedings of the AAAI Conference on Artificial Intelligence, v.39, no.6, pp. 6406-6415
Pages
10
Indexed
SCOPUS
Journal Title
Proceedings of the AAAI Conference on Artificial Intelligence
Volume
39
Number
6
Start Page
6406
End Page
6415
URI
https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/207442
DOI
10.1609/aaai.v39i6.32686
ISSN
2159-5399
2374-3468
Abstract
Tasked with pixel-level object classification, semantic segmentation networks require the laborious preparation of pixel-level annotation masks. To obtain pixel-level annotation masks for a given class without human effort, a few recent works have proposed generating pairs of images and annotation masks by exploiting the image-text relationships modeled by text-to-image generative models, especially Stable Diffusion. However, these works do not fully exploit the capability of text-guided diffusion models and thus require a pre-trained segmentation network, careful text prompt tuning, or the training of a segmentation network to generate final annotation masks. In this work, we take a closer look at the attention mechanisms of Stable Diffusion, from which we draw connections with classical seeded segmentation approaches. In particular, we show that cross-attention alone provides only very coarse object localization, which can nevertheless serve as initial seeds. Then, akin to region expansion in seeded segmentation, we use the semantic-correspondence-modeling capability of self-attention to iteratively spread the attention from the seeds to the whole class using multi-scale self-attention maps. We also observe that a synthetic image generated from a simple text prompt often has a uniform background, for which correspondences are easier to find than for complex-structured objects. Thus, we further refine the mask using a more accurate background mask. Our proposed method, dubbed SeeDiff, generates high-quality masks off-the-shelf from Stable Diffusion, without additional training, prompt tuning, or a pre-trained segmentation network.
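
The seed-then-expand idea described in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the function name `expand_seeds`, the thresholds, the number of steps, and the single-scale, row-normalized self-attention matrix are all assumptions made for illustration, and the multi-scale maps and background-mask refinement mentioned in the abstract are omitted.

```python
def expand_seeds(cross_attn, self_attn, seed_thresh=0.8, steps=5, mask_thresh=0.5):
    """Toy seeded expansion over N flattened pixels (hypothetical helper).

    cross_attn: list of N cross-attention scores for the class token.
    self_attn:  N x N list of lists; row i holds pixel i's attention weights.
    All thresholds are illustrative assumptions, not values from the paper.
    """
    # Min-max normalize cross-attention and keep the strongest responses as seeds.
    lo, hi = min(cross_attn), max(cross_attn)
    scale = (hi - lo) or 1.0
    seeds = [(c - lo) / scale >= seed_thresh for c in cross_attn]
    m = [1.0 if s else 0.0 for s in seeds]

    for _ in range(steps):
        # Each pixel takes the self-attention-weighted sum of the current map.
        spread = [sum(w * v for w, v in zip(row, m)) for row in self_attn]
        peak = max(spread) or 1.0
        # Rescale to [0, 1] and keep the seeds pinned to foreground.
        m = [1.0 if s else v / peak for s, v in zip(seeds, spread)]

    return [v >= mask_thresh for v in m]


# Toy example: pixels 0-2 attend to each other (object), 3-5 to each other (background).
third = 1.0 / 3.0
self_attn = [[third] * 3 + [0.0] * 3 for _ in range(3)] + \
            [[0.0] * 3 + [third] * 3 for _ in range(3)]
cross_attn = [1.0, 0.2, 0.1, 0.0, 0.1, 0.0]  # only pixel 0 responds strongly

mask = expand_seeds(cross_attn, self_attn)
print(mask)  # [True, True, True, False, False, False]
```

In this toy case the cross-attention marks only one object pixel as a seed, and the self-attention weights carry that label to the other two object pixels while leaving the background untouched.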
Appears in Collections
Seoul College of Engineering > ETC > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Baik, Sungyong
COLLEGE OF ENGINEERING (DEPARTMENT OF INTELLIGENCE COMPUTING)