Detailed Information


Prompt-Based Learning for Image Variation Using Single Image Multi-Scale Diffusion Models (open access)

Authors
Park, Jiwon; Jeong, Dasol; Lee, Hyebean; Han, Seunghee; Paik, Joonki
Issue Date
2024
Publisher
IEEE (Institute of Electrical and Electronics Engineers, Inc.)
Keywords
Training; Computational modeling; Periodic structures; Diffusion models; Data models; Image synthesis; Adaptation models; Noise reduction; Feature extraction; Context modeling; Single image generation; prompt-based learning; text guided image editing
Citation
IEEE ACCESS, v.12, pp. 158810-158823
Pages
14
Journal Title
IEEE ACCESS
Volume
12
Start Page
158810
End Page
158823
URI
https://scholarworks.bwise.kr/cau/handle/2019.sw.cau/77788
DOI
10.1109/ACCESS.2024.3487215
ISSN
2169-3536
Abstract
In this paper, we propose a multi-scale diffusion framework with prompt-based learning that performs image variation and text-guided editing from a single input image. Our approach captures the detailed internal information of the single image, enabling diverse variations while preserving the original features, and text-conditioned learning combines text and image information to perform text-based editing effectively. Specifically, we integrate a diffusion U-Net into the multi-scale framework to accurately capture the quality and internal structure of the input image. We further use a pre-trained Bootstrapped Language-Image Pretraining (BLIP) model to generate candidate prompts and, exploiting the prior knowledge of Contrastive Language-Image Pretraining (CLIP), feed the prompt that most closely matches the input image into the training process. To improve accuracy during the editing stage, we design a contrastive loss function that strengthens the relevance between the prompt and the image, improving the learning of text-image correspondence. Experiments demonstrate that the proposed method significantly improves the performance of single-image generative models and opens new possibilities for text-based image editing.
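The prompt-selection and contrastive-learning steps described in the abstract can be sketched in plain PyTorch. This is a minimal illustration, not the authors' implementation: the embeddings are assumed to come from CLIP's image and text encoders (with BLIP supplying the candidate captions), and the loss below is a generic symmetric InfoNCE-style contrastive loss; the paper's exact loss formulation may differ.

```python
import torch
import torch.nn.functional as F

def select_best_prompt(image_emb, prompt_embs):
    """Pick the candidate prompt closest to the image in embedding space.

    image_emb:   (d,)   image embedding (assumed: CLIP image encoder output)
    prompt_embs: (n, d) embeddings of n candidate prompts (assumed: BLIP
                 captions encoded with CLIP's text encoder)
    Returns the index of the most similar prompt by cosine similarity.
    """
    sims = F.cosine_similarity(image_emb.unsqueeze(0), prompt_embs, dim=-1)
    return int(sims.argmax())

def prompt_image_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss pulling matched image/prompt pairs together.

    image_embs, text_embs: (b, d) batches of paired embeddings; pair i of the
    image batch corresponds to pair i of the text batch.
    """
    image_embs = F.normalize(image_embs, dim=-1)
    text_embs = F.normalize(text_embs, dim=-1)
    logits = image_embs @ text_embs.t() / temperature  # (b, b) similarity matrix
    targets = torch.arange(logits.size(0))             # matched pairs lie on the diagonal
    loss_i = F.cross_entropy(logits, targets)          # image -> text direction
    loss_t = F.cross_entropy(logits.t(), targets)      # text -> image direction
    return 0.5 * (loss_i + loss_t)
```

In this sketch, `select_best_prompt` would filter the BLIP-generated captions before training, while `prompt_image_contrastive_loss` is the kind of term that can be added to the training objective to tighten prompt-image relevance during editing.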
Appears in
Collections
Graduate School of Advanced Imaging Sciences, Multimedia and Film > Department of Imaging Science and Arts > 1. Journal Articles

