Prompt-Based Learning for Image Variation Using Single Image Multi-Scale Diffusion Models (Open Access)
- Authors
- Park, Jiwon; Jeong, Dasol; Lee, Hyebean; Han, Seunghee; Paik, Joonki
- Issue Date
- 2024
- Publisher
- Institute of Electrical and Electronics Engineers (IEEE)
- Keywords
- Training; Computational modeling; Periodic structures; Diffusion models; Data models; Image synthesis; Adaptation models; Noise reduction; Feature extraction; Context modeling; Single image generation; prompt-based learning; text-guided image editing
- Citation
- IEEE Access, vol. 12, pp. 158810-158823, 2024
- Pages
- 14
- Journal Title
- IEEE ACCESS
- Volume
- 12
- Start Page
- 158810
- End Page
- 158823
- URI
- https://scholarworks.bwise.kr/cau/handle/2019.sw.cau/77788
- DOI
- 10.1109/ACCESS.2024.3487215
- ISSN
- 2169-3536
- Abstract
- In this paper, we propose a multi-scale framework with text-based learning that uses a single image to perform image variation and text-based editing of the input image. Our approach captures the detailed internal information of a single image, enabling numerous variations while preserving the original features. In addition, text-conditioned learning provides a way to combine text and images for effective text-based editing from a single image. We integrate the diffusion U-Net structure into the multi-scale framework to accurately capture the quality and internal structure of the image and to generate diverse variations while maintaining the features of the original. Additionally, we use a pre-trained Bootstrapping Language-Image Pre-training (BLIP) model to generate a variety of candidate prompts for text-based editing, and we feed the prompt that most closely matches the input image into the training process using the prior knowledge of Contrastive Language-Image Pre-training (CLIP). To improve accuracy during the image editing stage, we design a contrastive loss function that strengthens the relevance between the prompt and the image. As a result, we improve the learning between text and images, and through extensive experiments we demonstrate the method's effectiveness on text-based image editing tasks. Our experiments show that the proposed method significantly improves the performance of single-image generative models and opens new possibilities in the field of text-based image editing.
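The abstract outlines a concrete prompt-selection pipeline: BLIP generates several candidate captions for the single input image, CLIP ranks them by image-text similarity, and a contrastive loss ties the selected prompt to the image during training. The following is a minimal sketch of that pipeline, assuming the Hugging Face transformers implementations of BLIP and CLIP; the checkpoint names, the number of sampled prompts, and the InfoNCE-style form of the loss are illustrative assumptions, not the paper's exact configuration.

```python
# Illustrative sketch (not the paper's code): BLIP caption sampling,
# CLIP-based prompt selection, and a contrastive prompt-image loss.
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import (BlipForConditionalGeneration, BlipProcessor,
                          CLIPModel, CLIPProcessor)

device = "cuda" if torch.cuda.is_available() else "cpu"

# Assumed checkpoints; the abstract does not name the exact BLIP/CLIP weights.
blip_proc = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
blip = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base").to(device)
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)

image = Image.open("input.jpg").convert("RGB")  # the single training image

# 1) Sample several candidate prompts from BLIP for the input image.
blip_in = blip_proc(images=image, return_tensors="pt").to(device)
ids = blip.generate(**blip_in, do_sample=True, num_return_sequences=5,
                    max_new_tokens=30)
prompts = [blip_proc.decode(seq, skip_special_tokens=True) for seq in ids]

# 2) Embed the image and prompts with CLIP; rank prompts by cosine similarity.
clip_in = clip_proc(text=prompts, images=image, return_tensors="pt",
                    padding=True).to(device)
with torch.no_grad():
    img_emb = F.normalize(
        clip.get_image_features(pixel_values=clip_in["pixel_values"]), dim=-1)
    txt_emb = F.normalize(
        clip.get_text_features(input_ids=clip_in["input_ids"],
                               attention_mask=clip_in["attention_mask"]), dim=-1)
sims = (txt_emb @ img_emb.T).squeeze(-1)      # one similarity score per prompt
best = int(sims.argmax())
print("selected prompt:", prompts[best])

# 3) InfoNCE-style contrastive loss: the selected prompt is the positive,
#    the remaining candidates are negatives for the image embedding.
def prompt_image_contrastive_loss(img_emb, txt_emb, pos_idx, tau=0.07):
    logits = (txt_emb @ img_emb.T).squeeze(-1) / tau   # (num_prompts,)
    target = torch.tensor([pos_idx], device=logits.device)
    return F.cross_entropy(logits.unsqueeze(0), target)

loss = prompt_image_contrastive_loss(img_emb, txt_emb, best)
```

Sampling with do_sample=True diversifies the candidate captions so the CLIP ranking has genuine alternatives to reject; in actual training the embeddings would be computed with gradients enabled so the contrastive loss can be backpropagated.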
- Appears in Collections - Graduate School of Advanced Imaging Sciences, Multimedia and Film > Department of Imaging Science and Arts > 1. Journal Articles
