Image Manipulation Using Korean Translation and CLIP: Ko-CLIP
- Authors
- Kim, Sieun; Joe, Inwhee
- Issue Date
- Apr-2023
- Publisher
- Springer International Publishing AG
- Keywords
- Computer Vision; Image Processing; Machine Learning; Natural Language Processing
- Citation
- Lecture Notes in Networks and Systems, v.724 LNNS, pp 222 - 230
- Pages
- 9
- Indexed
- SCOPUS
- Journal Title
- Lecture Notes in Networks and Systems
- Volume
- 724 LNNS
- Start Page
- 222
- End Page
- 230
- URI
- https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/192229
- DOI
- 10.1007/978-3-031-35314-7_21
- ISSN
- 2367-3370; 2367-3389
- Abstract
- Deep learning, a field of artificial intelligence (AI), is showing good results in natural language processing (NLP) and image classification. In NLP in particular, BERT-based models have become the main focus of recent language modeling. BERT is a representative model that uses pre-training and fine-tuning: by pre-training on vast amounts of data and then fine-tuning, more natural NLP can be achieved. CLIP was recently trained on a huge dataset of image-text pairs built solely by web crawling, without manual labeling. Given an input text, the CLIP model indicates which image the text is most closely related to. However, CLIP does not recognize Korean text input, so it cannot analyze it accurately. In this paper, we propose combining a BERT model from NLP with CLIP from image processing so that images can be processed from Korean text input. The Korean text is translated into English by the BERT model and then used as the input text for the CLIP model. The output produced by the two models reflects the content of the Korean text. It can be seen that the output is related to the accuracy of the Korean text.
- Appears in Collections
- College of Engineering (Seoul) > School of Computer Software (Seoul) > 1. Journal Articles
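
The two-stage pipeline described in the abstract (Korean text is translated into English, then the English text is matched against images) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy dictionary, and the two-dimensional vectors are all hypothetical stand-ins. In practice the `translate` callable would wrap a Korean-to-English translation model, and the text and image vectors would come from CLIP's text and image encoders. The abstract does not specify how the image is manipulated, so the sketch only covers the translate-then-match step.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_images(
    korean_text: str,
    translate: Callable[[str], str],
    encode_text: Callable[[str], Vector],
    image_embeddings: Dict[str, Vector],
) -> Tuple[str, List[Tuple[str, float]]]:
    """Translate Korean text to English, then rank images by similarity to it.

    Returns the English text and (image name, score) pairs, best match first.
    """
    english_text = translate(korean_text)   # stage 1: Korean -> English
    text_vec = encode_text(english_text)    # stage 2: embed the English text
    scores = [
        (name, cosine_similarity(text_vec, vec))
        for name, vec in image_embeddings.items()
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return english_text, scores


# Hypothetical toy stand-ins so the sketch runs without any model downloads.
TOY_DICTIONARY = {"고양이": "cat", "강아지": "dog"}
TOY_TEXT_VECTORS = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
TOY_IMAGE_VECTORS = {"cat.jpg": [0.9, 0.1], "dog.jpg": [0.2, 0.8]}


def toy_translate(text: str) -> str:
    """Dictionary lookup standing in for a real translation model."""
    return TOY_DICTIONARY[text]


def toy_encode_text(text: str) -> Vector:
    """Fixed vectors standing in for a real text encoder."""
    return TOY_TEXT_VECTORS[text]


if __name__ == "__main__":
    english, ranking = rank_images(
        "고양이", toy_translate, toy_encode_text, TOY_IMAGE_VECTORS
    )
    print(english, ranking[0][0])  # prints "cat cat.jpg"
```

Passing the translator and encoder in as callables keeps the two stages independent, so a mistranslation in the first stage can be inspected on its own before it affects the image ranking in the second.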
