The Deep Learning-Based OCR System for Korean Word with Web Search Engine

장혁수; 고상호; 이재현; 박승권

doi:10.7840/kics.2023.48.9.1169

Title: 웹 검색엔진 및 딥러닝 기반 한글 단어 인식 OCR 시스템 (The Deep Learning-Based OCR System for Korean Word with Web Search Engine)

Other Titles: The Deep Learning-Based OCR System for Korean Word with Web Search Engine

Authors: 장혁수; 고상호; 이재현; 박승권

Issue Date: Sep-2023

Publisher: Korean Institute of Communications and Information Sciences (한국통신학회)

Keywords: OCR; Deep Learning; CNN; Korean Word Recognition; Word Segmentation

Citation: The Journal of Korean Institute of Communications and Information Sciences (한국통신학회논문지), vol. 48, no. 9, pp. 1169-1174

Pages: 6

Indexed: KCI

Journal Title: The Journal of Korean Institute of Communications and Information Sciences (한국통신학회논문지)

Volume: 48

Number: 9

Start Page: 1169

End Page: 1174

URI: https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/191663

DOI: 10.7840/kics.2023.48.9.1169

ISSN: 1226-4717; 2287-3880

Abstract: Optical character recognition (OCR) is a technology that recognizes text in an image and converts it into text data. Outside Korea, OCR is used to automate document processing, saving cost and time. In Korea, however, OCR is not widely used because the linguistic characteristics of Hangul make its recognition rate lower than that of English letters and numerals. Improving the Hangul recognition accuracy of OCR can therefore be expected to raise work efficiency in Korea as well. In this paper, a convolutional neural network (CNN) was trained on Hangul, English letters, and numerals. Building on this model, we implemented a system that separates and recognizes the complete Hangul characters in words composed of mixed character types and converts them into text data. To further improve accuracy, the system submits each recognized word to a web search engine and, if the engine returns a corrected search term, outputs that term as the final result. In recognition-rate measurements on receipts containing a mix of Hangul, English letters, and numerals, the system achieved a character recognition rate of up to 90.1%.
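
The search-engine correction step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the search engine or its interface, so the `suggest` callable, the `fake_suggest` stand-in, and its lookup table are all hypothetical and exist only to show the control flow (keep the OCR output unless the engine proposes a corrected term).

```python
from typing import Callable, List, Optional

# Hypothetical interface: given a query, return the search engine's
# corrected search term, or None if the engine proposes no correction.
Suggest = Callable[[str], Optional[str]]


def correct_word(word: str, suggest: Suggest) -> str:
    """Return the corrected search term if one exists, else the OCR output."""
    suggestion = suggest(word)
    return suggestion if suggestion else word


def correct_words(words: List[str], suggest: Suggest) -> List[str]:
    """Apply the correction step to every recognized word."""
    return [correct_word(w, suggest) for w in words]


def fake_suggest(query: str) -> Optional[str]:
    """Stand-in for a real search engine (assumption, for illustration only)."""
    corrections = {"영수중": "영수증"}  # a misrecognized word and its fix
    return corrections.get(query)


if __name__ == "__main__":
    recognized = ["영수중", "OCR", "2023"]
    print(correct_words(recognized, fake_suggest))
```

In a real system `suggest` would wrap a request to whichever search engine is used; passing it in as a parameter keeps the correction logic testable without network access.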

Appears in Collections: 서울 공과대학 > 서울 융합전자공학부 > 1. Journal Articles
