A Study of Data Cleaning Methods for Large-Scale Text in Text Network Analysis

박치성; 이준석

doi:10.26847/mspa.2017.27.4.35


Cited 0 times in Web of Science

Cited 0 times in Scopus



Full metadata record

DC Field	Value	Language
dc.contributor.author	박치성	-
dc.contributor.author	이준석	-
dc.date.available	2019-03-08T09:57:31Z	-
dc.date.issued	2017-12	-
dc.identifier.issn	1229-389X	-
dc.identifier.uri	https://scholarworks.bwise.kr/cau/handle/2019.sw.cau/5153	-
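dc.description.abstract	Advances in text mining technology have contributed to a growing number of studies that use text in the social sciences, including policy studies; a representative example is policy frame analysis, which combines text analysis with network analysis. However, text is a typical form of unstructured data, so converting it into an analyzable form requires integrated consideration of many factors. Nevertheless, previous Korean studies in the social sciences that used text as data have been limited in that they did not describe their data cleaning process in detail. Accordingly, focusing on cases in which large volumes of text are used to analyze social phenomena, this study attempts to formalize the methods and procedures of text data cleaning. To this end, it 1) synthesizes the factors to be considered in text data cleaning, 2) proposes a procedure that randomly draws a sample of less than 5% of the target texts and compares the researchers' cleaning results with those produced by a program, and 3) proposes a way to progressively reduce errors in the cleaning results by repeating this comparison. Cleaning the text data according to these methods and procedures substantially reduced the relative error of the cleaning results.	-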
dc.description.abstract	Aided by advances in text mining technology, social scientists, including those in policy science, have been able to use massive text data extensively in their work. The epitome of this type of study is policy framing analysis, in which social network analysis is combined with text analysis. To transform unstructured text data into structured data for empirical analysis such as framing studies, various methodological issues must be taken into account. However, most previous studies in Korea have not specified how the text data were cleaned in their analysis procedures. In that regard, this study attempts to formalize the process of cleaning text data collected from diverse sources such as online newspaper articles. To this end, it proposes 1) specific factors to consider in cleaning text data, including a typology of text data coding errors and criteria for selecting stopwords, and 2) a data cleaning process using a sample of approximately 5% extracted from the data set. For this small sample, the coding done by the researchers was compared with the coding done by a computer program in order to capture errors in the program's coding. The study collected 331 newspaper articles to demonstrate the validity of the proposed methods. When the proposed methods were applied, the error rate was reduced to about 30% compared with the computer's coding without any treatment.	-
dc.format.extent	34	-
dc.language	Korean	-
dc.language.iso	KOR	-
dc.publisher	한국국정관리학회	-
dc.title	A Study of Data Cleaning Methods for Large-Scale Text in Text Network Analysis	-
dc.title.alternative	Steps of Text Data Cleaning: for Network Text Analysis Using Large-scale Data	-
dc.type	Article	-
dc.identifier.doi	10.26847/mspa.2017.27.4.35	-
dc.identifier.bibliographicCitation	현대사회와 행정, v.27, no.4, pp 35 - 68	-
dc.identifier.kciid	ART002294640	-
dc.description.isOpenAccess	N	-
dc.citation.endPage	68	-
dc.citation.number	4	-
dc.citation.startPage	35	-
dc.citation.title	현대사회와 행정	-
dc.citation.volume	27	-
dc.publisher.location	Republic of Korea	-
dc.subject.keywordAuthor	Text Mining	-
dc.subject.keywordAuthor	Network Text Analysis	-
dc.subject.keywordAuthor	Natural Language Processing	-
dc.description.journalRegisteredClass	kci	-
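
The abstracts describe the sample-and-compare validation loop only in outline. The Python sketch below illustrates one way such a loop could work; every detail is an assumption of this sketch, not the authors' procedure. `program_coding` is a hypothetical stand-in for the automatic coding step (lowercased word tokens minus a stopword set; real Korean text would need a morphological analyzer instead of a regex), the researchers' manual coding is given as a set of terms per sampled document, and the error rate is measured as the Jaccard distance between the two term sets. Each round recomputes the mean sample error and adds to the stopword list any term that the program kept but that appears in none of the researchers' codings.

```python
import random
import re


def sample_documents(docs, fraction=0.05, seed=0):
    """Randomly pick document indices for manual checking.

    Takes floor(len(docs) * fraction) documents but never fewer than one,
    so a very small corpus may exceed the nominal fraction.
    """
    k = max(1, int(len(docs) * fraction))
    rng = random.Random(seed)  # seeded so the sample can be reproduced
    return rng.sample(range(len(docs)), k)


def program_coding(text, stopwords):
    """Hypothetical automatic coding: lowercase word tokens minus stopwords."""
    tokens = re.findall(r"\w+", text.lower())
    return {t for t in tokens if t not in stopwords}


def error_rate(program_terms, researcher_terms):
    """Jaccard distance between the program's and the researchers' term sets."""
    union = program_terms | researcher_terms
    if not union:
        return 0.0
    return len(program_terms ^ researcher_terms) / len(union)


def refine_stopwords(docs, researcher_coding, sample_ids, stopwords, max_rounds=5):
    """Repeat the compare-and-correct cycle on the sample.

    Returns the enlarged stopword set and the mean sample error per round.
    """
    stopwords = set(stopwords)
    # Terms the researchers kept in at least one sampled document.
    kept_by_researchers = set().union(*(researcher_coding[i] for i in sample_ids))
    history = []
    for _ in range(max_rounds):
        errors = []
        false_positives = set()
        for i in sample_ids:
            coded = program_coding(docs[i], stopwords)
            errors.append(error_rate(coded, researcher_coding[i]))
            false_positives |= coded - researcher_coding[i]
        history.append(sum(errors) / len(errors))
        # Only stop terms that the researchers never kept anywhere in the sample.
        new_stopwords = false_positives - kept_by_researchers
        if not new_stopwords:
            break
        stopwords |= new_stopwords
    return stopwords, history


# Toy corpus of 40 identical documents; a 5% sample is 2 documents.
docs = ["The policy frame of the welfare debate"] * 40
sample_ids = sample_documents(docs, fraction=0.05, seed=1)
researcher_coding = {i: {"policy", "frame", "welfare", "debate"} for i in sample_ids}
stopwords, history = refine_stopwords(docs, researcher_coding, sample_ids, {"the"})
print(sorted(stopwords), history)  # prints ['of', 'the'] [0.2, 0.0]
```

The stopword rule here is deliberately conservative: a term is dropped only if no sampled document's manual coding keeps it. The coding-error typology mentioned in the abstract would call for further, type-specific corrections (for example, synonym merging), which this sketch does not attempt.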




Related Researcher


Park, Chi Sung: College of Social Sciences (Department of Public Service)



