FastText word embedding with IDF-weighted subword information

최재걸; 이상웅

Title: 역문서빈도로 가중된 부속단어를 이용한 FastText 워드 임베딩 (FastText word embedding with IDF-weighted subword information)

Other Titles: FastText word embedding with IDF-weighted subword information

Authors: 최재걸; 이상웅

Issue Date: Jun-2019

Publisher: 한국차세대컴퓨팅학회 (Korean Institute of Next Generation Computing)

Keywords: word embedding; word2vec; FastText; inverse document frequency; skip-gram

Citation: 한국차세대컴퓨팅학회 논문지 (The Journal of Korean Institute of Next Generation Computing), v.15, no.3, pp.67-77

Journal Title: 한국차세대컴퓨팅학회 논문지 (The Journal of Korean Institute of Next Generation Computing)

Volume: 15

Number: 3

Start Page: 67

End Page: 77

URI: https://scholarworks.bwise.kr/gachon/handle/2020.sw.gachon/2406

ISSN: 1975-681X

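Abstract: Word embedding is an important technique in natural language processing, and word2vec is its best-known algorithm. Dictionary-based word embedding algorithms such as word2vec do not use the morphological features of words; they treat each word as a single atomic unit, so they can produce word vectors only for words that appeared in the training data. FastText was proposed to solve this problem: it builds word embeddings from combinations of subwords and can therefore produce vectors for words that were never seen during training. Because FastText relies on morphological features, it is stronger than word2vec on syntactic tasks and weaker on semantic ones. This paper presents a method that improves FastText by using the inverse document frequency (IDF) of subwords, with the aim of overcoming FastText's weakness on the semantic side. The experimental results show an improvement on the semantic side with almost no loss on the syntactic side. The method can also be applied to any word embedding that uses subwords. IDF weighting is additionally applied to Probabilistic FastText, which was designed to embed polysemous words distinctly, and the results are examined to confirm that performance improves.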
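The abstract does not state the exact weighting formula, so the following is only a minimal sketch of one plausible reading: a word vector is formed as an IDF-weighted average of its subword (character n-gram) vectors. Everything named here is an assumption for illustration: the helper names `char_ngrams`, `subword_idf`, and `word_vector`, the smoothed IDF form, the choice of treating each word list as a "document", and the random stand-in vectors. The n-gram range of 3 to 6 follows fastText's default `minn`/`maxn` settings; fastText's n-gram hashing and its whole-word token are left out for brevity.

```python
import math
import random
from collections import Counter


def char_ngrams(word, min_n=3, max_n=6):
    """Character n-grams of a word wrapped in FastText-style boundary markers."""
    token = f"<{word}>"
    return [
        token[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(token) - n + 1)
    ]


def subword_idf(documents, min_n=3, max_n=6):
    """IDF per subword. Assumption: a "document" is simply a list of words."""
    doc_freq = Counter()
    for doc in documents:
        seen = set()
        for word in doc:
            seen.update(char_ngrams(word, min_n, max_n))
        doc_freq.update(seen)  # each subword counts once per document
    n_docs = len(documents)
    # Smoothed IDF (an assumed form): subwords found in every document get weight 1.0.
    return {g: math.log((1 + n_docs) / (1 + df)) + 1.0 for g, df in doc_freq.items()}


def word_vector(word, subword_vectors, idf, default_idf=1.0):
    """IDF-weighted average of the known subword vectors of a word."""
    grams = [g for g in char_ngrams(word) if g in subword_vectors]
    if not grams:
        raise KeyError(f"no known subwords for {word!r}")
    weights = [idf.get(g, default_idf) for g in grams]
    total = sum(weights)
    dim = len(subword_vectors[grams[0]])
    return [
        sum(w * subword_vectors[g][k] for w, g in zip(weights, grams)) / total
        for k in range(dim)
    ]


# Toy corpus: three "documents", each a list of words.
documents = [
    ["walking", "talking", "running"],
    ["walked", "talked"],
    ["runner", "walker"],
]
idf = subword_idf(documents)

# Random vectors stand in for trained subword embeddings (illustration only).
rng = random.Random(0)
subword_vectors = {g: [rng.gauss(0.0, 1.0) for _ in range(8)] for g in sorted(idf)}

# "talker" never occurs in the corpus, but shares subwords with "talking" and "walker".
vec = word_vector("talker", subword_vectors, idf)
print(len(vec))
```

In this toy corpus the subword `<wa` occurs in all three documents and receives the minimum weight of 1.0, while `ing` occurs in only one and is weighted more heavily, so rarer subwords contribute more to the composed vector. Whether the paper applies the weights at training time, at composition time, or both is not stated in the abstract.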

Appears in Collections: College of IT Convergence > Department of Software > 1. Journal Articles

Related Researcher

Lee, Sang-Woong: College of IT Convergence (Department of Software)
