Improved Machine Reading Comprehension Using Data Validation for Weakly Labeled Data
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Yang Y. | - |
dc.contributor.author | Kang S. | - |
dc.contributor.author | Seo J. | - |
dc.date.available | 2020-03-03T06:45:29Z | - |
dc.date.created | 2020-02-24 | - |
dc.date.issued | 2020-01 | - |
dc.identifier.issn | 2169-3536 | - |
dc.identifier.uri | https://scholarworks.bwise.kr/gachon/handle/2020.sw.gachon/17740 | - |
dc.description.abstract | Machine reading comprehension (MRC) is a natural language processing task wherein a given question is answered according to a holistic understanding of a given context. Recently, many researchers have shown interest in MRC, for which a considerable number of datasets are being released. Datasets for MRC, which are composed of the context-query-answer triple, are designed to answer a given query by referencing and understanding a readily-available, relevant context text. The TriviaQA dataset is a weakly labeled dataset, because it contains irrelevant context that forms no basis for answering the query. The existing syntactic data cleaning method struggles to deal with the contextual noise this irrelevancy creates. Therefore, a semantic data cleaning method using reasoning processes is necessary. To address this, we propose a new MRC model in which the TriviaQA dataset is validated and trained using a high-quality dataset. The data validation method in our MRC model improves the quality of the training dataset, and the answer extraction model learns with the validated training data, because of our validation method. Our proposed method showed a 4.33% improvement in performance for the TriviaQA Wiki, compared to the existing baseline model. Accordingly, our proposed method can address the limitation of irrelevant context in MRC better than the human supervision. © 2013 IEEE. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | Institute of Electrical and Electronics Engineers Inc. | - |
dc.relation.isPartOf | IEEE Access | - |
dc.title | Improved Machine Reading Comprehension Using Data Validation for Weakly Labeled Data | - |
dc.type | Article | - |
dc.type.rims | ART | - |
dc.description.journalClass | 1 | - |
dc.identifier.wosid | 000524677500039 | - |
dc.identifier.doi | 10.1109/ACCESS.2019.2963569 | - |
dc.identifier.bibliographicCitation | IEEE Access, v.8, pp.5667 - 5677 | - |
dc.description.isOpenAccess | N | - |
dc.identifier.scopusid | 2-s2.0-85078309202 | - |
dc.citation.endPage | 5677 | - |
dc.citation.startPage | 5667 | - |
dc.citation.title | IEEE Access | - |
dc.citation.volume | 8 | - |
dc.contributor.affiliatedAuthor | Kang S. | - |
dc.type.docType | Article | - |
dc.subject.keywordAuthor | Computational and artificial intelligence | - |
dc.subject.keywordAuthor | data validation | - |
dc.subject.keywordAuthor | machine reading comprehension | - |
dc.subject.keywordAuthor | natural language processing | - |
dc.subject.keywordAuthor | neural networks | - |
dc.subject.keywordAuthor | weak label | - |
dc.subject.keywordPlus | Neural networks | - |
dc.subject.keywordPlus | Query processing | - |
dc.subject.keywordPlus | Semantics | - |
dc.subject.keywordPlus | Computational and artificial intelligences | - |
dc.subject.keywordPlus | Data validation | - |
dc.subject.keywordPlus | Natural language processing | - |
dc.subject.keywordPlus | Reading comprehension | - |
dc.subject.keywordPlus | Weak labels | - |
dc.subject.keywordPlus | Natural language processing systems | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |
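
The abstract describes validating weakly labeled context-query-answer triples before training the answer extraction model. The record gives no implementation details, so the following is only a minimal sketch of the filtering idea. The names `Triple`, `overlap_score`, `validate`, and the `threshold` value are hypothetical, and the token-overlap scorer is a toy stand-in, not the authors' semantic validation method.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Triple:
    """One context-query-answer example (hypothetical structure)."""
    context: str
    query: str
    answer: str


def overlap_score(t: Triple) -> float:
    """Toy relevance score: fraction of query tokens found in the context.

    Assumption: this stands in for a learned semantic validator,
    which the abstract argues is what the task actually needs.
    """
    query_tokens = set(t.query.lower().split())
    context_tokens = set(t.context.lower().split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & context_tokens) / len(query_tokens)


def validate(
    triples: List[Triple],
    scorer: Callable[[Triple], float] = overlap_score,
    threshold: float = 0.5,  # hypothetical cut-off, not from the paper
) -> List[Triple]:
    """Keep triples whose context contains the answer string and
    whose relevance score reaches the threshold."""
    return [
        t for t in triples
        if t.answer.lower() in t.context.lower() and scorer(t) >= threshold
    ]


triples = [
    Triple("Paris is the capital of France", "capital of France", "Paris"),
    # Weak label: the answer string appears, but the context is irrelevant.
    Triple("Paris Hilton attended a party in London", "capital of France", "Paris"),
]
kept = validate(triples)
print(len(kept))
```

Passing a different `scorer` callable, for example one backed by a trained relevance model, would replace the toy heuristic without changing the filtering step; only the validated triples would then be used as training data.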
Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.