PatentNet: multi-label classification of patent documents using deep learning based language understanding
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Roudsari, Arousha Haghighian | - |
dc.contributor.author | Afshar, Jafar | - |
dc.contributor.author | Lee, Wookey | - |
dc.contributor.author | Lee, Suan | - |
dc.date.accessioned | 2022-06-24T09:40:14Z | - |
dc.date.available | 2022-06-24T09:40:14Z | - |
dc.date.created | 2022-06-24 | - |
dc.date.issued | 2022-01 | - |
dc.identifier.issn | 0138-9130 | - |
dc.identifier.uri | https://scholarworks.bwise.kr/gachon/handle/2020.sw.gachon/84772 | - |
dc.description.abstract | Patent classification is an expensive and time-consuming task that has conventionally been performed by domain experts. However, the increase in the number of filed patents and the complexity of the documents make the classification task challenging. The text used in patent documents is not always written in a way to efficiently convey knowledge. Moreover, patent classification is a multi-label classification task with a large number of labels, which makes the problem even more complicated. Hence, automating this expensive and laborious task is essential for assisting domain experts in managing patent documents, facilitating reliable search, retrieval, and further patent analysis tasks. Transfer learning and pre-trained language models have recently achieved state-of-the-art results in many Natural Language Processing tasks. In this work, we focus on investigating the effect of fine-tuning the pre-trained language models, namely, BERT, XLNet, RoBERTa, and ELECTRA, for the essential task of multi-label patent classification. We compare these models with the baseline deep-learning approaches used for patent classification. We use various word embeddings to enhance the performance of the baseline models. The publicly available USPTO-2M patent classification benchmark and M-patent datasets are used for conducting experiments. We conclude that fine-tuning the pre-trained language models on the patent text improves the multi-label patent classification performance. Our findings indicate that XLNet performs the best and achieves a new state-of-the-art classification performance with respect to precision, recall, F1 measure, as well as coverage error, and LRAP. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | SPRINGER | - |
dc.relation.isPartOf | SCIENTOMETRICS | - |
dc.title | PatentNet: multi-label classification of patent documents using deep learning based language understanding | - |
dc.type | Article | - |
dc.type.rims | ART | - |
dc.description.journalClass | 1 | - |
dc.identifier.wosid | 000731194300001 | - |
dc.identifier.doi | 10.1007/s11192-021-04179-4 | - |
dc.identifier.bibliographicCitation | SCIENTOMETRICS, v.127, no.1, pp.207 - 231 | - |
dc.description.isOpenAccess | Y | - |
dc.identifier.scopusid | 2-s2.0-85121374042 | - |
dc.citation.endPage | 231 | - |
dc.citation.startPage | 207 | - |
dc.citation.title | SCIENTOMETRICS | - |
dc.citation.volume | 127 | - |
dc.citation.number | 1 | - |
dc.contributor.affiliatedAuthor | Roudsari, Arousha Haghighian | - |
dc.type.docType | Article | - |
dc.subject.keywordAuthor | Patent classification | - |
dc.subject.keywordAuthor | Multi-label text classification | - |
dc.subject.keywordAuthor | Pre-trained language model | - |
dc.subject.keywordPlus | NEURAL-NETWORKS | - |
dc.relation.journalResearchArea | Computer Science | - |
dc.relation.journalResearchArea | Information Science & Library Science | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Interdisciplinary Applications | - |
dc.relation.journalWebOfScienceCategory | Information Science & Library Science | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | ssci | - |
dc.description.journalRegisteredClass | scopus | - |
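
The abstract reports results in terms of coverage error and LRAP (label ranking average precision), two ranking-based metrics for multi-label classification. As a rough illustration of what those metrics measure, the following is a minimal pure-Python sketch. It is not the authors' evaluation code: the function names, the tie handling (ties count against the label), and the convention of scoring a sample with no true labels as 1.0 for LRAP are assumptions made for this example.

```python
def _rank(scores, j):
    # Rank of label j: how many labels score at least as high as label j.
    return sum(1 for s in scores if s >= scores[j])


def coverage_error(y_true, y_score):
    # Average depth in the ranked label list needed to cover every true label.
    total = 0.0
    for labels, scores in zip(y_true, y_score):
        relevant = [j for j, t in enumerate(labels) if t]
        # Assumption: a sample with no true labels contributes 0.
        total += max((_rank(scores, j) for j in relevant), default=0)
    return total / len(y_true)


def lrap(y_true, y_score):
    # For each true label, the fraction of labels ranked at or above it
    # that are also true; averaged over true labels, then over samples.
    total = 0.0
    for labels, scores in zip(y_true, y_score):
        relevant = [j for j, t in enumerate(labels) if t]
        if not relevant:
            total += 1.0  # assumption: empty label sets score 1.0
            continue
        per_label = []
        for j in relevant:
            hits = sum(1 for k in relevant if scores[k] >= scores[j])
            per_label.append(hits / _rank(scores, j))
        total += sum(per_label) / len(per_label)
    return total / len(y_true)


# Toy data: two documents, three candidate labels (hypothetical values).
y_true = [[1, 0, 0], [0, 0, 1]]
y_score = [[0.75, 0.5, 1.0], [1.0, 0.2, 0.1]]
print(coverage_error(y_true, y_score), lrap(y_true, y_score))
```

A lower coverage error and a higher LRAP both indicate that the true labels sit nearer the top of the model's ranking.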