PatentNet: multi-label classification of patent documents using deep learning based language understanding

Roudsari, Arousha Haghighian; Afshar, Jafar; Lee, Wookey; Lee, Suan

Detailed Information

Cited 10 times in Web of Science

Cited 20 times in Scopus

Access: Open access

Authors: Roudsari, Arousha Haghighian; Afshar, Jafar; Lee, Wookey; Lee, Suan

Issue Date: Jan-2022

Publisher: SPRINGER

Keywords: Patent classification; Multi-label text classification; Pre-trained language model

Citation: SCIENTOMETRICS, v.127, no.1, pp.207 - 231

Journal Title: SCIENTOMETRICS

Volume: 127

Number: 1

Start Page: 207

End Page: 231

URI: https://scholarworks.bwise.kr/gachon/handle/2020.sw.gachon/84772

DOI: 10.1007/s11192-021-04179-4

ISSN: 0138-9130

Abstract: Patent classification is an expensive and time-consuming task that has conventionally been performed by domain experts. However, the increase in the number of filed patents and the complexity of the documents make the classification task challenging. The text used in patent documents is not always written in a way to efficiently convey knowledge. Moreover, patent classification is a multi-label classification task with a large number of labels, which makes the problem even more complicated. Hence, automating this expensive and laborious task is essential for assisting domain experts in managing patent documents, facilitating reliable search, retrieval, and further patent analysis tasks. Transfer learning and pre-trained language models have recently achieved state-of-the-art results in many Natural Language Processing tasks. In this work, we focus on investigating the effect of fine-tuning the pre-trained language models, namely, BERT, XLNet, RoBERTa, and ELECTRA, for the essential task of multi-label patent classification. We compare these models with the baseline deep-learning approaches used for patent classification. We use various word embeddings to enhance the performance of the baseline models. The publicly available USPTO-2M patent classification benchmark and M-patent datasets are used for conducting experiments. We conclude that fine-tuning the pre-trained language models on the patent text improves the multi-label patent classification performance. Our findings indicate that XLNet performs the best and achieves a new state-of-the-art classification performance with respect to precision, recall, F1 measure, as well as coverage error, and LRAP.
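The abstract names two ranking-based multi-label metrics, coverage error and LRAP (label ranking average precision), alongside precision, recall, and F1, but does not define them. The sketch below shows one common formulation of these metrics in plain Python. The function names, the 0.5 decision threshold, the micro-averaging choice, the handling of samples without true labels, and the toy data are all assumptions made for illustration; none of them is taken from the paper.

```python
from typing import Sequence, Tuple

Labels = Sequence[Sequence[int]]    # one 0/1 indicator row per document
Scores = Sequence[Sequence[float]]  # one score row per document


def coverage_error(y_true: Labels, y_score: Scores) -> float:
    """Average depth in the ranked label list needed to cover all true labels."""
    total = 0
    for truth, scores in zip(y_true, y_score):
        true_scores = [s for t, s in zip(truth, scores) if t]
        if not true_scores:
            continue  # assumption: a sample with no true labels contributes 0
        lowest = min(true_scores)
        # Rank of the lowest-scored true label (ties count against the model).
        total += sum(1 for s in scores if s >= lowest)
    return total / len(y_true)


def lrap(y_true: Labels, y_score: Scores) -> float:
    """Label ranking average precision, averaged over documents."""
    total = 0.0
    for truth, scores in zip(y_true, y_score):
        true_scores = [s for t, s in zip(truth, scores) if t]
        if not true_scores:
            total += 1.0  # assumption: a sample with no true labels scores 1
            continue
        per_label = []
        for s_j in true_scores:
            rank = sum(1 for s in scores if s >= s_j)       # labels ranked at or above j
            hits = sum(1 for s in true_scores if s >= s_j)  # true labels among them
            per_label.append(hits / rank)
        total += sum(per_label) / len(per_label)
    return total / len(y_true)


def micro_prf(y_true: Labels, y_score: Scores,
              threshold: float = 0.5) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 after thresholding the scores."""
    tp = fp = fn = 0
    for truth, scores in zip(y_true, y_score):
        for t, s in zip(truth, scores):
            predicted = s >= threshold
            if predicted and t:
                tp += 1
            elif predicted:
                fp += 1
            elif t:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy data: two documents, three candidate labels (not from the paper).
toy_true = [[1, 0, 0], [0, 0, 1]]
toy_score = [[0.75, 0.5, 1.0], [1.0, 0.2, 0.1]]

print(coverage_error(toy_true, toy_score))  # 2.5
print(lrap(toy_true, toy_score))            # about 0.4167
print(micro_prf(toy_true, toy_score))
```

Lower coverage error is better, because fewer top-ranked labels are needed to cover every true label; higher LRAP is better, because true labels sit nearer the top of the ranking.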

Appears in Collections: ETC > 1. Journal Articles

Related Researcher

HAGHIGHIAN ROUDSARI, AROUSHA: College of IT Convergence (Department of Software)
