Detailed Information

Cited 10 times in Web of Science; cited 20 times in Scopus

PatentNet: multi-label classification of patent documents using deep learning based language understanding (open access)

Authors
Roudsari, Arousha Haghighian; Afshar, Jafar; Lee, Wookey; Lee, Suan
Issue Date
Jan-2022
Publisher
SPRINGER
Keywords
Patent classification; Multi-label text classification; Pre-trained language model
Citation
SCIENTOMETRICS, v.127, no.1, pp.207 - 231
Journal Title
SCIENTOMETRICS
Volume
127
Number
1
Start Page
207
End Page
231
URI
https://scholarworks.bwise.kr/gachon/handle/2020.sw.gachon/84772
DOI
10.1007/s11192-021-04179-4
ISSN
0138-9130
Abstract
Patent classification is an expensive and time-consuming task that has conventionally been performed by domain experts. However, the growing number of filed patents and the complexity of the documents make the classification task challenging. The text used in patent documents is not always written in a way that conveys knowledge efficiently. Moreover, patent classification is a multi-label classification task with a large number of labels, which makes the problem even more complicated. Hence, automating this expensive and laborious task is essential for assisting domain experts in managing patent documents and for facilitating reliable search, retrieval, and further patent analysis tasks. Transfer learning and pre-trained language models have recently achieved state-of-the-art results in many Natural Language Processing tasks. In this work, we investigate the effect of fine-tuning pre-trained language models, namely BERT, XLNet, RoBERTa, and ELECTRA, for the essential task of multi-label patent classification. We compare these models with the baseline deep-learning approaches used for patent classification, and we use various word embeddings to enhance the performance of the baseline models. The publicly available USPTO-2M patent classification benchmark and M-patent datasets are used for conducting experiments. We conclude that fine-tuning the pre-trained language models on the patent text improves multi-label patent classification performance. Our findings indicate that XLNet performs best and achieves a new state-of-the-art classification performance with respect to precision, recall, F1 measure, coverage error, and label ranking average precision (LRAP).
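The evaluation measures named in the abstract can be illustrated with a small sketch. This is not the authors' code: the function names, the 0.5 decision threshold, and the choice of micro-averaging for precision, recall, and F1 are assumptions made for illustration, and the sketch assumes every document has at least one true label. Coverage error and LRAP follow their standard multi-label ranking definitions.

```python
from typing import List, Sequence, Tuple


def micro_prf(y_true: Sequence[Sequence[int]],
              y_score: Sequence[Sequence[float]],
              threshold: float = 0.5) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 (threshold is an assumption)."""
    tp = fp = fn = 0
    for truth, scores in zip(y_true, y_score):
        for t, s in zip(truth, scores):
            predicted = s >= threshold
            if predicted and t:
                tp += 1
            elif predicted and not t:
                fp += 1
            elif t:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def coverage_error(y_true: Sequence[Sequence[int]],
                   y_score: Sequence[Sequence[float]]) -> float:
    """Average number of top-ranked labels needed to cover all true labels."""
    total = 0.0
    for truth, scores in zip(y_true, y_score):
        # Lowest score among the true labels of this document.
        lowest_true = min(s for t, s in zip(truth, scores) if t)
        # Every label scored at least that high must be included.
        total += sum(1 for s in scores if s >= lowest_true)
    return total / len(y_true)


def lrap(y_true: Sequence[Sequence[int]],
         y_score: Sequence[Sequence[float]]) -> float:
    """Label ranking average precision."""
    total = 0.0
    for truth, scores in zip(y_true, y_score):
        relevant: List[float] = [s for t, s in zip(truth, scores) if t]
        ratios = []
        for s in relevant:
            rank = sum(1 for x in scores if x >= s)    # rank among all labels
            hits = sum(1 for x in relevant if x >= s)  # true labels at or above
            ratios.append(hits / rank)
        total += sum(ratios) / len(ratios)
    return total / len(y_true)


# Toy data: two documents, three candidate labels (values are made up).
toy_true = [[1, 0, 0], [0, 0, 1]]
toy_score = [[0.75, 0.5, 1.0], [1.0, 0.2, 0.1]]
```

On the toy data, a lower coverage error and a higher LRAP both indicate that the true labels are ranked closer to the top of each document's score list.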
Files in This Item
There are no files associated with this item.
Appears in
Collections
ETC > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

HAGHIGHIAN ROUDSARI, AROUSHA
College of IT Convergence (Department of Software)
