DESIDOC Journal of Library & Information Technology, Vol. 42, No. 4, July 2022, pp. 218-226, DOI: 10.14429/djlit.42.4.17733
2022, DESIDOC
Received: 01 December 2021, Revised: 04 April 2022, Accepted: 05 May 2022, Online published: 19 July 2022

Automated Multi-Label Classification on Fertilizer-Themed Patent Documents in Indonesia

Aris Yaman #,*, Bagus Sartono $, Agus M. Soleh $, Ariani Indrawati # and Yulia Aris Kartika #
# National Research and Innovation Agency (BRIN), Indonesia
$ Department of Statistics and Data Science, IPB University, Indonesia
* E-mail: arisyaman@apps.ipb.ac.id

ABSTRACT
Patent literature research has a high scientific value for the industrial, commercial, legal, and policymaking communities. Therefore, patent analysis has become crucial. Patent topic classification is an important process in patent topic modeling analysis. However, the classification process is time-consuming and expensive, as it is usually carried out manually by an expert. Moreover, a patent document may be categorised in more than one category or label, further complicating the task. As the number of patent documents submitted increases, creating an automated patent classification system that yields accurate results becomes increasingly critical. Therefore, in this paper, we analyse the performance of two algorithms with regard to multi-label classification in patent documents: multi-label k-nearest neighbor (ML-KNN) and classifier chain k-nearest neighbor (CC-KNN), combined with latent Dirichlet allocation (LDA). These two methods have a considerable advantage in handling continuously updated datasets; they also exhibit superior performance compared to other multi-label learning algorithms. This study also compares these two algorithms with the term frequency (TF)-weighting measure. The optimal value obtained is based on the following evaluation parameters: micro F1, accuracy, Hamming loss, and one error.
The results show that the ML-KNN method is better than the CC-KNN method and that the multi-label classification based on topics (patent LDA) is better than the TF-weighting technique.

Keywords: Topic modeling; Multi-label classification; Patent document; LDA; ML-KNN; CC-KNN

1. INTRODUCTION
Patent rights are a type of intellectual property rights (IPR): exclusive rights granted to innovators in the field of technology for a set period to carry out the innovation themselves or to permit others to do so [1]. Patenting innovations has several advantages, including strengthening the market position and competitive advantage, increasing return on investment or profit, generating additional income from licensing, gaining access to new markets and technology through cross-licensing, reducing the risk of illegal imitators, enhancing the ability to raise funds and obtain grants, and boosting the public impression of a company [2]. Therefore, patent analysis has become crucial. Patent literature research can reveal important technical details and connections, explain business patterns, offer innovative industrial solutions, and help investors make important investment decisions [3–5].

Generally, patent-analysis experts are required to have a specific level of experience in a variety of research topics. Unfortunately, the rapid growth of patents in both quantity and quality has led to an increase in the workload of patent experts. Consequently, efficiency and consistency in analyzing patent documents have decreased [6]. One of the crucial processes in patent literature analysis is patent topic classification, in which patents covering similar topics or technological areas are grouped. Thus, developing an automated classification system for patent documents has become extremely important, as it will help both inventors and patent-analysis experts identify patents on similar topics [7].
However, developing an accurate automated patent document classification system is quite difficult for various reasons. First, the International Patent Classification (IPC) system is complicated, with a hierarchical structure and several labels [8-9]. Second, the complexity of patent documents poses a concern; they are complicated and typically contain extensive jargon or new technical terms resulting from technological advances [10]. Third, as knowledge and technology evolve over time, a patent document may belong to several categories, and so we must simultaneously categorise a patent document into many labels, which is referred to as multi-label classification. Unfortunately, the majority of the classification problems investigated in machine learning, especially in patent topic classification modeling, are single-label classification problems [11]. Multi-label classification differs from binary and multi-class classification in that it is more difficult to learn; in multi-label classification, one must classify an object into more than one label simultaneously [11-12]. There are at least two commonly used methods to overcome difficulties in multi-label classification: the problem transformation method and