Jour of Adv Research in Dynamical & Control Systems, Vol. 12, Special Issue-06, 2020, p. 145
DOI: 10.5373/JARDCS/V12SP6/SP20201018
A Study using Machine Learning with N-Gram Model in Harmonized System Classification
Prihastuti Harsani 1*, Adang Suhendra 2, Lily Wulandari 2, Wahyu Catur Wibowo 3
1 Department of Computer Science, Universitas Pakuan, Indonesia
2 Department of Computer Science and Information Technology, Universitas Gunadarma, Indonesia
3 Department of Computer Science, Universitas Indonesia, Indonesia
*Corresponding Author: Prihastuti Harsani, Email: prihastuti.harsani@unpak.ac.id
Article History: Received: Mar 21, 2020, Accepted: June 22, 2020
Abstract: The Harmonized System, commonly called HS, is a systematic classification of goods intended to facilitate taxation, trade transactions, transportation and statistics, and it improves on the previous classification system. In international trade (import/export), the HS code of each item to be traded must be determined from the description that accompanies the goods. The textual description of imported goods is mapped to the classification of imported goods regulated in the 2017 Indonesian Customs Tariff Book (BTKI). The BTKI contains the goods classification system applicable in Indonesia, including the Provisions for Interpretation (KUMHS), Notes, and Goods Classification Structures, compiled on the basis of the Harmonized System and the ASEAN Harmonized Tariff Nomenclature (AHTN). Classifying goods by HS code faces several challenges, including the complexity of the HS, gaps in HS terminology, and the limited amount of text in the goods description. This study conducted an experiment that applied machine learning to the classification of imported goods. The focus of this research is classification based on short text categorization. The documents consist of short text, in accordance with the characteristics of the goods descriptions.
The study conducted experiments with three methods, namely LibShortText, text categorization (Text) and topic modeling. The feature extraction methods used are Term Frequency-Inverse Document Frequency (TF-IDF) and Latent Dirichlet Allocation (LDA), and the features are built with the N-gram model. Classification is done at the level of the 8-digit HS code. The goods descriptions that accompany the transaction data contain 7 words on average, so classifying goods by HS code is a short text categorization problem. The evaluation shows that LibShortText has the highest accuracy and F-score, followed by text categorization and topic modeling. SVM and KNN give two different results on the classification. Based on the experimental results, it cannot yet be concluded whether increasing the value of N in the N-gram model yields a better F-score on short texts.
Keywords: HS, LDA, TF-IDF, N-gram, SVM, KNN
Introduction
The Harmonized System, commonly called HS, is a systematic classification of goods intended to facilitate taxation, trade transactions, transportation and statistics, and it improves on the previous classification system. In international trade (import/export), the HS code of each item to be traded must be determined from the description that accompanies the goods. The HS code can be determined by analysing the goods on the basis of their characteristics. On the export/import notification form, the exporter or importer enters the item description in the fields provided. The text entered is free text that reflects the exporter's or importer's own interpretation of the goods. The HS code is then determined from the description that has been entered.
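To make the short-text setting concrete, the following is a minimal sketch of how word N-grams weighted by TF-IDF could feed a nearest-neighbour classifier (KNN with k = 1). It is not the authors' implementation. The function names, the smoothed IDF formula, the cosine similarity measure, the example descriptions and the placeholder labels `CODE_A` and `CODE_B` are all assumptions introduced for illustration; they are not real HS codes or data from the study.

```python
import math
from collections import Counter


def word_ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max from a lower-cased description."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def fit_idf(docs, n_max=2):
    """Compute a smoothed inverse document frequency for each n-gram (assumed formula)."""
    df = Counter()
    for doc in docs:
        df.update(set(word_ngrams(doc, n_max)))
    n_docs = len(docs)
    return {g: math.log((1 + n_docs) / (1 + c)) + 1.0 for g, c in df.items()}


def tfidf_vector(text, idf, n_max=2):
    """Map a description to an L2-normalised sparse TF-IDF vector stored as a dict."""
    tf = Counter(g for g in word_ngrams(text, n_max) if g in idf)
    vec = {g: count * idf[g] for g, count in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {g: v / norm for g, v in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two L2-normalised sparse vectors."""
    return sum(v * b.get(g, 0.0) for g, v in a.items())


def predict_1nn(text, train_vecs, labels, idf, n_max=2):
    """Return the label of the most similar training description."""
    query = tfidf_vector(text, idf, n_max)
    scores = [cosine(query, v) for v in train_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best]


# Hypothetical goods descriptions and placeholder class labels.
train_texts = [
    "frozen boneless beef meat",
    "fresh chilled beef meat",
    "cotton woven fabric dyed",
    "woven cotton fabric printed",
]
train_labels = ["CODE_A", "CODE_A", "CODE_B", "CODE_B"]

idf = fit_idf(train_texts)
train_vecs = [tfidf_vector(t, idf) for t in train_texts]
prediction = predict_1nn("dyed cotton fabric", train_vecs, train_labels, idf)
print(prediction)
```

The query shares N-grams only with the two fabric descriptions, so the sketch assigns it `CODE_B`. Raising `n_max` adds longer N-grams to the feature set; with descriptions of about seven words, those longer N-grams become sparse quickly, which is one reason the effect of N on short texts is not obvious in advance.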
The guidelines used by importers in interpreting goods are contained in regulations issued by the Ministry of Finance of the Republic of Indonesia [1]. The accuracy of interpretation depends on the knowledge, skills and experience of traders and customs officials, which in turn affects the accuracy of the HS codes determined [2]. Determining the HS code from the actual description is a text categorization problem. The textual description of imported goods is mapped to the classification of imported goods regulated in the 2017 Indonesian Customs Tariff Book (BTKI). The BTKI contains the goods classification system applicable in Indonesia, including the Provisions for Interpretation (KUMHS), Notes, and Goods Classification Structures, compiled on the basis of the Harmonized System and the ASEAN Harmonized Tariff Nomenclature (AHTN). Classifying goods by HS code faces several challenges, including: 1). HS complexity. The HS is a structured multipurpose nomenclature, organized into 21 Sections and 98 Chapters. Manual classification requires care, experience and good knowledge. 2). Gaps in HS terminology. There is a gap between the description of goods entered by the importer and the description of goods in the HS nomenclature used by customs. A simple string search is of only limited help to the importer in finding the relevant HS code because of the difference between the