Journal of Biological Control, 32(1): 31-36, 2018, DOI: 10.18311/jbc/2018/18163

Research Article

Optimized binning technique in decision tree model for predicting the Helicoverpa armigera (Hübner) incidence on cotton

M. PRATHEEPA¹*, J. CRUZ ANTONY², CHANDISH R. BALLAL¹ and H. BHEEMANNA³

¹ ICAR-National Bureau of Agricultural Insect Resources, Bengaluru – 560024, Karnataka, India
² Department of Computer Science, Jain University, Bengaluru – 560011, Karnataka, India
³ University of Agricultural Sciences, Agricultural Research Station, Raichur – 584102, Karnataka, India
* Corresponding author E-mail: mpratheepa@gmail.com

(Article chronicle: Received: 14-11-2017; Revised: 26-02-2018; Accepted: 10-03-2018)

ABSTRACT: Decision tree induction is a popular data mining technique for prediction and classification problems. Decision tree analysis is the most suitable model for pest forewarning systems, since pest surveillance data contain biotic, abiotic and environmental variables and IF-THEN rules can be easily framed from them. Abiotic factors such as maximum and minimum temperature, rainfall and relative humidity are continuous numerical data and are important in climate-change studies. The decision tree model is implemented after the data have been pre-processed into a form suitable for analysis. Data discretization is a pre-processing technique that transforms continuous numerical data into categorical data, with the resulting intervals serving as nominal values. The most commonly used binning methods are equal-width partitioning and equal-depth partitioning. The total number of bins created for a variable is important, because either too many or too few bins affects the accuracy of the resulting IF-THEN rules.
Hence, an optimized binning technique based on the Mean Integrated Squared Error (MISE) method is proposed for forming accurate IF-THEN rules to predict the incidence of the pest Helicoverpa armigera on cotton using decision tree analysis.

KEY WORDS: Bin optimization, decision tree, discretization, Helicoverpa armigera, IF-THEN rules, pest prediction

INTRODUCTION

Several data mining techniques, such as logistic regression, decision tree analysis, Bayesian Networks (BNs) and Rule-Learners (RLs), are widely used in prediction models. These techniques are also called classification techniques. Classification techniques are used to predict the target variable in the form of categorical class labels such as “Yes/No”, “Present/Absent”, “High/Low”, etc. A classification algorithm uses the nominal values of the independent variables to predict the class label of the target variable. Generally, the independent variables are real-valued attributes or continuous numerical values, as in the case of weather data, which contain maximum temperature, minimum temperature, relative humidity, rainfall, etc. Suitable discretization algorithms are needed to convert the real-valued attributes of the independent variables to nominal or categorical values.

Decision tree analysis and rule-learners could be suitable models for pest prediction using weather data, since the IF-THEN rules derived from these models are easily understandable and interpretable (Zhao and Ram, 2004). Discretizing real-valued continuous attributes is an important data pre-processing technique for selecting the relevant features in classification algorithms. Discretization is usually performed prior to the induction or learning process. George et al. (1994) addressed the selection of relevant and irrelevant features in the decision tree algorithm ID3. In the discretization process, the array of continuous values is divided into different bins (also called buckets or intervals).
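As a rough illustration of what dividing a continuous variable into bins looks like, the sketch below computes interval boundaries for equal-width and equal-depth partitioning, the two commonly used binning methods named in the abstract. The function names, the temperature readings and the choice of three bins are hypothetical and are not taken from the study. Placing an equal-depth boundary midway between two neighbouring sorted values is also just one possible convention, assumed here for simplicity.

```python
import bisect
from typing import List


def equal_width_cut_points(values: List[float], n_bins: int) -> List[float]:
    """Interior boundaries that split [min, max] into n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def equal_depth_cut_points(values: List[float], n_bins: int) -> List[float]:
    """Interior boundaries chosen so each interval holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for i in range(1, n_bins):
        idx = i * n // n_bins
        # Assumed convention: put the boundary midway between neighbouring values.
        cuts.append((ordered[idx - 1] + ordered[idx]) / 2)
    return cuts


def assign_bin(value: float, cut_points: List[float]) -> int:
    """Index of the interval containing value; a value equal to a boundary goes to the upper interval."""
    return bisect.bisect_right(cut_points, value)


# Hypothetical maximum-temperature readings (degrees Celsius), for illustration only.
temps = [28.0, 29.5, 30.0, 31.0, 31.5, 32.0, 33.5, 36.0, 40.0]

width_cuts = equal_width_cut_points(temps, 3)
depth_cuts = equal_depth_cut_points(temps, 3)

# Nominal interval index for each reading under the two schemes.
width_bins = [assign_bin(t, width_cuts) for t in temps]
depth_bins = [assign_bin(t, depth_cuts) for t in temps]

print("equal-width boundaries:", width_cuts, "bins:", width_bins)
print("equal-depth boundaries:", depth_cuts, "bins:", depth_bins)
```

With these assumed readings, equal-width partitioning places its boundaries at 32.0 and 36.0 and leaves five, two and two readings in the three intervals, whereas equal-depth partitioning places its boundaries at 30.5 and 32.75 and puts three readings in each interval. The same data therefore yield different nominal values depending on the binning method and the number of bins chosen.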
The term “cut-point” refers to the point where the partition occurs in an array. Suppose a continuous interval [a, b] is partitioned into [a, c] and [c, b]; then ‘c’ is known as the cut-point or split-point (Sotiris and Dimitris, 2006). The cut-point or split-point has been chosen based on the differ-