Learning Dynamics of Pesticide Abuse through Data Mining

Ahsan Abdullah
National University of Computers & Emerging Sciences, Islamabad, Pakistan, ahsan@nu.edu.pk

Stephen Brobst
Teradata Division, NCR, Dayton, OH, USA

Ijaz Pervaiz
Directorate of Pest Warning & Quality Control of Pesticides, Punjab, Multan

Muhammad Umer, Azhar Nisar
National University of Computers & Emerging Sciences, Islamabad, Pakistan

Abstract

Recent studies by agricultural researchers in Pakistan have shown that attempts to maximize crop yield through pro-pesticide state policies have led to dangerously high pesticide usage. These studies have reported a negative correlation between pesticide usage and crop yield in Pakistan. Hence excessive use (or abuse) of pesticides is harming farmers, with adverse financial, environmental and social impacts. In this work we show how data mining of integrated agricultural data, including pest scouting, pesticide usage and meteorological recordings, is useful for optimization (and reduction) of pesticide usage. The data used in this work has never been utilized in this manner before. We have performed unsupervised clustering of this data using the Recursive Noise Removal (RNR) heuristic of Abdullah and Brobst (2003). These clusters reveal interesting patterns of farmer practices along with pesticide usage dynamics, and hence help identify the reasons for this pesticide abuse.

Keywords: Data Mining, Agriculture, Pesticide Abuse, Cotton, Unsupervised Clustering.

1. Introduction

Pakistan is one of the five major cotton-growing countries in the world. Almost 70% of the world’s cotton is produced in China (Mainland), India, Pakistan, the USA and Uzbekistan (Chaudhry 2000). Pakistan is the world’s 7th most populous country; in anticipation of population growth, pesticides were identified in the 1960s and 1970s as a means of increasing production, as a positive correlation is believed to exist between yield and pesticide usage.
However, FAO (2001) reported a negative correlation between pesticide usage and yield in Pakistan (Figure 1). A marked increase in yield loss while pesticide usage is on the rise has created a complex situation. Excessive use of pesticides is harmful in multiple ways. On the one hand, farmers have to pay more for pesticides; on the other, increased pesticide usage develops resistance in pests, making them more harmful to the crops. Excessive usage of many pesticides is also harmful to the environment and hazardous to human health.

Figure 1: Yield and Pesticide Usage in Pakistan (FAO 2001)

Pesticide usage can be reduced by identifying the conditions under which usage is optimal and by uncovering the circumstances that lead farmers to excessive pesticide usage. This is best done by looking for patterns in past events. In this paper we show how data mining can be successfully applied for this purpose. We applied an indigenously developed data mining tool, based on our “Clustering by Recursive Noise Removal” technique from Abdullah and Brobst (2003), to pest scouting, pesticide usage and meteorological data from Pakistani cotton fields.

The rest of the paper is organized as follows: Section 2 gives the background of our work; Section 3 presents a brief review of related work; Sections 4 and 5 describe the structure and working of the RNR algorithm; the application of RNR and a discussion of the results are presented in Section 6; and conclusions are summed up in Section 7.

Copyright (c) 2003, Australian Computer Society, Inc. This paper appeared at The Australasian Workshop on Data Mining and Web Intelligence (AWDM&WI2004), Dunedin, New Zealand. Conferences in Research and Practice in Information Technology, Vol. 32. Editors, James Hogan, Paul Montague, Martin Purvis and Chris Steketee. Reproduction for academic, not-for profit purposes permitted provided this text is included.

2. Working Scenario

To learn from the past, one needs a detailed record of the past. In our case, details of past pest situations, pesticide usage history and farmer demographics were required, i.e. pest scouting data. Pest scouting is a systematic field sampling process that provides field-specific information on pest pressure and crop injury.