Clustering-Based Method for Data Envelopment Analysis

Hassan Najadat, Kendall E. Nygard, Doug Schesvold
North Dakota State University
Fargo, ND 58105

Abstract. Data Envelopment Analysis (DEA) is a powerful performance measurement technique in economics and operations research for assessing the relative efficiency of each decision making unit (DMU). In general, DEA rests on two assumptions. First, DEA assumes that all DMUs are homogeneous in their environments; second, DEA is a deterministic approach, which means that it does not allow for noise or errors in measurement. A large number of papers have addressed DEA models, but few of them have focused on the heterogeneity of DMUs or on scalability over large datasets (i.e., datasets that contain a large number of DMUs). In this paper, we propose a new method for efficiently determining the performance scores of non-homogeneous DMUs, based on clustering methods that discover outliers early. The experimental results presented show considerable improvements for our approach in assessing the transportation funding system for school districts in the State of North Dakota.

Keywords: data envelopment analysis, data mining, constraint-based clustering, outlier discovery, decision making unit.

1. Introduction

The most important factor in economic activities is the productivity of each unit in an organization [SMK00]. Grossman discusses productivity improvement as one of the key competitive advantages of an enterprise [Gro93]. Managers use performance measurements to guide the strategic planning of their organizations [LEB95]. By using performance measurements, managers can adopt a long-term perspective, communicate more precisely, and allocate the organization's resources to the most attractive improvement activities [SZ95].
This paper integrates two important fields of information technology, data mining and data envelopment analysis (DEA), to provide a new tool for measuring the performance of decision making units (DMUs). The general motivation for this approach is to achieve synergistic results that could not be obtained if each model operated individually. In the economic sector, the goal is to assess the performance of actions, productions, or organizational units [Kle04] in order to improve different types of efficiency. This efficiency is calculated as the ratio of a weighted sum of outputs to a weighted sum of inputs. The growing acceptance of the DEA methodology for measuring the effectiveness of large entities is evidence of its applicability [Emr04, CSZ04]. In the data mining field, the goal is to extract useful information from large databases [HK01]. This information can be used in various applications, such as financial market analysis [AIS93, Ben01] and business management [BL99]. DEA yields a detailed analysis of DMUs that separates efficient from inefficient units, providing useful information for making further improvements. This analysis can reveal unknown relationships in the data, including the most productive operating scale sizes, the potential savings in resources, and the most suitable ways to improve inefficient units [Tha01]. Thus, both fields (i.e., data mining and DEA) serve the management of an organization by providing guidance on how to improve its productivity. The large number of research papers published on DEA applications [Emr04] and data mining applications [HK01] has built a solid base for both areas in academia and in business. DEA, which can measure the productivity of each DMU in the presence of multiple inputs and outputs [Sil86], is a mathematical model that builds a linear program for each DMU under evaluation.
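As a reference point, the ratio of weighted outputs to weighted inputs and the per-DMU linear program can be written in the standard CCR form (constant returns to scale, input-oriented, multiplier version). This textbook formulation is assumed here for illustration only: the notation (n DMUs, m inputs x_{ij}, s outputs y_{rj}, input weights v_i, output weights u_r, and DMU o under evaluation) is ours, and the study may rely on other DEA variants.

```latex
% Fractional (ratio) form for the DMU under evaluation, o:
\max_{u,v}\; \theta_o = \frac{\sum_{r=1}^{s} u_r\, y_{ro}}{\sum_{i=1}^{m} v_i\, x_{io}}
\quad \text{s.t.} \quad
\frac{\sum_{r=1}^{s} u_r\, y_{rj}}{\sum_{i=1}^{m} v_i\, x_{ij}} \le 1,\; j = 1,\dots,n;
\qquad u_r \ge 0,\; v_i \ge 0.

% Equivalent linear program (Charnes-Cooper transformation):
\max_{u,v}\; \sum_{r=1}^{s} u_r\, y_{ro}
\quad \text{s.t.} \quad
\sum_{i=1}^{m} v_i\, x_{io} = 1,\quad
\sum_{r=1}^{s} u_r\, y_{rj} - \sum_{i=1}^{m} v_i\, x_{ij} \le 0,\; j = 1,\dots,n;
\qquad u_r \ge 0,\; v_i \ge 0.
```

Evaluating every DMU therefore means solving n linear programs, each carrying one constraint per DMU in the reference set, so the total cost grows with the number of DMUs included.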
The linear program built for each DMU computes its performance score subject to the constraint that every DMU's efficiency score be less than or equal to one. The work proposed herein not only develops a framework for measuring the performance of school districts' transportation funding and presents valuable results from an economic perspective, but also provides a new method for integrating constraint-based clustering into DEA that removes the burden of including all school districts when calculating the efficiency score of each district. We provide an extensive comparative analysis to show the characteristics of our method and how it compares with standard DEA in terms of the quality of results. The performance of these school districts is measured several times using different economic models to obtain the most suitable view of the situation.
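To illustrate the constraint that no DMU's efficiency score exceeds one, here is a minimal Python sketch for the simplest case of one input and one output under constant returns to scale. In that case the CCR linear program has a closed-form optimum: each DMU's output/input ratio divided by the best ratio in the set. The function name `ccr_scores_single` and the district figures are hypothetical illustrations, not code or data from this study; the general multi-input, multi-output case requires an LP solver.

```python
from typing import List, Sequence


def ccr_scores_single(inputs: Sequence[float], outputs: Sequence[float]) -> List[float]:
    """Hypothetical helper: CCR efficiency for one input and one output.

    With a single input and a single output, the constant-returns-to-scale
    DEA linear program reduces to each DMU's output/input ratio divided by
    the best ratio observed. The best-practice DMU scores 1.0 and no DMU
    scores above 1.0.
    """
    if len(inputs) != len(outputs) or not inputs:
        raise ValueError("inputs and outputs must be non-empty and of equal length")
    if any(x <= 0 for x in inputs) or any(y <= 0 for y in outputs):
        raise ValueError("inputs and outputs must be strictly positive")
    # Productivity ratio of each DMU (output per unit of input).
    ratios = [y / x for x, y in zip(inputs, outputs)]
    # The frontier is set by the most productive DMU.
    best = max(ratios)
    return [r / best for r in ratios]


if __name__ == "__main__":
    # Hypothetical districts: input = transportation cost (thousand dollars),
    # output = students transported.
    scores = ccr_scores_single([100, 200, 150], [500, 800, 900])
    print([round(s, 3) for s in scores])  # [0.833, 0.667, 1.0]
```

Because every score is measured against the best observed ratio, a single extreme DMU shifts the frontier for all the others. This sensitivity is consistent with the paper's motivation for discovering outliers early.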