Mining sustainability indicators to classify hydrocarbon development

Muhammad Shaheen a,*, Muhammad Shahbaz a,*, Aziz Guergachi b, Zahoorur Rehman a

a Department of Computer Science & Engineering, University of Engineering & Technology, Lahore, Pakistan
b Information Technology Management, Ryerson University, Toronto, ON, Canada

Article info

Article history: Received 8 December 2010; received in revised form 23 April 2011; accepted 23 April 2011; available online 29 April 2011

Keywords: Sustainability indicators; Clustering; Decision tree; Data mining; Hydrocarbon development; Energy development

Abstract

The role of energy in the economic, social and ecological development of a country defines its significance in sustainable development. We propose here a method to classify a nation’s hydrocarbon development into one of five classes: (1) futuristic; (2) conforming; (3) sustainable; (4) unsustainable; or (5) critical. K-means clustering is an unsupervised classification method whose clusters cannot be labeled because the data lack a class value. We propose a unique method to label unsupervised classes, which is then used to divide the energy data of nations into five clusters. The labeled clusters are structured in an ID3 decision tree, which provides a hierarchical structure for evaluating the hydrocarbon development of a given country. The results indicate some useful and interesting patterns in sustainability indicators.

© 2011 Elsevier B.V. All rights reserved.

1. Introduction

Sustainability is defined as “the development that meets the needs of today without compromising the ability of future generations to meet their needs” [4]. Energy plays a vital role in the socio-eco-economic development of a country. Energy is globally available in different forms; the most common form is derived from hydrocarbons. In our energy-based global economy, energy providers are desperately looking for ways to extract hydrocarbons to meet consumption needs.
One third of the world’s population relies on animal power and non-commercial fuels, and almost two billion people lack access to electricity [32]. Energy is precious because it leads to better living standards, health, environment and prosperity, but two key questions remain: Why are the large energy reserves of developing countries failing to bring remarkable change in the energy-dependent socio-eco-economic dimensions? And, despite adequate energy planning, why do energy extraction and distribution practices still lead to wide-scale economic recession?

No procedure exists to assess energy development in a given country. In 2001, at a world summit on sustainable development in South Africa, 41 indicators for sustainable energy development were proposed [35], but only some of these variables are quantifiable. Many of the indicators are not directly related to energy development because energy is often framed only as a means to an end. The end is sustainable development for a nation’s economy, ecology and social welfare. An annotated list of these indicators with descriptions is presented in Table 1.

The meaning of sustainable development varies with context. In the case of energy, development is considered sustainable if the consumption rate conforms to the production rate. The first Enquête Commission formulated four rules of sustainability, which emphasize the need for environmental protection and growth in natural resources. The commission also focused on the potential effects of this synergy on social and economic conditions [18]. One of the rules concerns conformity between the consumption and production of energy resources. An imbalance between these factors may cause the depletion midpoint of crude oil to fall between 2010 and 2020 [30]. However, the natural gas market is younger, and its depletion is not expected to occur in the next 60 years.
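The conformity rule stated above (consumption rate conforming to production rate) can be illustrated with a minimal sketch. The text does not define a numeric threshold for "conforms", so the function names and the `tolerance` value of 0.1 are hypothetical choices made only for illustration. Both rates are assumed to be in the same units over the same period. This is not the classification method proposed in the paper.

```python
def conformity_ratio(consumption: float, production: float) -> float:
    """Return consumption divided by production for one period.

    Both arguments are assumed to share the same units
    (for example, barrels per day); this is an illustrative assumption.
    """
    if production <= 0:
        raise ValueError("production must be positive")
    return consumption / production


def conforms(consumption: float, production: float, tolerance: float = 0.1) -> bool:
    """Check whether consumption is within `tolerance` of production.

    The tolerance is a hypothetical parameter; the source text gives
    no numeric definition of conformity.
    """
    return abs(conformity_ratio(consumption, production) - 1.0) <= tolerance


if __name__ == "__main__":
    # Made-up figures, used only to exercise the functions.
    print(conforms(95.0, 100.0))   # consumption close to production
    print(conforms(150.0, 100.0))  # consumption well above production
```

A ratio near 1 indicates that consumption tracks production; how far from 1 still counts as sustainable is a modeling decision that this sketch leaves to the `tolerance` parameter.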
Data mining has increased the opportunities for decision makers to extract useful implicit knowledge from a large pool of collected data [29]. Mining a substantial dataset yields either a supervised or unsupervised classification of the data or a prediction of an unknown real value [3,6]. Clustering is an unsupervised data mining technique that is used to classify patterns into clusters. A number of methods have been proposed for clustering different types of datasets [8,26,16].

* Corresponding authors. Tel.: +92 3314525045 (M. Shaheen), +92 3027424229 (M. Shahbaz).
E-mail addresses: shaheen@uet.edu.pk (M. Shaheen), m.shahbaz@uet.edu.pk (M. Shahbaz), a2guerga@ryerson.ca (A. Guergachi), xahoor@gmail.com (Z. Rehman).
Knowledge-Based Systems 24 (2011) 1159–1168
0950-7051/$ - see front matter. doi:10.1016/j.knosys.2011.04.016

There is currently no method to assess a country’s hydrocarbon development using the sustainability indicators that the IAEA put forward in 2001. We propose a method to