Applying k-Nearest Neighbour in Diagnosing Heart Disease Patients

Mai Shouman1*, Tim Turner1, Rob Stocker1

1 School of Engineering and Information Technology, University of New South Wales at the Australian Defence Force Academy, Northcott Drive, Canberra ACT 2600

Abstract. Heart disease has been the leading cause of death in the world over the past 10 years. Researchers have been using several data mining techniques to help health care professionals in the diagnosis of heart disease. K-Nearest-Neighbour (KNN) is one of the successful data mining techniques used in classification problems; however, it is less used in the diagnosis of heart disease patients. Recently, researchers have shown that combining different classifiers through voting outperforms single classifiers. This paper investigates applying KNN to help healthcare professionals in the diagnosis of heart disease. It also investigates whether integrating voting with KNN can enhance its accuracy in the diagnosis of heart disease patients. The results show that applying KNN could achieve higher accuracy than a neural network ensemble in the diagnosis of heart disease patients. The results also show that applying voting could not enhance the KNN accuracy in the diagnosis of heart disease.

Keywords: Data Mining, K-Nearest-Neighbour, Voting, Heart Disease.

1. Introduction

Heart disease has been the leading cause of death in the world over the past 10 years. The World Health Organization reported that heart disease is the leading cause of death in both high- and low-income countries [1]. The European Public Health Alliance reported that heart attacks and other circulatory diseases account for 41% of all deaths [2]. The Economic and Social Commission for Asia and the Pacific reported that in one fifth of Asian countries, most lives are lost to non-communicable diseases such as cardiovascular disease, cancer, and diabetes [3].
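The abstract describes classifying patients with KNN and testing whether voting improves its accuracy. As a hedged illustration only, and not the authors' implementation, the sketch below assumes Euclidean distance and a plain majority vote among the k nearest training examples. The voting step (`voting_knn_predict`) is a hypothetical ensemble that combines KNN classifiers built with different k values. The training points are invented two-feature toy data, not heart disease records.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Sequence[float]

def euclidean(a: Point, b: Point) -> float:
    """Straight-line distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train: List[Tuple[Point, int]], query: Point, k: int = 3) -> int:
    """Label `query` by majority vote among its k nearest training examples."""
    # Sort all training examples by distance to the query and keep the k closest.
    neighbours = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

def voting_knn_predict(train: List[Tuple[Point, int]], query: Point,
                       ks: Sequence[int] = (1, 3, 5)) -> int:
    """Hypothetical voting ensemble: one KNN classifier per k in `ks`,
    combined by majority vote. An odd number of members avoids binary ties."""
    member_predictions = Counter(knn_predict(train, query, k) for k in ks)
    return member_predictions.most_common(1)[0][0]

# Invented toy data: (features, label), where label 1 stands for "disease present".
TRAIN: List[Tuple[Point, int]] = [
    ((0.0, 0.0), 0), ((0.1, 0.2), 0), ((0.2, 0.1), 0),
    ((1.0, 1.0), 1), ((0.9, 1.1), 1), ((1.1, 0.9), 1),
]

if __name__ == "__main__":
    print(knn_predict(TRAIN, (0.05, 0.1), k=3))    # → 0
    print(voting_knn_predict(TRAIN, (1.0, 0.95)))  # → 1
```

Using several k values is only one way to build a voting ensemble. Training each member on a different subset of the data, or voting across entirely different classifier types, are common alternatives.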
The Australian Bureau of Statistics reported that heart and circulatory system diseases are the leading cause of death in Australia, causing 33.7% of all deaths [4]. Motivated by the yearly worldwide increase in heart disease mortality and the availability of huge amounts of patient data from which useful knowledge could be extracted, researchers have been using data mining techniques to help health care professionals in the diagnosis of heart disease [5-6].

Data mining is an essential step in knowledge discovery. It is the exploration of large datasets to extract hidden and previously unknown patterns, relationships and knowledge that are difficult to detect with traditional statistical methods [7-11]. The application of data mining is rapidly spreading to a wide range of sectors such as analysis of organic compounds, financial forecasting, healthcare and weather forecasting [12]. Data mining in healthcare is an emerging field of high importance for providing prognosis and a deeper understanding of medical data. Healthcare data mining attempts to solve real-world health problems in the diagnosis and treatment of diseases [13]. Researchers are using data mining techniques in the medical diagnosis of several diseases such as diabetes [14], stroke [15], cancer [16], and heart disease [17].

Several data mining techniques have been used in the diagnosis of heart disease, showing different levels of accuracy. K-Nearest-Neighbour (KNN) is one of the most widely used data mining techniques in pattern recognition and classification problems [18]. Recently, Paris et al. examined single classifiers and combinations of different classifiers through voting, and showed that voting outperformed the single classifiers [19]. This paper investigates applying KNN in the diagnosis of heart disease on the benchmark dataset, to allow comparisons with other data mining techniques used on the same dataset. It also investigates whether integrating

* Corresponding Author. Tel: +61 2 6268 8034, Fax: +61 2 6268 8581, m.shouman@adfa.edu.au