Selecting Features of Single Lead ECG Signal for Automatic Sleep Stages Classification using Correlation-based Feature Subset Selection

Ary Noviyanto 1, Sani M. Isa 2, Ito Wasito 3 and Aniati Murni Arymurthy 4

1 Computer Science, Universitas Indonesia, Depok 16424/Jawa Barat, Indonesia
2 Computer Science, Universitas Indonesia, Depok 16424/Jawa Barat, Indonesia
3 Computer Science, Universitas Indonesia, Depok 16424/Jawa Barat, Indonesia
4 Computer Science, Universitas Indonesia, Depok 16424/Jawa Barat, Indonesia

Abstract
Knowing our sleep quality helps us maximize our performance in daily life. The ECG signal has the potential to determine sleep stages, so that sleep quality can be measured. The data used in this research are single-lead ECG signals from the MIT-BIH Polysomnographic Database. ECG features can be derived from the RR interval, EDR information and the raw ECG signal. Correlation-based Feature Subset Selection (CFS) is used to choose the features that are significant for determining the sleep stages. Those features are evaluated using four classifiers with different characteristics (Bayesian network, multilayer perceptron, IB1 and random forest). Performance evaluations with the Bayesian network, IB1 and random forest show that CFS performs excellently: it reduces the number of features significantly with only a small decrease in accuracy. The best classification result in this research is obtained by combining the feature set derived from the raw ECG signal with the random forest classifier.

Keywords: ECG features, Correlation-based Feature Subset Selection, RR interval, EDR, Raw ECG Signal, Sleep stages.

1. Introduction
The quality of sleep directly affects the quality of life. Using a particular measure [1], we can calculate a person's sleep quality from the composition of his/her sleep stages. Sleep experts analyze polysomnogram data as the standard technique to determine the sleep stages.
A polysomnogram is a simultaneous recording of physiological variables during sleep that includes brain activity (electroencephalogram, EEG), eye movements (electrooculogram, EOG), and chin muscle activity (electromyogram, EMG) [2]. Based on previous works [3, 4, 5, 6], ECG shows promising results as a substitute for the standard technique to determine the sleep stages. The main reason for using ECG is the high cost of recording polysomnogram data. The data must be gathered in a sleep laboratory, which is expensive, uncomfortable for patients, and requires trained staff. Another issue is that the sleep study (polysomnography) itself is costly in effort: manually determining the sleep stages in a long sequence of polysomnogram data requires endurance and high accuracy. Manual determination can also lead to a lack of standardization (i.e., different sleep experts may reach different results when determining the sleep stages). Automatic processing in polysomnography is therefore necessary to handle the issues of manual sleep stage determination.

There are two groups of sleep in sleep architecture: non-rapid eye movement (NREM) sleep and rapid eye movement (REM) sleep [7]. NREM sleep can be divided into NREM 1, NREM 2, NREM 3 and NREM 4. The graph that represents the sequence of sleep stages, called a hypnogram, is shown in Figure 1.

IJCSI International Journal of Computer Science Issues, Vol. 8, Issue 5, No 1, September 2011, ISSN (Online): 1694-0814, www.IJCSI.org

ECG (electrocardiogram), sometimes called EKG, simply is