The 2010 IEEE/ICME International Conference on Complex Medical Engineering, July 13-15, 2010, Gold Coast, Australia

Clustering Nuclei Using Machine Learning Techniques

Yu Peng, Mira Park, Min Xu, Suhuai Luo, Jesse S. Jin, Yue Cui, W. S. Felix Wong, Leonardo D. Santos

Abstract - Cervical cancer is the second most common cancer among women, yet it is largely preventable and curable with regular Pap tests, which can reveal nuclei changes in the cervix. Accurate nuclei detection is critical because it precedes the analysis of nuclei changes and the subsequent diagnosis. The use of computer-aided nuclei segmentation has recently increased dramatically. Although such algorithms work for sparse nuclei, which are easily detected, the segmentation of complicated nuclei clusters remains a challenging task. This paper presents a new methodology for the detection of cervical nuclei clusters. We first detect all the nuclei in the cervical microscopic image with an ellipse fitting algorithm. Second, we select highly relevant features from those obtained in the previous step using the F-score, which measures the extent to which each feature contributes to the result. All the ellipses are then classified as single nuclei or clusters by a C4.5 decision tree using the selected features. We evaluated the performance of this method by classification accuracy, sensitivity, and cluster predictive value. With 9 features selected from the original 13, we achieved a promising classification accuracy of 97.8%.

I. INTRODUCTION

Cancer is a group of diseases in which cells in the body grow, change, and multiply out of control. Cervical cancer refers to the erratic growth of cells that originate in the tissues of the cervix. It is usually a slow-growing cancer that may not have symptoms but can be found with regular Pap tests. According to the U.S.
National Cancer Institute, cervical cancer is the second most common cancer in women and the third most frequent cause of cancer death, accounting for nearly 300,000 deaths annually worldwide, especially in middle- and low-income countries. Fortunately, cervical cancer is largely preventable and curable with regular Pap tests, which are used to find cell changes in the cervix [1].

Manuscript received February 5, 2010. This work was supported in part by the CSC-Newcastle Scholarship, ARC LP0669645 and IntelliRAD. Yu Peng, Jesse S. Jin, Mira Park, Suhuai Luo, Min Xu and Yue Cui are with the School of Design, Communication and IT, University of Newcastle, Callaghan 2308, Australia (e-mail: Yu.Peng@uon.edu.au). W. S. Felix Wong is with the Department of Obstetrics and Gynaecology, School of Medicine, University of New South Wales, Sydney, NSW 2052, Australia. Leonardo D. Santos is with the Department of Anatomical Pathology, Sydney South West Pathology Service, Liverpool Hospital, Liverpool, NSW 2170, Australia (e-mail: Leonardo.Santos@sswahs.nsw.gov.au). 978-1-4244-6843-0/10/$26.00 ©2010 IEEE

Recently, computer-assisted screening and digital image applications have been widely adopted for cervical cancer diagnosis and treatment [2]. The use of image segmentation in Pap tests is gradually increasing. The more cervical cells can be detected, the more thoroughly cell changes can be analysed. Abnormal cells can then be treated before they turn into cervical cancer, or at an early stage. Image segmentation is the first step towards image understanding and image analysis [3]. To increase the accuracy of computer-assisted diagnosis, accurate nuclei segmentation is crucial. After nuclei detection, the features of each individual nucleus can be obtained and analysed. Cytological features of a tissue image, including nuclei count, nuclei size distribution, and nuclei shape distribution, are significant features for decision making in pathology [4].
The features can be acquired easily by image segmentation when the nuclei are separated in images. However, in pathological conditions, nuclei in tissues are mostly clustered. Overlooking clustered nuclei and analysing only isolated nuclei can dramatically increase analysis time or affect the statistical validity of the result [5]. Therefore, the solution is to accurately detect clustered regions and isolated nuclei before applying segmentation algorithms, such as the watershed algorithm or interactive region growing, to segment the clustered nuclei.

Currently, many techniques for discriminating isolated nuclei from clustered nuclei rely on certain features of the objects, such as area, perimeter and circularity [6]. The convex hull based method is one of the most widely used. Clustered nuclei are identified when the ratio between the area of the smallest convex polygon enclosing an object and the area of the object itself exceeds a certain threshold [3]. Here, the smallest convex polygon for a set of points S in a real vector space is the minimal convex set containing S; it is commonly called the 'convex hull'. However, cluster detection based on only a few features is not comprehensive and can therefore be inaccurate: it is easy to omit other features that are more closely related to the result, and the choice of features is mainly based on experience. To solve this problem, we propose a decision tree based method that detects clustered nuclei using as many features of each object as possible.
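
As an illustration of the convex hull based criterion discussed above, the following minimal Python sketch computes the ratio between the convex hull area and the object area for a contour given as a list of polygon vertices. It is not the implementation used in [3] or in this paper: the function names (`polygon_area`, `convex_hull`, `hull_ratio`, `is_cluster`), the polygonal contour representation, and the threshold value of 1.1 are all assumptions chosen for illustration only.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(vertices: List[Point]) -> float:
    """Absolute area of a simple polygon via the shoelace formula."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def convex_hull(points: List[Point]) -> List[Point]:
    """Convex hull by Andrew's monotone chain, counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        # Positive if o -> a -> b is a counter-clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def hull_ratio(contour: List[Point]) -> float:
    """Convex hull area divided by object area (1.0 for a convex object)."""
    return polygon_area(convex_hull(contour)) / polygon_area(contour)


def is_cluster(contour: List[Point], threshold: float = 1.1) -> bool:
    """Flag an object as a cluster; the threshold is a hypothetical value."""
    return hull_ratio(contour) > threshold


# A convex object (square) and a peanut-shaped object with two concave notches.
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
peanut = [(0, 0), (3, 1), (6, 0), (6, 4), (3, 3), (0, 4)]
print(hull_ratio(square), is_cluster(square))
print(hull_ratio(peanut), is_cluster(peanut))
```

A convex object yields a ratio of 1, whereas the concavities that appear where nuclei touch or overlap push the ratio above 1. The sketch also makes the limitation noted above concrete: the decision rests on a single feature and a hand-picked threshold.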