Objective Evaluation of Methods for Border Detection in Dermoscopy Images

M. Emre Celebi, Gerald Schaefer, and Hitoshi Iyatomi

Abstract— Dermoscopy is one of the major imaging modalities used in the diagnosis of melanoma and other pigmented skin lesions. Due to the difficulty and subjectivity of human interpretation, dermoscopy image analysis has become an important research area. Border detection is often the first step in the automated analysis of dermoscopy images. Although numerous methods have been developed for the detection of lesion borders, very few studies were comprehensive in the evaluation of their results. In this paper, we evaluate five recent border detection methods on a set of 90 dermoscopy images using three sets of dermatologist-drawn borders as the ground truth. In contrast to previous work, we utilize an objective measure, the Normalized Probabilistic Rand Index, which takes into account the variations in the ground-truth images. The results demonstrate that the differences between four of the evaluated border detection methods are in fact smaller than those predicted by commonly used measures.

I. INTRODUCTION

Malignant melanoma, the most deadly form of skin cancer, is one of the most rapidly increasing cancers in the world, with an estimated incidence of 62,480 and an estimated total of 8,420 deaths in the United States in 2008 alone [1]. Early diagnosis is particularly important since melanoma can be cured with a simple excision if detected early.

Dermoscopy, also known as epiluminescence microscopy, has become one of the most important tools in the diagnosis of melanoma and other pigmented skin lesions. This non-invasive skin imaging technique involves optical magnification, which makes subsurface structures more easily visible when compared to conventional clinical images [2].
This in turn reduces screening errors and provides greater differentiation between difficult lesions such as pigmented Spitz nevi and small, clinically equivocal lesions [3]. However, it has also been demonstrated that dermoscopy may actually lower the diagnostic accuracy in the hands of inexperienced dermatologists [4]. Therefore, in order to minimize the diagnostic errors that result from the difficulty and subjectivity of visual interpretation, the development of computerized image analysis techniques is of paramount importance.

This work was supported by grants from the Louisiana Board of Regents (LEQSF2008-11-RD-A-12) and the Ministry of Education, Culture, Science, and Technology of Japan (Grant-in-Aid for Scientific Research C, 20591461, 2008-2010). M. Emre Celebi is with the Department of Computer Science, Louisiana State University in Shreveport, Shreveport, LA 71115 USA (ecelebi@lsus.edu). Gerald Schaefer is with the School of Engineering and Applied Science, Aston University, Aston Triangle, Birmingham B4 7ET, UK (g.schaefer@aston.ac.uk). Hitoshi Iyatomi is with the Department of Electrical Informatics, Hosei University, 3-7-2 Kajino-cho Koganei, Tokyo 184-8584, Japan (iyatomi@hosei.ac.jp).

Automated border detection is often the first step in the automated or semi-automated analysis of dermoscopy images. It is crucial to the image analysis for two main reasons. First, the border structure provides important information for accurate diagnosis, as many clinical features such as asymmetry, border irregularity, and abrupt border cutoff are calculated directly from the border. Second, the extraction of other important clinical features such as atypical pigment networks, globules [5], and blue-white areas [6] critically depends on the accuracy of border detection.
Automated border detection is a challenging task for several reasons: i) low contrast between the lesion and the surrounding skin, ii) irregular and fuzzy lesion borders, iii) artifacts such as hair, bubbles, and skin lines, and iv) variegated coloring inside the lesion.

Numerous methods have been developed for border detection in dermoscopy images. Recent approaches include fuzzy c-means clustering [7][8], gradient vector flow snakes [9], thresholding followed by region growing [10], mean-shift clustering [11], color quantization followed by spatial segmentation [12], statistical region merging [13], and two-stage k-means++ clustering followed by region merging [14]. Some of these studies used subjective visual examination to evaluate their results [7][8]. Others used objective measures including Hance et al.’s [15] XOR measure [9][12][13][14], pixel misclassification probability [16], sensitivity & specificity [11], and recall & precision [10]. These measures require borders drawn by dermatologists, which serve as the ground truth. In this paper, we refer to the borders detected by automated methods as automatic borders and those drawn by dermatologists as manual borders.

In a recent study, Guillod et al. [16] demonstrated that a single dermatologist, even one who is experienced in dermoscopy, cannot be used as an absolute reference for evaluating border detection accuracy. In addition, they emphasized that manual borders are not precise: borders drawn by different dermatologists, and even borders drawn by the same dermatologist at different times, show significant disagreement, so that a probabilistic model of the border is preferred to an absolute gold-standard model. Only a few of the above-mentioned studies used borders drawn by multiple dermatologists. Guillod et al. [16] used fifteen sets of borders drawn by five dermatologists over a minimum period of one month.
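
The disagreement among manual borders noted above can be made concrete with a small sketch. It is illustrative only and is not taken from any of the cited studies: the 4x4 masks are hypothetical toy data, the function names are our own, and the XOR measure is assumed here to be the mismatched area divided by the lesion area of the reference border. The sketch computes the pairwise XOR disagreement between manual borders and the per-pixel fraction of borders that mark a pixel as lesion.

```python
from itertools import combinations
from typing import List, Sequence

Mask = Sequence[Sequence[int]]  # binary image: 1 = lesion, 0 = background


def area(mask: Mask) -> int:
    """Number of lesion pixels in a binary mask."""
    return sum(sum(row) for row in mask)


def xor_error(candidate: Mask, reference: Mask) -> float:
    """Mismatched pixels divided by the reference lesion area (assumed normalization)."""
    disagree = sum(
        c != r
        for c_row, r_row in zip(candidate, reference)
        for c, r in zip(c_row, r_row)
    )
    return disagree / area(reference)


def lesion_frequency(masks: Sequence[Mask]) -> List[List[float]]:
    """Per-pixel fraction of manual borders that mark the pixel as lesion."""
    n = len(masks)
    height, width = len(masks[0]), len(masks[0][0])
    return [
        [sum(m[y][x] for m in masks) / n for x in range(width)]
        for y in range(height)
    ]


# Three hypothetical manual borders of the same lesion (toy data, not real annotations).
manual = [
    [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]],
    [[0, 0, 0, 0], [0, 1, 1, 1], [0, 1, 1, 0], [0, 0, 0, 0]],
    [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 1, 0]],
]

# Fraction of annotators who included each pixel in the lesion.
freq = lesion_frequency(manual)

# XOR disagreement for every pair of manual borders (second mask is the reference).
pairwise = [xor_error(a, b) for a, b in combinations(manual, 2)]
```

Pixels on which all borders agree receive a frequency of 0 or 1, while contested pixels near the boundary receive intermediate values. A per-pixel summary of this kind is the natural starting point for the probabilistic treatment of the border advocated by Guillod et al. [16].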
They constructed a probability image for each lesion by associating a misclassification probability with each pixel based on the number of times it was selected as part of the lesion. The automatic borders were then compared against these probability images. Iyatomi et al. [10] simplified Guillod et al.’s method by combining the manual borders that correspond to each image