Experiments with SVM and stratified sampling with an imbalanced problem: Detection of intestinal contractions

Fernando Vilariño, Panagiota Spyridonos, Petia Radeva, Jordi Vitrià
Computer Vision Center. Universitat Autònoma de Barcelona. Spain
{fernando@cvc.uab.es}

ABSTRACT

In this paper we show some preliminary results of our research in the field of classification of imbalanced datasets with SVM and stratified sampling. Our main goal is to deal with the clinical problem of automatic detection of intestinal contractions in endoscopic video images. The prevalence of contractions is very low, which leads to highly skewed training sets. Stratified sampling together with SVM has been reported in the literature to behave well in this kind of problem. We applied both the SMOTE algorithm developed by Chawla et al. and under-sampling, in a cascade system implementation, to deal with the skewed training sets in the final SVM classifier. We show comparative results for both sampling techniques using precision-recall curves, which appear to be useful tools for performance testing.

1. Introduction

Automatic detection of intestinal contractions is a paradigmatic example of classification with imbalanced datasets. The prevalence of contractions is very low, and the data analysis requires large amounts of expert time. Both the number of intestinal contractions and their temporal distribution along the intestinal tract characterize small bowel motility patterns that are indicative of the presence of different malfunctions. Different techniques have been applied for intestinal motility analysis in several medical imaging modalities. A good review of this topic can be found in (Hansen; 2002). The novelty of our research in this field lies in the use of Wireless Capsule Video Endoscopy (WCVE) images (Schulmann et al.; 2005; Brodsky; 2003; Eliakim; 2004). In this clinical domain, the specialist has to analyse a video and manually label each frame where a contraction event happens.
Usually, each video analysis may last one or two hours, and among a typical quantity of 20.00 frames, only 700 contractions are reported. We focused our efforts on the automatic detection of intestinal contractions using video as the data source. We have trained an SVM system with contraction and non-contraction frames from several videos, previously labelled by hand by the experts. The choice of SVM (Vapnik; 1995) is supported by empirical results showing that this technique behaves well with moderately skewed datasets. Recently, several methods have been developed to improve the performance of SVM classifiers on imbalanced problems (Akbani et al.; 2004; Brank et al.; 2003). Stratified sampling is based on re-sampling the original datasets in different ways: under-sampling the majority class or over-sampling the minority class. In this work, we show the preliminary results of a comparative study of under-sampling vs. the SMOTE over-sampling technique developed by (Chawla et al.; 2002). For both techniques, SVM and other popular single classifiers have been applied. With the purpose of reducing the imbalanced character of the datasets, we use a two-step cascade methodology that prunes false positives without losing sensitivity. In order to assess the performance of the different methods implemented, precision-recall curves are proposed. The main advantage of these plots is that they show both the sensitivity of the