Spectral Data Self-organization Based on Bootstrapping and Clustering Approaches

Ioanna Vourlaki [1], George Livanos [1], Michalis Zervakis [1], Costas Balas [1]
[1] Department of Electronics and Computer Engineering, Technical University of Crete, Chania, P.C. 73100, Crete, Greece
E-mail: michalis@display.tuc.gr

George Giakos [2]
[2] Department of Electrical and Computer Engineering, Science and Technology Integration Research Laboratory, Manhattan College, Riverdale, NY
E-mail: giakos@intelphoton.com

Abstract—This study introduces a novel technique for self-organizing data without any prior knowledge of their statistical distribution, fusing efficient strategies from clustering and resampling. The proposed methodology searches for hidden characteristics within the processed dataset and reveals additional data structures or subclasses that can be used to identify irregular groups of particular importance in disease modeling. The performance of the presented algorithm is evaluated on biomedical data from cervical cancer, using sample vectors that represent the temporal response of tissue areas obtained through multispectral imaging. The results show that stratified, repeated applications of simple clustering schemes can effectively organize big data, which supports applying the proposed method to tissue classification for accurate and early disease diagnosis.

Keywords—multispectral analysis, cancer diagnosis, bootstrapping, clustering, data organization

I. INTRODUCTION

Cervical cancer is one of the most common types of cancer in women worldwide, especially in women under 40 years old [1]. Consequently, there is wide scientific interest in the prognosis, early diagnosis and treatment of precancerous lesions.
Despite the achievements of cytology and colposcopy, which have reduced the rates of cervical cancer morbidity and mortality, many lesions still remain undetected or overestimated, putting patients' health at risk or leading to unnecessary biopsies, respectively. Thus, reliable, cost-effective and accurate tissue screening and testing methods must be utilized. Pap smear, optics, spectroscopy and high-resolution imaging methods are among the key directions for efficient cervical cancer screening [2, 3]. Tissue evaluation considers alterations in the morphological and biochemical properties of cervical sections and cells that indicate malignant evolution. A detailed description of the various screening approaches that outperform conventional cytology is presented in [4].

Cervical cancer develops when abnormal cells on the cervix, the lower part of the uterus that opens into the vagina, grow uncontrollably. This kind of cancer can often be treated successfully when detected in its early stages, especially since screening tests and a vaccine against the human papillomavirus (HPV) [5], the main cause of cervical cancer, are readily available. Cervical intraepithelial neoplasia (CIN) is believed to precede invasive cervical cancer, which, when found early, is highly treatable and associated with long survival and good quality of life.

Chemical substances, known as optical biomarkers, are often used to increase the confidence of clinicians in cancer diagnosis and staging [6]. Imaging techniques are limited by the inherently weak optical signals when endogenous chromophores and fluorophores are used, and also by the subtle spectral differences between normal and diseased biological samples. In the case of cervical cancer, topical application of a 3-5% acetic acid (AA) solution has been routinely used as a contrast agent for more than 70 years to highlight abnormal areas [7].
The agent-tissue interaction generates an optical signal, which is perceived as transient tissue whitening. Clinical evidence supports that the degree and duration of this whitening are associated with the lesion’s grade, a phenomenon known as the acetowhitening (AW) effect. The method calls for applying acetic acid to the exterior of the cervix in order to visualize the resulting biochemical reaction. The phenomenon can be observed under incandescent light, without magnification, producing a low-intensity chemiluminescent light, which allows the medical expert to form a subjective interpretation of the epithelial condition in vivo.

In 2001, Balas [8] developed a novel multispectral imaging system, capable of performing time-resolved spectroscopy, for the in vivo early detection, quantitative staging and mapping of cervical cancer. This technique is based on measuring the modifications of the light scattering properties of the cervix, observed in cases of cervical neoplasia, after applying acetic acid solution to the examined tissue section. The processing and analysis of the optically enhanced output images revealed increased sensitivity in detecting incipient lesions, the valuable capability to extract additional, specific information regarding the evolution of the disease, and the ability to discriminate neoplasias of different grades.

Apart from the selection of the imaging modality for the assessment of cervical cancer, the methodology for extracting, processing and interpreting the relevant information from the available data is of paramount importance. In essence, the examined cases are represented by feature vectors reflecting