DETECTING AND ANALYZING COPY NUMBER ALTERNATIONS IN ARRAY-BASED CGH DATA

Mariam A. Sheha*,§, Mai S. Mabrouk, and Mahmoud Elhefnawi||

*Systems and Biomedical Engineering Department, Cairo University, Giza, Egypt
Biomedical Engineering Department, Misr University for Science and Technology (MUST), Giza, Egypt
Biomedical Informatics and Chemoinformatics Research Group, Department of Informatics and Systems, Center of Excellence for Advanced Sciences, National Research Center, Giza, Egypt
§Mariam_sheha@hotmail.com
msm eng@yahoo.com
||mahef111@gmail.com

Accepted 26 October 2016
Published 19 December 2016

ABSTRACT

Copy number changes, or alterations, are a form of genetic variation in the human genome. Genomic DNA copy number alterations (CNAs) are associated with the development and progression of cancers. Array-based comparative genomic hybridization (a-CGH) is a technique used to identify copy number changes in genomic DNA. It yields data consisting of fluorescence intensity ratios of test and reference DNA samples. The intensity ratios provide information about the number of copies in the DNA. Practical issues, such as the contamination of tumor cells in tissue specimens and normalization errors, necessitate the use of automated statistical algorithms for learning about genomic alterations from array CGH data. Specifically, there is a need for algorithms that can identify gains and losses in the number of copies based on statistical considerations, rather than merely detecting trends in the data. For this purpose, this study applies three different approaches to the analysis of array CGH profiles: circular binary segmentation (CBS), a Bayesian approach relying on the hidden Markov model (HMM), and Gaussian mixture (GM) clustering. Publicly available data on pancreatic adenocarcinoma and a Coriell cell line bacterial artificial chromosome (BAC) array were used for the analysis to illustrate the reliability and success of the techniques.
Keywords: Array-based comparative genomic hybridization (a-CGH); Hidden Markov model (HMM); Circular binary segmentation (CBS); Gaussian mixture (GM); Pancreatic adenocarcinoma; Coriell cell line bacterial.

Corresponding author: Mai S. Mabrouk, Biomedical Engineering Department, Misr University for Science and Technology (MUST), Giza, Egypt. E-mail: msm eng@yahoo.com

Biomedical Engineering: Applications, Basis and Communications, Vol. 28, No. 6 (2016) 1650044 (15 pages)
DOI: 10.4015/S1016237216500447

INTRODUCTION

Chromosomes are the structures in each of the body's cells that carry the genetic information (DNA) telling the body how to develop and function. They come in pairs, one from each parent, and are numbered 1 to 22, approximately from largest to smallest. Each person has