A New Smoothing Model for Analyzing Array CGH Data

Nha Nguyen*, Heng Huang†, Soontorn Oraintara* and An Vo*
* Department of Electrical Engineering, University of Texas at Arlington
Email: nhn3175@exchange.uta.edu, oraintar@uta.edu, vpnan@gauss.uta.edu
† Corresponding Author, Department of Computer Science and Engineering, University of Texas at Arlington
Email: heng@uta.edu

Abstract—Array-based comparative genomic hybridization (CGH) is a molecular cytogenetic method for the detection of chromosomal imbalances, and it has been extensively used for studying copy number alterations in various cancer types. In this paper, we introduce a dual-tree complex wavelet transform method with the bivariate shrinkage estimator into the study of array CGH data smoothing. Our method captures both the intrinsic spatial change of genome hybridization intensities and the physical distances between adjacent probes along a chromosome, which are not uniform. We tested the proposed method on both simulated and real data, and the results demonstrate superior performance of our method in comparison with existing methods.

I. INTRODUCTION

Array-based comparative genomic hybridization (array CGH) is a highly efficient technique, allowing the simultaneous measurement of DNA copy numbers across the whole genome at hundreds or thousands of loci and the reliable detection of local one-copy-level variations. Characterization of these DNA copy number changes is important for both the basic understanding of cancer and its diagnosis. In order to develop effective methods to identify aberration regions from array CGH data, much recent research has focused on smoothing-based data processing. For example, Eilers and De Menezes proposed a quantile regression method that employs an L1 error for both the fitness measure and the roughness penalty [1]. Hsu et al. [2] used a wavelet transform to fit the data.
In this paper, we introduce a dual-tree complex wavelet transform method with the bivariate shrinkage estimator into the study of array CGH data smoothing. The unequal spacing of probes on the chromosome is taken into account. On synthetic data, our experimental results demonstrate that our method outperforms the previous methods. In terms of root mean squared error at different noise levels, our method improves on the other methods by about 17.8% to 43%. Furthermore, we also use real array CGH data to validate the efficiency of our method.

II. WAVELET METHODS

In this section, we provide a brief review of the wavelet transforms that have been used for array CGH data smoothing and that are used in this paper.

Fig. 1. A 3-level DWT. (a) Analysis FB, (b) Synthesis FB.

A. Discrete Wavelet Transform

The discrete wavelet transform (DWT), based on the octave band tree structure, can be viewed as a multiresolution decomposition of a signal. Fig. 1 shows the 3-level DWT analysis and synthesis filter banks (FBs). The DWT takes a length-N sequence and generates an output sequence of length N using a set of lowpass and highpass filters, each followed by a decimator. It has N/2 values at the highest resolution, N/4 values at the next resolution, and N/2^L values at level L. Because of decimation, the DWT is a critically sampled decomposition. However, the drawback of the DWT is that it is shift variant. In signal denoising, the DWT creates artifacts around the discontinuities of the input signal [3]. These artifacts degrade the performance of threshold-based denoising algorithms.

B. Stationary Wavelet Transform

The stationary wavelet transform (SWT) [3] is similar to the DWT except that it does not employ a decimator after filtering, and each level's filters are upsampled versions of the previous ones. The SWT is also known as the shift-invariant DWT. The absence of a decimator leads to a full-rate decomposition.
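The contrast between the decimated and undecimated transforms can be illustrated with a minimal sketch. It assumes the Haar filter pair and circular boundary extension, neither of which is specified in this paper, and the names `haar_dwt`, `haar_swt`, `circ_shift`, and `energy` are hypothetical helpers introduced only for this illustration.

```python
import math

S = 1.0 / math.sqrt(2.0)  # Haar filter coefficient (assumed filter choice)


def haar_dwt(x, levels):
    """Decimated Haar DWT; returns [d_1, ..., d_L, a_L]."""
    if len(x) % (2 ** levels) != 0:
        raise ValueError("length must be divisible by 2**levels")
    approx, subbands = list(x), []
    for _ in range(levels):
        half = len(approx) // 2
        # Filter, then keep every second output (decimation by 2).
        detail = [S * (approx[2 * i] - approx[2 * i + 1]) for i in range(half)]
        approx = [S * (approx[2 * i] + approx[2 * i + 1]) for i in range(half)]
        subbands.append(detail)
    subbands.append(approx)
    return subbands


def haar_swt(x, levels):
    """Undecimated Haar transform (circular extension); returns [d_1, ..., d_L, a_L]."""
    n, approx, subbands = len(x), list(x), []
    for level in range(levels):
        gap = 2 ** level  # filter taps spread apart: upsampled filters, no decimator
        detail = [S * (approx[i] - approx[(i + gap) % n]) for i in range(n)]
        approx = [S * (approx[i] + approx[(i + gap) % n]) for i in range(n)]
        subbands.append(detail)
    subbands.append(approx)
    return subbands


def circ_shift(v):
    """Circularly shift a sequence right by one sample."""
    return v[-1:] + v[:-1]


def energy(v):
    """Sum of squared coefficients."""
    return sum(c * c for c in v)


N, L = 16, 3
step = [0.0] * 8 + [1.0] * 8   # piecewise-constant test signal
shifted = circ_shift(step)     # same signal, moved by one sample

dwt_lengths = [len(b) for b in haar_dwt(step, L)]
swt_lengths = [len(b) for b in haar_swt(step, L)]

print("DWT subband lengths:", dwt_lengths)
print("SWT subband lengths:", swt_lengths)
print("DWT finest-detail energy:",
      energy(haar_dwt(step, L)[0]), "vs", energy(haar_dwt(shifted, L)[0]))
print("SWT finest-detail energy:",
      energy(haar_swt(step, L)[0]), "vs", energy(haar_swt(shifted, L)[0]))
```

In this sketch the DWT subband lengths are N/2, N/4, N/8 and N/8, which sum to N, whereas every undecimated subband keeps N samples. Shifting the input by one sample changes the energy of the finest DWT detail band, while the undecimated coefficients are simply shifted along with the input.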
Each subband contains the same number of samples as the input, so a decomposition of L levels has a redundancy ratio of (L + 1) : 1. However, the shift-invariant property of the SWT makes it preferable for use in various signal processing applications such as denoising and classification