SiDCoN: A Tool to Aid Scoring of DNA Copy Number Changes in SNP Chip Data

Derek J. Nancarrow 1*, Herlina Y. Handoko 1, Mitchell S. Stark 1, David C. Whiteman 2, Nicholas K. Hayward 1

1 Oncogenomics, Queensland Institute of Medical Research, Herston, Queensland, Australia
2 Cancer and Population Studies, Queensland Institute of Medical Research, Herston, Queensland, Australia

The recent application of genome-wide, single nucleotide polymorphism (SNP) microarrays to investigate DNA copy number aberrations in cancer has provided unparalleled sensitivity for identifying genomic changes. In some instances the complexity of these changes makes them difficult to interpret, particularly when tumour samples are contaminated with normal (stromal) tissue. Current automated scoring algorithms require considerable manual data checking and correction, especially when assessing uncultured tumour specimens. To address these limitations we have developed a visual tool to aid in the analysis of DNA copy number data. Simulated DNA Copy Number (SiDCoN) is a spreadsheet-based application designed to simulate the appearance of B-allele and logR plots for all known types of tumour DNA copy number changes, in the presence or absence of stromal contamination. The system allows the user to determine the level of stromal contamination, as well as specify up to 3 different DNA copy number aberrations for up to 5000 data points (representing individual SNPs). This allows users great flexibility to assess simple or complex DNA copy number combinations. We demonstrate how this utility can be used to estimate the level of stromal contamination within tumour samples and its application in deciphering the complex heterogeneous copy number changes we have observed in a series of tumours. We believe this tool will prove useful to others working in the area, both as a training tool, and to aid in the interpretation of complex copy number changes.
Citation: Nancarrow DJ, Handoko HY, Stark MS, Whiteman DC, Hayward NK (2007) SiDCoN: A Tool to Aid Scoring of DNA Copy Number Changes in SNP Chip Data. PLoS ONE 2(10): e1093. doi:10.1371/journal.pone.0001093

INTRODUCTION

Single nucleotide polymorphism (SNP) microarrays provide data on both genotype and signal intensity, the combination of which can be used to generate information on chromosomal segment copy number. An increasing number of studies utilise whole-genome high density SNP chips to generate DNA copy number profiles for a variety of tumour types. Kits and software tools are now commercially available for this purpose from a number of suppliers. This emerging technology has distinct advantages over previous karyotype-based comparative genome hybridization (CGH) methods [1] and analytic methods are evolving rapidly.

When applying these SNP microarrays (SNP-aCGH) to cancer research, the aim is to synthesize a comprehensive DNA copy number profile which maps aberrations across the entire genome within individual tumour samples. There are several methods papers devoted to the analysis of DNA copy number using SNP array platforms [2–5] and dedicated software functions are available in commercial applications. There are two broad approaches to this work: 1) identifying statistically significant genomic regions of change (e.g. Colella and coworkers [2]); 2) developing tools to auto-analyse the data to generate genome-wide, sample-specific DNA copy number profiles. The success of SNP-aCGH for mapping sample-specific DNA copy number changes stems from the ability to combine CGH and loss of heterozygosity (LOH) studies in the same analysis. As is often the case with new biotechnology, the analysis procedures lag behind the experimental advancements in terms of simplicity and flexibility.
While commercially available software applications provide analysis algorithms to identify significant regions of change, we [6] have found this to be inadequate for generating a whole-genome view of DNA copy number changes without heavy manual interpretation.

In SNP-aCGH analyses the resulting genotype data consist of intensity values for two channels corresponding to the fluorophores associated with the A and B alleles (attached to specific oligos/beads). Data can be plotted as raw A versus raw B intensity plots; however, several refined data presentation methods have proven more useful. One of these, log2 of the sample intensity to reference intensity ratio (logR), provides a continuous measure of the CGH component of the data. In this case, the signal intensity of each SNP in the target sample is expressed as a ratio over that of the normal sample or reference pool. Taking log2 of this ratio provides an effective means to curtail the range of outlying values. While the variability of individual logR values is large, due to variances in PCR conditions and primer sequences, modified algorithms such as that of Nannya and coworkers [7] and the Illumina proprietary method, as well as the application of a moving average, are available to reduce the effects of this variation across a chromosomal region. These features, including a proprietary algorithm for SNP normalisation, are built into the Illumina BeadStudio 2 and 3 software packages. Another key SNP-aCGH data presentation track, Allele B frequency (Ballele), visualises the LOH component.

Academic Editor: Anja-Katrin Bielinsky, University of Minnesota, United States of America
Received September 12, 2007; Accepted October 4, 2007; Published October 31, 2007
Copyright: © 2007 Nancarrow et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This research was supported by grant number CA 001833-03 from the United States National Cancer Institute, the Queensland Cancer Fund and the National Health and Medical Research Council (NHMRC) of Australia (Program no. 199600). David Whiteman is supported by Senior Research Fellowships from the National Health and Medical Research Council of Australia. The funding bodies played no role in the design or conduct of the study; the collection, management, analysis, or interpretation of the data; or the preparation, review, or approval of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
* To whom correspondence should be addressed. E-mail: derekN@qimr.edu.au

PLoS ONE | www.plosone.org 1 October 2007 | Issue 10 | e1093

By adjusting