Journal of the Korean Data & Information Science Society 2011, 22(5), 967–976

A note on Box-Cox transformation and application in microarray data

Mezbahur Rahman 1 · Namyong Lee 2

1,2 Department of Mathematics and Statistics, Minnesota State University

Received 24 April 2011, revised 23 June 2011, accepted 21 August 2011

Abstract

The Box-Cox transformation is a well-known family of power transformations that brings a set of data into agreement with the normality assumption of the residuals, and hence of the response variable, of a postulated model in regression analysis. Normalization (studentization) of the regressors is a common practice in analyzing microarray data. Here, we implement the Box-Cox transformation in normalizing regressors in microarray data. Predictability of the model can be improved using data transformation compared to studentization.

Keywords: Maximum likelihood estimates, moments for the ordered standard normal variates, normality tests, Shapiro-Wilk W statistic.

1. Scientific background: Colon cancer

Colon cancer is one of the most common types of cancer, with over 90,000 people diagnosed every year in the USA. The cause of the onset of colon cancer is unknown, although tumors are known to develop from polyps, which are extra tissue growths. Polyps may be present in the colon for years before they evolve into cancer. Identification (via screening) and removal of polyps can prevent the onset of colon cancer, although small polyps present in the colon do not always cause individuals any problems. Symptoms of large polyps and colon cancer include bleeding from the lower GI (gastrointestinal) tract and changes in bowel habits.

2. Discovery and prediction of tumors using microarray analysis

Current microarray technology is able to take a single tissue sample and construct an Affymetrix oligonucleotide array containing (estimated) expression levels of thousands of different genes for that tissue.
The general objective is to develop a more systematic approach to cancer classification based on the simultaneous expression monitoring of thousands of genes using Affymetrix oligonucleotide microarrays.

1 Corresponding author: Professor, Department of Mathematics and Statistics, Minnesota State University, Mankato, MN 56001, USA. E-mail: mezbahur.rahman@mnsu.edu.
2 Professor, Department of Mathematics and Statistics, Minnesota State University, Mankato, MN 56001, USA.