The Open Applied Informatics Journal, 2009, 3, 34-43
1874-1363/09 © 2009 Bentham Open. Open Access

Classification of Trends in Dose-Response Microarray Experiments Using Information Theory Selection Methods

D. Lin*¹, Z. Shkedy¹, T. Burzykowski¹, M. Aerts¹, H. W. H. Göhlmann², A. De Bondt², T. Perera², T. Geerts², I. Van den Wyngaert² and L. Bijnens²

¹Hasselt University, I-Biostat, Agoralaan 1, Building D, 3590 Diepenbeek, Belgium
²Johnson & Johnson PR&D, Turnhoutseweg 30, 2340 Beerse, Belgium

Abstract: Dose-response microarray experiments monitor the expression levels of thousands of genes at increasing doses of the treatment under investigation. The primary goal of such an experiment is to establish a dose-response relationship. The secondary goals are to determine the minimum effective dose level and to identify the shape of the dose-response curve. Recently, Lin et al. [1] discussed several procedures for testing for a monotone trend based on isotonic regression of the observed means [2,3]. Once a monotone relationship between gene expression and dose is established, a set of R possible monotone models can be fitted to the data. Selecting the best model from this set identifies both the shape of the dose-response curve and the minimum effective dose level. In this paper, we focus on classifying dose-response curve shapes using information-theoretic model selection. In particular, we discuss the Order Restricted Information Criterion (ORIC) for inference under order restrictions. The posterior probability of each model is calculated from information criteria that account for both the goodness of fit and the complexity of the models. The method is applied to a dose-response microarray experiment with 12 arrays (three samples at each of four dose levels) and 16,998 genes.
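The abstract states that the posterior probability of each model is calculated from information criteria. A minimal Python sketch of that step is shown below, assuming the common Akaike-type normalisation P(g_r | data) = exp(-IC_r/2) / Σ_s exp(-IC_s/2) over the R candidate models. The function name `posterior_model_probabilities` and the criterion values are hypothetical; computing ORIC itself, which requires the order-restricted fit and its penalty term, is not shown.

```python
import math


def posterior_model_probabilities(ic_values):
    """Turn information-criterion values (smaller is better) into model
    probabilities proportional to exp(-IC / 2).

    ic_values: dict mapping a model label to its criterion value, e.g. ORIC.
    Assumption: Akaike-type weights; the criterion itself is computed elsewhere.
    """
    if not ic_values:
        raise ValueError("at least one model is required")
    best = min(ic_values.values())
    # Shifting by the smallest IC avoids underflow in exp(); the shift
    # cancels in the normalisation, so the probabilities are unchanged.
    unnormalised = {
        label: math.exp(-0.5 * (ic - best)) for label, ic in ic_values.items()
    }
    total = sum(unnormalised.values())
    return {label: w / total for label, w in unnormalised.items()}


# Hypothetical criterion values for three candidate models of a single gene.
ic = {"g1": 10.0, "g2": 12.0, "g3": 20.0}
probs = posterior_model_probabilities(ic)
best_model = max(probs, key=probs.get)
print(best_model)  # → g1
for label, p in probs.items():
    print(f"{label}: {p:.3f}")
```

Because only differences between criterion values matter, a model whose IC is 2 units larger than the best receives exp(-1), or about 0.37 times, the weight of the best model.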
Keywords: Dose-response curve, microarray, classification, information theory, model selection, ORIC.

*Address correspondence to this author at Office D 52, Center for Statistics, Hasselt University, Agoralaan, Building D, 3590 Diepenbeek, Belgium; Tel: +32-11-268286; Fax: +32-11-268299; E-mail: dan.lin@uhasselt.be

1. INTRODUCTION

A common experiment in early drug development is the dose-response study, which is set up to assess the biological activity of a chemical compound. In such a study, the response of primary interest is measured at several increasing dose levels. Typically, the first dose level is a control group with zero dose. In recent years, dose-response studies have been extended to the microarray setting, in which microarrays are used to measure the expression of thousands of genes. The goal of the experiment is to identify genes whose expression is affected by dose. Recently, Lin et al. [1] discussed several testing procedures, namely Williams' test [4,5], Marcus' test [6], the global likelihood ratio test [2], the M test [7], and the modified M test [1], that can be used to identify genes with a monotone relationship between gene expression and dose. In this paper, we follow up that investigation by evaluating the specific monotone trend of the dose-response relationship for the genes selected by one of these procedures. The question of primary interest is the nature (or the curve shape) of the dose-response relationship. This question is closely related to the problem of determining the minimum effective dose (MED), that is, the smallest dose at which the mean response is shifted from the mean at dose zero [8,9]. Several testing procedures have been proposed for finding the MED. For example, Williams [4,5] proposed a step-down procedure in which tests are performed sequentially from the highest to the lowest dose level. The procedure stops at the first dose level for which the null hypothesis (of no dose effect) is not rejected.
As a result, the MED is the first dose above that dose level. Other testing procedures, proposed by Tamhane et al. [9], are based on contrasts among the sample means of gene expression at different dose levels. Note that Williams' procedure assumes monotonicity of the dose-response relationship, while the tests based on contrasts of the sample means do not require this assumption. In the microarray setting, all of these testing procedures are additionally subject to the multiplicity problem, because they are applied to thousands of genes. To avoid the multiple testing issue in determining the MED and in identifying the shape of the dose-response curve, we propose to classify possible dose-response trends using model selection based on information theory. Assuming a monotone relationship, the dose-response curve could be linear or nonlinear, for example concave or convex. Furthermore, for an experiment with $K+1$ dose levels, there is a fixed number of monotone models that can be fitted. For instance, in a dose-response experiment with four dose levels, once a monotone relationship between gene expression and dose has been established, there is a set of seven models, shown in Table 1 and Fig. (1), that can be fitted to the data. Each model can be associated with an MED. For example, $g_1$ is a model with two parameters whose MED is the last dose level, and $g_2$ is also a model with two parameters, but its MED is the third dose level.
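The fixed number of candidate models can be made concrete with a short Python sketch. It assumes that each model is defined by which adjacent dose means are tied and which are strictly increasing, so that, for an increasing trend, there are 2^K - 1 non-null patterns for K+1 dose levels (seven when K = 3); decreasing trends are the mirror image. The number of parameters is taken as the number of distinct mean levels, and the MED as the first dose whose mean exceeds the control mean (dose indices start at 0 for the control, so dose 2 is the third of four levels). The function name `monotone_models` and the `mu0 = mu1 < ...` labels are hypothetical, and the enumeration order is not claimed to match the numbering in Table 1.

```python
from itertools import product


def monotone_models(num_doses):
    """List the non-null increasing mean patterns for doses 0..num_doses-1.

    Each adjacent pair of dose means is either tied ('=') or strictly
    increasing ('<'). The all-tied pattern (no dose effect) is skipped,
    leaving 2**(num_doses - 1) - 1 candidate models.
    Returns (pattern, number_of_parameters, med_dose_index) tuples.
    """
    models = []
    for gaps in product("=<", repeat=num_doses - 1):
        if "<" not in gaps:
            continue  # null model: excluded once a monotone trend is established
        pattern = "mu0" + "".join(f" {op} mu{d + 1}" for d, op in enumerate(gaps))
        n_params = gaps.count("<") + 1  # number of distinct mean levels
        med = gaps.index("<") + 1  # first dose whose mean exceeds the control mean
        models.append((pattern, n_params, med))
    return models


# Four dose levels (K = 3, dose 0 is the control) give seven candidate models.
for pattern, n_params, med in monotone_models(4):
    print(f"{pattern:25s} parameters={n_params}  MED=dose {med}")
```

Under these assumptions, the two-parameter pattern mu0 = mu1 = mu2 < mu3 has its MED at the last dose, and mu0 = mu1 < mu2 = mu3 has its MED at the third dose level, which agrees with the descriptions of $g_1$ and $g_2$.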