A mixture model-based cluster analysis of DNA microarray gene expression data on Brahman and Brahman composite steers fed high-, medium-, and low-quality diets¹

A. Reverter*², K. A. Byrne*, H. L. Bruce†, Y. H. Wang*, B. P. Dalrymple*, and S. A. Lehnert*

Cooperative Research Centre for Cattle and Beef Quality, *CSIRO Livestock Industries, Queensland Bioscience Precinct, St Lucia, Queensland 4067, Australia; †Food Science Australia, Tingalpa DC, Queensland 4173, Australia

ABSTRACT: The objective of this study was to explore aspects of the statistical analysis of gene expression response at the muscle tissue level to varying levels of energy and protein in the diet. Eleven Brahman and Brahman composite steers (weighing 302 ± 9.8 kg, on average) were allocated randomly to high- (HIGH), medium- (MED), and low- (LOW) quality forage diets for 27 d. After this period, a biopsy of the longissimus dorsi muscle was taken from each animal and total RNA was extracted to generate the labeled target for microarray experimentation. These targets were hybridized to a complementary DNA (cDNA) microarray of 9,274 probes from cattle muscle and subcutaneous fat cDNA libraries. After edits, 151,904 expression intensity levels of 4,747 genes were analyzed. Emphasis was given to the choice of power transformation of the intensity channel readings and to the consistency of readings within each diet quality group. The statistical approach to isolate differentially expressed genes was based on model-based clustering via a mixture of normal distributions estimated through maximum likelihood. The base-2 logarithm was found to be the optimal power transformation to normalize gene intensity levels. A two-sample t-statistic was defined as a measure of possible differential expression. For each of the three diet contrasts, HIGH vs. LOW, HIGH vs. MED, and MED vs. LOW, three clusters were found, two of which contained more than 94% of the genes with almost no altered gene expression levels, whereas the third cluster contained the remaining genes with differential expression. Results from the HIGH vs. LOW contrast identified 27 genes with a greater than 95% posterior probability of belonging to the cluster of differentially expressed genes.

Key Words: Beef, Complementary DNA, Gene Expression, Maximum Likelihood, Statistical Analysis

© 2003 American Society of Animal Science. All rights reserved. J. Anim. Sci. 2003. 81:1900–1910

¹The authors acknowledge the following funding bodies: the Cooperative Research Centre for Cattle and Beef Quality and its core partners, The University of New England, NSW Agriculture, CSIRO, and Queensland DPI. All CRC participants, both scientists and technical staff, who contributed to or supported the work, including those involved in animal management, data collection, laboratory analyses, and data handling are gratefully acknowledged. The assistance of A. Day and B. van den Heuvel during collection of blood samples is gratefully acknowledged. The authors wish to thank P. Allingham for performing the tissue biopsies and B. Hunter for providing the animals.
²Correspondence: phone: +61-7-3214-2392; fax: +61-7-3214-2881; E-mail: Tony.Reverter-Gomez@csiro.au.
Received December 19, 2002.
Accepted April 16, 2003.

Introduction

Gene expression technology is becoming increasingly accessible to animal scientists. The expectation is that these techniques will contribute to a greater understanding of the genetic basis of economically important traits. Moody (2001) provides a review of techniques for evaluating gene expression in livestock species. Recently, Lee and Hossner (2002) used PCR products to demonstrate that a nutrient challenge can stimulate or inhibit the expression of various well-known candidate genes involved with lipogenesis and adipose tissue metabolism.
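The analysis outlined in the abstract (base-2 log transformation, a two-sample t-statistic per gene, a mixture of normal distributions fitted by maximum likelihood, and a posterior probability of cluster membership) can be illustrated with a minimal sketch. It is not the authors' implementation. The simulated intensities, the number of replicates per diet, the pooled-variance form of the t-statistic, the EM starting values, and the rule of treating the component with the largest absolute mean as the "differentially expressed" cluster are all assumptions made for illustration, as are the function names `two_sample_t`, `posteriors`, and `fit_normal_mixture`.

```python
import numpy as np

def two_sample_t(x, y):
    """Pooled-variance two-sample t-statistic per gene (rows are genes)."""
    nx, ny = x.shape[1], y.shape[1]
    pooled = ((nx - 1) * x.var(axis=1, ddof=1)
              + (ny - 1) * y.var(axis=1, ddof=1)) / (nx + ny - 2)
    return (x.mean(axis=1) - y.mean(axis=1)) / np.sqrt(pooled * (1.0 / nx + 1.0 / ny))

def posteriors(t, w, mu, sd):
    """Posterior probability of each mixture component for each gene (E-step)."""
    log_dens = (np.log(w) - np.log(sd) - 0.5 * np.log(2 * np.pi)
                - 0.5 * ((t[:, None] - mu) / sd) ** 2)
    log_dens -= log_dens.max(axis=1, keepdims=True)  # guard against underflow
    post = np.exp(log_dens)
    return post / post.sum(axis=1, keepdims=True)

def fit_normal_mixture(t, k=3, n_iter=200):
    """EM estimates of a k-component univariate normal mixture."""
    # Assumed deterministic start: means at spread quantiles, common sd.
    mu = np.quantile(t, np.linspace(0.05, 0.95, k))
    sd = np.full(k, t.std())
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        post = posteriors(t, w, mu, sd)
        nk = post.sum(axis=0) + 1e-12          # avoid division by zero
        w = nk / nk.sum()
        mu = (post * t[:, None]).sum(axis=0) / nk
        sd = np.sqrt((post * (t[:, None] - mu) ** 2).sum(axis=0) / nk)
        sd = np.maximum(sd, 1e-6)              # keep components from collapsing
    return w, mu, sd, posteriors(t, w, mu, sd)

# Hypothetical data: 2,000 genes, 4 replicates per diet, 100 genes up-regulated.
rng = np.random.default_rng(0)
n_genes, n_rep, n_de = 2000, 4, 100
base = rng.uniform(8, 14, size=(n_genes, 1))
raw_high = 2.0 ** (base + rng.normal(0, 0.3, (n_genes, n_rep)))
raw_low = 2.0 ** (base + rng.normal(0, 0.3, (n_genes, n_rep)))
raw_high[:n_de] *= 2.0 ** 3                    # assumed 8-fold change

# Base-2 log transform, then one t-statistic per gene for HIGH vs. LOW.
t = two_sample_t(np.log2(raw_high), np.log2(raw_low))
w, mu, sd, post = fit_normal_mixture(t, k=3)

# Assumed rule: the component farthest from zero is the "differential" cluster.
de_cluster = np.argmax(np.abs(mu))
flagged = np.flatnonzero(post[:, de_cluster] > 0.95)
print("cluster weights:", np.round(w, 3))
print("genes with posterior > 0.95:", flagged.size)
```

The E-step works in log space so that extreme t-statistics do not underflow to zero density. A practical analysis would also compare models with different numbers of components rather than fixing three in advance.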
One area of intensive development is the analysis and interpretation of the large data sets generated by these techniques, in particular complementary DNA (cDNA) microarrays. Novel statistical challenges arise because microarray data are very high dimensional with very little replication. In a typical experiment, the expression of anywhere from 1,000 to over 20,000 genes could be explored. However, RNA samples from only a handful of experimental animals may be available. The Cooperative Research Centre for Cattle and Beef Quality (Beef CRC) undertook gene expression profiling