Computational Statistics and Data Analysis 56 (2012) 656–663

Multivariate probit analysis of binary familial data using stochastic representations

Yihao Deng a, Roy T. Sabo b,*, N. Rao Chaganty c

a Department of Mathematical Sciences, Indiana University Purdue University-Fort Wayne, Fort Wayne, IN 46805-1499, USA
b Department of Biostatistics, Virginia Commonwealth University, Richmond, VA 23298-0032, USA
c Department of Mathematics and Statistics, Old Dominion University, Norfolk, VA 23529-0077, USA

* Corresponding author. Tel.: +1 804 828 3047; fax: +1 804 828 8900.
E-mail addresses: dengy@ipfw.edu (Y. Deng), rsabo@vcu.edu (R.T. Sabo), rchagant@odu.edu (N.R. Chaganty).

Article history: Received 28 October 2010; received in revised form 5 September 2011; accepted 10 September 2011; available online 18 September 2011.

Keywords: Multivariate probit; Stochastic representation; Familial binary data; Fisher information

doi:10.1016/j.csda.2011.09.014

Published by Elsevier B.V.

Abstract

The probit function is an alternative transformation to the logistic function in the analysis of binary data. However, use of the probit function is prohibitively complicated for cases of multivariate or repeated-measure binary responses, as integrations involving the multivariate normal distribution can be difficult to compute. In this paper, we propose an alternative form to stochastically represent random variables in the case of familial binary data that simplifies calculation of the multivariate normal integrals involved in the probit link. We provide examples of these stochastic representations for one- and two-parent families, and compare the performance of this methodology with that of moment estimators by calculating asymptotic relative efficiencies and through a real-life data example. Particular attention is paid to analyzing the properties of regression parameter estimates from these two methods with respect to the feasible ranges of the correlation parameters.

1. Introduction

Familial data consisting of measurements on parents and children appear naturally in many fields, including biomedicine, psychology and the social sciences. Since members of the same family can share similarities due to genetic inheritance or cohabitation, measurements on these subjects are usually dependent. Such data (or responses) can be either continuous (e.g. height and blood pressure) or discrete (e.g. presence or absence of a certain trait or disease). Generalized linear models are easily applied to repeated continuous responses, yet several difficulties arise when responses are discrete. In the latter case, correlation parameters are restricted by the Fréchet bounds, which depend not only on the structured correlation matrix but also on the marginal means and covariates (Prentice, 1988; Chaganty and Joe, 2006). Though these bounds apply to all types of discrete data, they are most stringent for binary observations, which are ubiquitously used to represent qualitative measurements in all fields of science. Chaganty and Deng (2007) derived ranges of familial dependence for binary response variables using several well-known association statistics, and found that even for the simplest familial case of one parent and two children, the feasible correlation ranges can be prohibitively narrow. In most cases, there is no guarantee that correlation parameter estimates will lie within the restricted bounds, so specific methodologies are required to ensure feasibility.

Ashford and Sowden (1970) proposed the multivariate probit (MVP) model for the analysis of binary variables, which assumes that a latent standard normal variable corresponds to each binary outcome. The binary outcome takes the value 1