Everything You Always Wanted to Know about Copula Modeling but Were Afraid to Ask

Christian Genest¹ and Anne-Catherine Favre²

Abstract: This paper presents an introduction to inference for copula models, based on rank methods. By working out in detail a small, fictitious numerical example, the writers exhibit the various steps involved in investigating the dependence between two random variables and in modeling it using copulas. Simple graphical tools and numerical techniques are presented for selecting an appropriate model, estimating its parameters, and checking its goodness-of-fit. A larger, realistic application of the methodology to hydrological data is then presented.

DOI: 10.1061/(ASCE)1084-0699(2007)12:4(347)

CE Database subject headings: Frequency analysis; Distribution functions; Risk management; Statistical models.

Introduction

Hydrological phenomena are often multidimensional and hence require the joint modeling of several random variables. Traditionally, the pairwise dependence between variables such as depth, volume, and duration of flows has been described using classical families of bivariate distributions. Perhaps the most common models occurring in this context are the bivariate normal, log-normal, gamma, and extreme-value distributions. The main limitation of this approach is that the individual behavior of the two variables (or transformations thereof) must then be characterized by the same parametric family of univariate distributions.

Copula models, which avoid this restriction, are just beginning to make their way into the hydrological literature; see, e.g., De Michele and Salvadori (2002), Favre et al. (2004), Salvadori and De Michele (2004), and De Michele et al. (2005). Restricting attention to the bivariate case for the sake of simplicity, the copula approach to dependence modeling is rooted in a representation theorem due to Sklar (1959).
The latter states that the joint cumulative distribution function (c.d.f.) $H(x, y)$ of any pair $(X, Y)$ of continuous random variables may be written in the form

$$H(x, y) = C\{F(x), G(y)\}, \quad x, y \in \mathbb{R} \tag{1}$$

where $F(x)$ and $G(y)$ = marginal distributions; and $C: [0, 1]^2 \to [0, 1]$ = copula.

While Sklar (1959) showed that $C$, $F$, and $G$ are uniquely determined when $H$ is known, a valid model for $(X, Y)$ arises from Eq. (1) whenever the three "ingredients" are chosen from given parametric families of distributions, viz.

$$F \in (F_\delta), \quad G \in (G_\eta), \quad C \in (C_\theta)$$

Thus, for example, $F$ might be normal with (bivariate) parameter $\delta = (\mu, \sigma^2)$; $G$ might be gamma with parameter $\eta = (\alpha, \lambda)$; and $C$ might be taken from the Farlie–Gumbel–Morgenstern family of copulas, defined for each $\theta \in [-1, 1]$ by

$$C_\theta(u, v) = uv + \theta uv(1 - u)(1 - v), \quad u, v \in [0, 1] \tag{2}$$

The main advantage provided to the hydrologist by this approach is that the selection of an appropriate model for the dependence between $X$ and $Y$, represented by the copula, can then proceed independently from the choice of the marginal distributions.

For an introduction to the theory of copulas and a large selection of related models, the reader may refer, e.g., to the monographs by Joe (1997) and Nelsen (1999), or to reviews such as Frees and Valdez (1998) and Cherubini et al. (2004), in which actuarial and financial applications are considered. While the theoretical properties of these objects are now fairly well understood, inference for copula models is, to an extent, still under development. The literature on the subject is yet to be collated, and most of it is not written with the end user in mind, making it difficult to decipher except for the most mathematically inclined.

The aim of this paper is to present, in the simplest terms possible, the successive steps required to build a copula model for hydrological purposes.
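Before turning to those steps, the construction in Eqs. (1) and (2) can be made concrete with a short sketch. The Python code below is an illustration only, not part of the original paper: it assembles a joint c.d.f. from a normal margin, a gamma margin, and a Farlie–Gumbel–Morgenstern copula, as in the example above. The function names, the shape/rate parametrization of the gamma margin, and all numerical values are assumptions chosen for the illustration.

```python
import math
from statistics import NormalDist


def fgm_copula(u, v, theta):
    """Farlie-Gumbel-Morgenstern copula of Eq. (2); theta must lie in [-1, 1]."""
    if not -1.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [-1, 1]")
    return u * v + theta * u * v * (1.0 - u) * (1.0 - v)


def gamma_cdf(y, shape, rate):
    """Gamma c.d.f., computed from the power series of the regularized
    lower incomplete gamma function (assumed shape/rate parametrization)."""
    if y <= 0.0:
        return 0.0
    x = rate * y
    term = 1.0 / shape          # n = 0 term of the series
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= x / (shape + n)  # ratio of consecutive series terms
        total += term
    value = total * math.exp(shape * math.log(x) - x - math.lgamma(shape))
    return min(value, 1.0)


def joint_cdf(x, y, mu, sigma, shape, rate, theta):
    """Joint c.d.f. of Eq. (1): the copula evaluated at the two margins."""
    u = NormalDist(mu, sigma).cdf(x)   # normal margin F
    v = gamma_cdf(y, shape, rate)      # gamma margin G
    return fgm_copula(u, v, theta)


# Illustrative (hypothetical) parameter values.
h_independent = joint_cdf(0.0, 1.0, mu=0.0, sigma=1.0, shape=2.0, rate=1.0, theta=0.0)
h_dependent = joint_cdf(0.0, 1.0, mu=0.0, sigma=1.0, shape=2.0, rate=1.0, theta=0.5)
print(h_independent, h_dependent)
```

Setting theta to 0 reduces the copula to the product uv, so the joint c.d.f. factors into the product of the margins; changing theta alters the dependence while leaving both margins untouched, which is the separation between dependence and marginal behavior that the copula representation provides.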
To this end, a fictitious data set of very small size will be used to illustrate the diagnostic and inferential tools currently available. Although intuition will be given for the various techniques to be presented, emphasis will be put on their implementation, rather than on their theoretical foundation. Therefore, computations will be presented in more detail than usual, at the expense of exhaustive mathematical exposition, for which the reader will only be given appropriate references.

¹ Professor, Dépt. de mathématiques et de statistique, Univ. Laval, Québec QC, Canada G1K 7P4.
² Professor, Chaire en Hydrologie Statistique, INRS, Eau, Terre et Environnement, Québec QC, Canada G1K 9A9.

Note. Discussion open until December 1, 2007. Separate discussions must be submitted for individual papers. To extend the closing date by one month, a written request must be filed with the ASCE Managing Editor. The manuscript for this paper was submitted for review and possible publication on August 29, 2006; approved on August 29, 2006. This paper is part of the Journal of Hydrologic Engineering, Vol. 12, No. 4, July 1, 2007. ©ASCE, ISSN 1084-0699/2007/4-347–368/$25.00.

JOURNAL OF HYDROLOGIC ENGINEERING © ASCE / JULY/AUGUST 2007 / 347

The pedagogical data set to be used throughout the paper is introduced in the “Dependence and Ranks” section, where it will be explained why statistical inference concerning dependence structures should always be based on ranks. This will lead, in the “Measuring Dependence” section, to the description of classical nonparametric measures of dependence and tests of independence. Exploratory tools for uncovering dependence and measuring it will be reviewed in the “Additional Graphical Tools for Detecting Dependence” section. Point and interval estimation for