Everything You Always Wanted to Know about Copula Modeling but Were Afraid to Ask

Christian Genest¹ and Anne-Catherine Favre²

Abstract: This paper presents an introduction to inference for copula models, based on rank methods. By working out in detail a small, fictitious numerical example, the writers exhibit the various steps involved in investigating the dependence between two random variables and in modeling it using copulas. Simple graphical tools and numerical techniques are presented for selecting an appropriate model, estimating its parameters, and checking its goodness-of-fit. A larger, realistic application of the methodology to hydrological data is then presented.

DOI: 10.1061/(ASCE)1084-0699(2007)12:4(347)

CE Database subject headings: Frequency analysis; Distribution functions; Risk management; Statistical models.

Introduction

Hydrological phenomena are often multidimensional and hence require the joint modeling of several random variables. Traditionally, the pairwise dependence between variables such as depth, volume, and duration of flows has been described using classical families of bivariate distributions. Perhaps the most common models occurring in this context are the bivariate normal, log-normal, gamma, and extreme-value distributions. The main limitation of this approach is that the individual behavior of the two variables (or transformations thereof) must then be characterized by the same parametric family of univariate distributions.

Copula models, which avoid this restriction, are just beginning to make their way into the hydrological literature; see, e.g., De Michele and Salvadori (2002), Favre et al. (2004), Salvadori and De Michele (2004), and De Michele et al. (2005). Restricting attention to the bivariate case for the sake of simplicity, the copula approach to dependence modeling is rooted in a representation theorem due to Sklar (1959).
The latter states that the joint cumulative distribution function (c.d.f.) $H(x, y)$ of any pair $(X, Y)$ of continuous random variables may be written in the form

$$H(x, y) = C\{F(x), G(y)\}, \quad x, y \in \mathbb{R} \tag{1}$$

where $F(x)$ and $G(y)$ = marginal distributions; and $C : [0,1]^2 \to [0,1]$ = copula.

While Sklar (1959) showed that $C$, $F$, and $G$ are uniquely determined when $H$ is known, a valid model for $(X, Y)$ arises from Eq. (1) whenever the three “ingredients” are chosen from given parametric families of distributions, viz.

$$F \in (F_\delta), \quad G \in (G_\eta), \quad C \in (C_\theta)$$

Thus, for example, $F$ might be normal with (bivariate) parameter $\delta = (\mu, \sigma^2)$; $G$ might be gamma with parameter $\eta = (\alpha, \lambda)$; and $C$ might be taken from the Farlie–Gumbel–Morgenstern family of copulas, defined for each $\theta \in [-1, 1]$ by

$$C_\theta(u, v) = uv + \theta uv(1 - u)(1 - v), \quad u, v \in [0, 1] \tag{2}$$

The main advantage provided to the hydrologist by this approach is that the selection of an appropriate model for the dependence between $X$ and $Y$, represented by the copula, can then proceed independently from the choice of the marginal distributions.

For an introduction to the theory of copulas and a large selection of related models, the reader may refer, e.g., to the monographs by Joe (1997) and Nelsen (1999), or to reviews such as Frees and Valdez (1998) and Cherubini et al. (2004), in which actuarial and financial applications are considered. While the theoretical properties of these objects are now fairly well understood, inference for copula models is, to an extent, still under development. The literature on the subject is yet to be collated, and most of it is not written with the end user in mind, making it difficult to decipher except for the most mathematically inclined.

The aim of this paper is to present, in the simplest terms possible, the successive steps required to build a copula model for hydrological purposes. To this end, a fictitious data set of very small size will be used to illustrate the diagnostic and inferential tools currently available.
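As a brief illustration of the construction in Eq. (1), the following minimal Python sketch assembles a joint c.d.f. from a Farlie–Gumbel–Morgenstern copula and two margins. It is not taken from the paper: the function names (`fgm_copula`, `exponential_cdf`, `joint_cdf`), the name `theta` for the dependence parameter, and the default parameter values are all illustrative assumptions. To stay within the standard library, the gamma margin is replaced here by its shape-one special case, the exponential distribution.

```python
import math
from statistics import NormalDist


def fgm_copula(u, v, theta):
    """Farlie-Gumbel-Morgenstern copula: uv + theta*uv(1-u)(1-v)."""
    if not -1.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [-1, 1]")
    if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
        raise ValueError("u and v must lie in [0, 1]")
    return u * v + theta * u * v * (1.0 - u) * (1.0 - v)


def exponential_cdf(y, rate):
    """C.d.f. of the exponential law (a gamma law with shape 1).

    Assumption: used only as a stdlib-friendly stand-in for a gamma margin.
    """
    return 1.0 - math.exp(-rate * y) if y > 0.0 else 0.0


def joint_cdf(x, y, theta, mu=0.0, sigma=1.0, rate=1.0):
    """H(x, y) = C(F(x), G(y)) with a normal F and an exponential G."""
    u = NormalDist(mu, sigma).cdf(x)  # F(x), first margin
    v = exponential_cdf(y, rate)      # G(y), second margin
    return fgm_copula(u, v, theta)


if __name__ == "__main__":
    # Evaluate H at the medians of both margins for three dependence levels.
    for theta in (-1.0, 0.0, 1.0):
        print(theta, joint_cdf(0.0, math.log(2.0), theta))
```

At the two medians, where both margins equal 1/2, the sketch returns approximately 0.1875, 0.25, and 0.3125 for `theta` equal to -1, 0, and 1. Changing either margin leaves `fgm_copula` untouched, which is the separation between dependence and marginal behavior that the copula approach provides.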
Although intuition will be given for the various techniques to be presented, emphasis will be put on their implementation, rather than on their theoretical foundation. Therefore, computations will be presented in more detail than usual, at the expense of exhaustive mathematical exposition, for which the reader will only be given appropriate references.

The pedagogical data set to be used throughout the paper is introduced in the “Dependence and Ranks” section, where it will be explained why statistical inference concerning dependence structures should always be based on ranks. This will lead, in the “Measuring Dependence” section, to the description of classical nonparametric measures of dependence and tests of independence. Exploratory tools for uncovering dependence and measuring it will be reviewed in the “Additional Graphical Tools for Detecting Dependence” section.

¹ Professor, Dépt. de mathématiques et de statistique, Univ. Laval, Québec QC, Canada G1K 7P4.
² Professor, Chaire en Hydrologie Statistique, INRS, Eau, Terre et Environnement, Québec QC, Canada G1K 9A9.
Note. Discussion open until December 1, 2007. Separate discussions must be submitted for individual papers. To extend the closing date by one month, a written request must be filed with the ASCE Managing Editor. The manuscript for this paper was submitted for review and possible publication on August 29, 2006; approved on August 29, 2006. This paper is part of the Journal of Hydrologic Engineering, Vol. 12, No. 4, July 1, 2007. ©ASCE, ISSN 1084-0699/2007/4-347–368/$25.00.
JOURNAL OF HYDROLOGIC ENGINEERING © ASCE / JULY/AUGUST 2007 / 347

Point and interval estimation for