Everything You Always Wanted to Know about Copula
Modeling but Were Afraid to Ask
Christian Genest¹ and Anne-Catherine Favre²
Abstract: This paper presents an introduction to inference for copula models, based on rank methods. By working out in detail a small,
fictitious numerical example, the writers exhibit the various steps involved in investigating the dependence between two random variables
and in modeling it using copulas. Simple graphical tools and numerical techniques are presented for selecting an appropriate model,
estimating its parameters, and checking its goodness-of-fit. A larger, realistic application of the methodology to hydrological data is then
presented.
DOI: 10.1061/(ASCE)1084-0699(2007)12:4(347)
CE Database subject headings: Frequency analysis; Distribution functions; Risk management; Statistical models.
Introduction
Hydrological phenomena are often multidimensional and hence
require the joint modeling of several random variables. Tradition-
ally, the pairwise dependence between variables such as depth,
volume, and duration of flows has been described using classical
families of bivariate distributions. Perhaps the most common
models occurring in this context are the bivariate normal,
log-normal, gamma, and extreme-value distributions. The main
limitation of this approach is that the individual behavior of the two
variables (or transformations thereof) must then be characterized
by the same parametric family of univariate distributions.
Copula models, which avoid this restriction, are just beginning
to make their way into the hydrological literature; see, e.g.,
De Michele and Salvadori (2002), Favre et al. (2004), Salvadori
and De Michele (2004), and De Michele et al. (2005). Restricting
attention to the bivariate case for the sake of simplicity, the
copula approach to dependence modeling is rooted in a representation
theorem due to Sklar (1959). The latter states that the joint
cumulative distribution function (c.d.f.) $H(x, y)$ of any pair $(X, Y)$
of continuous random variables may be written in the form
$$H(x, y) = C\{F(x), G(y)\}, \quad x, y \in \mathbb{R} \tag{1}$$
where $F(x)$ and $G(y)$ = marginal distributions; and $C: [0,1]^2 \to [0,1]$ = copula.
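As a quick sanity check on the representation above (a worked illustration, not part of the original exposition), recall that every copula satisfies $C(u, 1) = u$ and $C(1, v) = v$. Letting one argument tend to infinity therefore recovers the corresponding margin, and the product copula $C(u, v) = uv$, used here only as the simplest possible instance, yields the joint distribution of independent variables:

```latex
\begin{align*}
\lim_{y \to \infty} H(x, y) &= C\{F(x), 1\} = F(x), \\
\lim_{x \to \infty} H(x, y) &= C\{1, G(y)\} = G(y), \\
C(u, v) = uv \;&\Longrightarrow\; H(x, y) = F(x)\,G(y).
\end{align*}
```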
While Sklar (1959) showed that C, F, and G are uniquely
determined when H is known, a valid model for $(X, Y)$ arises from
Eq. (1) whenever the three “ingredients” are chosen from given
parametric families of distributions, viz.
$$F \in (F_\delta), \quad G \in (G_\eta), \quad C \in (C_\theta)$$
Thus, for example, F might be normal with (bivariate) parameter
$\delta = (\mu, \sigma^2)$; G might be gamma with parameter $\eta = (\alpha, \lambda)$; and C
might be taken from the Farlie–Gumbel–Morgenstern family of
copulas, defined for each $\theta \in [-1, 1]$ by
$$C_\theta(u, v) = uv + \theta uv(1 - u)(1 - v), \quad u, v \in [0, 1] \tag{2}$$
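The construction just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the writers' implementation: the function names `fgm_copula` and `joint_cdf` are hypothetical, the default parameter values are arbitrary, and the gamma margin is assumed to be parameterized by a shape and a rate.

```python
import numpy as np
from scipy import stats


def fgm_copula(u, v, theta):
    """Farlie-Gumbel-Morgenstern copula, valid for theta in [-1, 1]."""
    if not -1.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [-1, 1]")
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return u * v + theta * u * v * (1.0 - u) * (1.0 - v)


def joint_cdf(x, y, theta, mu=0.0, sigma=1.0, shape=2.0, rate=1.0):
    """Joint c.d.f. H(x, y) = C_theta(F(x), G(y)).

    F is normal(mu, sigma^2) and G is gamma(shape, rate); the default
    values are arbitrary and chosen only for illustration.
    """
    u = stats.norm.cdf(x, loc=mu, scale=sigma)          # F(x)
    v = stats.gamma.cdf(y, a=shape, scale=1.0 / rate)   # G(y)
    return fgm_copula(u, v, theta)


# Evaluate the joint c.d.f. at one point for a mild positive dependence.
print(joint_cdf(0.0, 2.0, theta=0.5))
```

Setting `theta=0` reduces the copula to the product `u * v`, so the same code also covers the independence case.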
The main advantage provided to the hydrologist by this approach
is that the selection of an appropriate model for the dependence
between X and Y , represented by the copula, can then proceed
independently from the choice of the marginal distributions.
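The separation between dependence and margins can be checked numerically. The sketch below, given under stated assumptions, draws pairs from a Farlie–Gumbel–Morgenstern copula by conditional inversion, applies two arbitrary pairs of margins to the same draws, and compares Kendall's tau, used here as one example of a rank-based measure of dependence. The helper `sample_fgm`, the sample size, the seed, and the choice of margins are all illustrative.

```python
import numpy as np
from scipy import stats


def sample_fgm(n, theta, rng):
    """Draw n pairs (U, V) from the FGM copula by conditional inversion."""
    u = rng.uniform(size=n)
    w = rng.uniform(size=n)
    a = theta * (1.0 - 2.0 * u)
    # The conditional c.d.f. of V given U = u is v + a*v*(1 - v).
    # Solve v + a*v*(1 - v) = w for v, using the numerically stable root.
    s = np.sqrt((1.0 + a) ** 2 - 4.0 * a * w)
    v = 2.0 * w / (1.0 + a + s)
    return u, v


rng = np.random.default_rng(0)
theta = 0.8
u, v = sample_fgm(2000, theta, rng)

# Two different pairs of margins applied to the same copula sample.
x1, y1 = stats.norm.ppf(u), stats.gamma.ppf(v, a=2.0)
x2, y2 = stats.expon.ppf(u), stats.lognorm.ppf(v, s=1.0)

tau_uv, _ = stats.kendalltau(u, v)
tau_1, _ = stats.kendalltau(x1, y1)
tau_2, _ = stats.kendalltau(x2, y2)

# The three sample values coincide because increasing transformations
# leave the ranks unchanged; the population value for FGM is 2*theta/9.
print(tau_uv, tau_1, tau_2)
```

Because the quantile functions are strictly increasing, the ranks of the observations, and hence any rank-based statistic, are the same whatever margins are chosen.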
For an introduction to the theory of copulas and a large selection
of related models, the reader may refer, e.g., to the monographs
by Joe (1997) and Nelsen (1999), or to reviews such as
Frees and Valdez (1998) and Cherubini et al. (2004), in which
actuarial and financial applications are considered. While the
theoretical properties of these objects are now fairly well understood,
inference for copula models is, to an extent, still under development.
The literature on the subject is yet to be collated, and most
of it is not written with the end user in mind, making it difficult to
decipher except for the most mathematically inclined.
The aim of this paper is to present, in the simplest terms pos-
sible, the successive steps required to build a copula model for
hydrological purposes. To this end, a fictitious data set of very
small size will be used to illustrate the diagnostic and inferential
tools currently available. Although intuition will be given for the
various techniques to be presented, emphasis will be put on their
implementation, rather than on their theoretical foundation.
Therefore, computations will be presented in more detail than
usual, at the expense of exhaustive mathematical exposition, for
which the reader will only be given appropriate references.
The pedagogical data set to be used throughout the paper is
introduced in the “Dependence and Ranks” section, where it will
be explained why statistical inference concerning dependence
structures should always be based on ranks. This will lead, in the
“Measuring Dependence” section, to the description of classical
nonparametric measures of dependence and tests of indepen-
dence. Exploratory tools for uncovering dependence and measur-
ing it will be reviewed in the “Additional Graphical Tools for
Detecting Dependence” section. Point and interval estimation for
¹Professor, Dépt. de mathématiques et de statistique, Univ. Laval,
Québec QC, Canada G1K 7P4.
²Professor, Chaire en Hydrologie Statistique, INRS, Eau, Terre et
Environnement, Québec QC, Canada G1K 9A9.
Note. Discussion open until December 1, 2007. Separate discussions
must be submitted for individual papers. To extend the closing date by
one month, a written request must be filed with the ASCE Managing
Editor. The manuscript for this paper was submitted for review and pos-
sible publication on August 29, 2006; approved on August 29, 2006. This
paper is part of the Journal of Hydrologic Engineering, Vol. 12, No. 4,
July 1, 2007. ©ASCE, ISSN 1084-0699/2007/4-347–368/$25.00.
JOURNAL OF HYDROLOGIC ENGINEERING © ASCE / JULY/AUGUST 2007 / 347