A model for categorical length data from
groundfish surveys
Birgir Hrafnkelsson and Gunnar Stefánsson
Abstract: An extension of the multinomial model of counts is presented to account for overdispersion and a different
correlation structure. Such models are needed in biological applications such as the analysis of length measurements
from surveys of heterogeneous populations used for assessments of marine resources. One of the goals of such a sur-
vey is to estimate the length distribution of each species within a particular area. Using data on Atlantic cod (Gadus
morhua) in Icelandic waters, it is demonstrated that the assumptions used in practice for categorical length data are
seriously violated. The length data on cod exhibit variances that are larger than those of the standard multinomial
model and correlation coefficients that are greater than those of the Dirichlet-multinomial model. To alleviate these
problems, a hierarchical model based on the multinomial distribution and the logistically transformed multivariate
Gaussian distribution is proposed. It is illustrated that this model captures the complex covariance structure of the data.
The parameters in the models are estimated using a Bayesian estimation procedure based on Markov chain Monte
Carlo.
Résumé : Nous présentons une extension du modèle multinomial des dénombrements qui tient compte de la surdisper-
sion et des structures de corrélation différentes. De tels modèles sont nécessaires dans certaines applications biologi-
ques dans l’évaluation des ressources marines, telles que l’analyse des mesures de longueur obtenues dans les
inventaires de populations hétérogènes. Un des objectifs d’un tel inventaire est de déterminer la distribution de fré-
quence des longueurs de chacune des espèces dans une région donnée. L’utilisation de données sur la morue (Gadus
morhua) des eaux islandaises nous a permis de démontrer que les présuppositions utilisées en pratique pour les don-
nées de longueur par catégories ne sont pas valides. Les données de longueur de la morue ont des variances qui sont
plus grandes que celles du modèle multinomial standard et des coefficients de corrélation supérieurs à ceux du modèle
multinomial avec distribution de Dirichlet. Afin de réduire ces problèmes, nous proposons un modèle hiérarchique basé
sur la distribution multinomiale et la distribution gaussienne multidimensionnelle avec transformation logistique. Nous
montrons que ce modèle reflète bien la structure complexe de covariance des données. Nous avons déterminé les paramètres
du modèle à l’aide d’une estimation bayésienne basée sur une simulation de Monte-Carlo par chaînes de Markov.
[Traduit par la Rédaction]
Introduction
A major focus of marine research programmes is to obtain
information on the current state and historical development
of fish populations. To facilitate such research, enormous
efforts are undertaken to sample the fish populations in vari-
ous ways. The two most fundamental data sets obtained
from sampling of fish populations are the length measure-
ments of individual fish and biomass measurements (e.g.,
average catch per tow), each of which can be obtained from
marine surveys or commercial fisheries. Other data sets can
be highly important in individual situations, but at least one
of these two types is always a part of the analysis of fish
population dynamics.
The statistical aspects of abundance indices have been ex-
tensively documented (e.g., Pennington 1983; Jacobson et al.
1996; Stefánsson 1996). Length measurements of individual
fish are done on discrete scales (e.g., 1-cm or 1-mm group-
ings) and are therefore commonly analyzed as count data.
Although fundamental to stock assessment, the properties of
these data have not been extensively studied, and the data
sets have generally been analyzed using simple techniques.
This paper demonstrates that the assumptions underlying these
techniques are seriously violated and provides methods
to alleviate these problems.
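
To make the nature of the violation concrete, the following is a minimal simulation sketch, not the authors' implementation. It assumes that the "logistically transformed multivariate Gaussian" of the abstract can be illustrated with an additive logistic transform using the last length group as reference; the exact parameterisation is not given in this section. The number of length groups, the latent covariance, the sample sizes, and all function names are hypothetical. The sketch compares the simulated counts with two standard multinomial results: the variance n p_i (1 - p_i) and the correlation -sqrt(p_i p_j / ((1 - p_i)(1 - p_j))).

```python
import numpy as np


def additive_logistic(eta):
    """Map (n, K-1) Gaussian draws to (n, K) proportions; last group is the reference."""
    z = np.hstack([eta, np.zeros((eta.shape[0], 1))])
    z -= z.max(axis=1, keepdims=True)  # guard against overflow in exp
    w = np.exp(z)
    return w / w.sum(axis=1, keepdims=True)


def simulate_counts(n_tows, n_fish, mu, cov, rng):
    """Counts per length group for each tow: Gaussian -> proportions -> multinomial."""
    eta = rng.multivariate_normal(mu, cov, size=n_tows)
    p = additive_logistic(eta)
    return np.array([rng.multinomial(n_fish, row) for row in p])


def compare_with_multinomial(counts, n_fish):
    """Return (variance ratio per group, empirical corr, multinomial corr) for groups 0 and 1."""
    p_bar = counts.mean(axis=0) / n_fish
    # Ratio of observed variance to the multinomial variance n * p * (1 - p).
    var_ratio = counts.var(axis=0, ddof=1) / (n_fish * p_bar * (1 - p_bar))
    emp_corr = np.corrcoef(counts[:, 0], counts[:, 1])[0, 1]
    # Correlation implied by a plain multinomial with proportions p_bar.
    mult_corr = -np.sqrt(p_bar[0] * p_bar[1] / ((1 - p_bar[0]) * (1 - p_bar[1])))
    return var_ratio, emp_corr, mult_corr


# Hypothetical settings: 5 length groups, latent correlation decaying with distance.
rng = np.random.default_rng(0)
k = 5
idx = np.arange(k - 1)
cov = 0.5 * 0.8 ** np.abs(idx[:, None] - idx[None, :])
counts = simulate_counts(n_tows=2000, n_fish=200, mu=np.zeros(k - 1), cov=cov, rng=rng)
var_ratio, emp_corr, mult_corr = compare_with_multinomial(counts, n_fish=200)
print("variance ratio vs multinomial:", np.round(var_ratio, 1))
print("corr(groups 0, 1): empirical", round(emp_corr, 2), "multinomial", round(mult_corr, 2))
```

Under these assumed settings, variance ratios above 1 indicate overdispersion relative to the multinomial model, and an empirical correlation between neighbouring length groups that exceeds the multinomial value is the kind of pattern the abstract reports for the cod data. A plain multinomial forces all correlations between groups to be negative, whereas a latent Gaussian layer with positively correlated components can offset this.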
Within models of fish population dynamics, it is common
practice to use either lognormal errors or a multinomial dis-
tribution when investigating numbers that by their nature are
counts or estimated counts. Examples include models for
catches in numbers at age (e.g., Gavaris 1980; Gudmundsson
1994) and models for the frequency of fish in a given length
group (e.g., MacDonald and Pitcher 1979; Methot 2000). In-
Can. J. Fish. Aquat. Sci. 61: 1135–1142 (2004) doi: 10.1139/F04-049 © 2004 NRC Canada
Received 6 March 2003. Accepted 13 January 2004. Published on the NRC Research Press Web site at http://cjfas.nrc.ca on
28 August 2004.
B. Hrafnkelsson.¹ Faculty of Engineering, University of Iceland, Hjardarhagi 2-6, 107 Reykjavík, Iceland.
G. Stefánsson. Marine Research Institute, Skúlagata 4, P.O. Box 1390, 121 Reykjavík, Iceland, and Faculty of Science, University
of Iceland, Hjardarhagi 2-6, 107 Reykjavík, Iceland.
¹Corresponding author (e-mail: birgirhr@hi.is).