Fisheries Research 107 (2011) 261–271
Decreasing uncertainty in catch rate analyses using Delta-AdaBoost: An
alternative approach in catch and bycatch analyses with high percentage of zeros
Yan Li∗, Yan Jiao, Qing He
Department of Fisheries and Wildlife Sciences, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061-0321, USA
article info
Article history:
Received 5 July 2010
Received in revised form 27 October 2010
Accepted 9 November 2010
Keywords:
Delta model
AdaBoost
Tweedie distribution
Catch rate
Zero catch
abstract
The gillnet data for walleye (Sander vitreus), yellow perch (Perca flavescens), and white perch
(Morone americana), collected by a fishery-independent survey (Lake Erie Partnership Index Fishing
Survey, PIS) from 1989 to 2008, contained 75–83% zero observations. The AdaBoost algorithm was
applied to the model analyses of these data for each species. The 3- and 5-fold cross-validations
were conducted to evaluate the performance of each candidate model. The performance of the delta
model consisting of one generalized additive model and one AdaBoost model (Delta-AdaBoost) was
compared with that of five candidate models: the delta model comprising two generalized linear
models (Delta-GLM), the delta model comprising two generalized linear models with polynomial
terms up to degree 3 (Delta-GLM-Poly), the delta model comprising two generalized additive models
(Delta-GAM), the generalized linear model with a Tweedie distribution (GLM-Tweedie), and the
generalized additive model with a Tweedie distribution (GAM-Tweedie). For predicting the
presence/absence of fish species, the error rate of the AdaBoost model was compared with those of
conventional generalized linear and additive models assuming a binomial distribution. Results from
the 3- and 5-fold cross-validations indicated that the Delta-AdaBoost model yielded, on average,
the smallest training error (0.431–0.433 for walleye, 0.528–0.519 for yellow perch and 0.251 for
white perch) and test error (0.435–0.436 for walleye, 0.524 for yellow perch and 0.254–0.255 for
white perch), followed by the Delta-GLM-Poly model for yellow perch and white perch, and the
Delta-GAM model for walleye. In predicting the presence/absence of fish species, the AdaBoost
model had the lowest error rate compared with the generalized linear and additive models. We
suggest the AdaBoost algorithm as an alternative for dealing with the high percentage of zero
observations in catch and bycatch analyses in fisheries studies.
© 2010 Elsevier B.V. All rights reserved.
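As a rough illustration of the two-part (delta) structure named in the abstract, the sketch below multiplies an estimated probability of a non-zero catch by the mean of the positive catches. It is only a sketch under stated assumptions: simple per-stratum frequencies and means stand in for the fitted presence/absence and positive-catch models (no AdaBoost, GLM, or GAM is fitted here), and the function name, the stratum labels, and the toy data are hypothetical.

```python
from collections import defaultdict


def delta_estimate(records):
    """Two-part (delta) expected catch per stratum.

    records: iterable of (stratum, catch) pairs (hypothetical layout).
    Returns {stratum: (p_positive, mean_positive, expected_catch)}.
    """
    counts = defaultdict(int)      # number of observations per stratum
    positives = defaultdict(list)  # positive catches per stratum
    for stratum, catch in records:
        counts[stratum] += 1
        if catch > 0:
            positives[stratum].append(catch)

    result = {}
    for stratum, n in counts.items():
        pos = positives[stratum]
        # Presence component: share of observations with a non-zero catch.
        p_positive = len(pos) / n
        # Positive component: mean catch given that something was caught.
        mean_positive = sum(pos) / len(pos) if pos else 0.0
        # Delta estimate: product of the two components.
        result[stratum] = (p_positive, mean_positive, p_positive * mean_positive)
    return result


# Made-up toy data: 80% zeros in "shallow", 75% zeros in "deep".
toy = [("shallow", c) for c in [0, 0, 0, 0, 5]] + \
      [("deep", c) for c in [0, 0, 0, 8]]
estimates = delta_estimate(toy)
```

In a model-based version, each component would be replaced by a fitted model with covariates, while the final product of the two components would stay the same.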
1. Introduction
Catch and bycatch rate estimations play an indispensable role in fish stock assessment and
management (Gunderson, 1993; Helser and Hayes, 1995; Maunder and Punt, 2004). Various methods
have been developed to estimate the catch and bycatch rates for a specific fishery. The commonly
used methods include the ratio method, which determines catch rates relative to a standard value
(Beverton and Holt, 1957); the generalized linear model, which incorporates multiple variables to
describe the environmental and fishing effects (Gavaris, 1980; Kimura, 1981); and the generalized
additive model, which describes the nonlinear relationship between the catch/bycatch rate and
explanatory variables through a smooth function (Bigelow et al., 1999; Damalas et al., 2007).
However, these methods have difficulty dealing with highly skewed data that include a large
number of zeros. Such data are frequently encountered in catch analyses of rare species and in
bycatch analyses (Maunder and Punt, 2004; Ortiz et al., 2000). The presence of zeros may
invalidate the commonly used assumption of normality and may cause computational difficulties.
∗ Corresponding author at: Department of Fisheries and Wildlife Sciences, Virginia Polytechnic
Institute and State University, 100 Cheatham Hall, Blacksburg, VA 24061-0321, USA.
Tel.: +1 540 8088373. E-mail address: yanli08@vt.edu (Y. Li).
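The two difficulties noted above can be seen in a small simulation. The sketch below uses made-up data, not the survey data: about 80% of the simulated catches are zero and the positive catches are drawn from a log-normal distribution, which is an assumption chosen only for illustration. It computes the sample skewness and shows that a plain log transform is undefined at zero. The function names and parameter values are hypothetical.

```python
import math
import random


def sample_skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def safe_log(x):
    """Return log(x), or None where the log transform is undefined."""
    try:
        return math.log(x)
    except ValueError:
        return None


random.seed(1)  # fixed seed so the simulated sample is reproducible

# Hypothetical catches: non-zero with probability 0.2, otherwise exactly zero.
catches = [
    random.lognormvariate(1.0, 0.8) if random.random() < 0.2 else 0.0
    for _ in range(1000)
]

zero_fraction = sum(c == 0 for c in catches) / len(catches)
skew = sample_skewness(catches)  # strongly positive for zero-heavy data
undefined_logs = sum(safe_log(c) is None for c in catches)

print(f"zero fraction: {zero_fraction:.2f}")
print(f"sample skewness: {skew:.2f}")
print(f"observations with undefined log: {undefined_logs}")
```

A normally distributed sample would have skewness near zero, so a strongly positive value is one simple sign that a normality assumption is doubtful. The undefined logarithms are an example of the computational difficulty that zeros introduce.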
Ignoring a considerable proportion of zeros may result in a loss of information that reflects
the spatial or temporal distribution characteristics of fish stocks. Two types of approaches
have been applied in previous studies to deal with zeros in fishery data analyses. One approach
is to add a small constant to each zero observation of the response variable, followed by a
generalized linear or additive model analysis (Maunder and Punt, 2004; Ortiz et al., 2000;
Shono, 2008). However, the estimation results are sensitive to the choice of the constant
(Maunder and Punt, 2004; Ortiz et al., 2000). The other approach is to use the delta model or
the Tweedie distribution model. In the delta model, the positive values are fitted by a
generalized linear or additive model, and the probabilities of observing zero values are fitted
by a generalized linear or additive model with an assumption of binomial
doi:10.1016/j.fishres.2010.11.008