Fisheries Research 107 (2011) 261–271

Journal homepage: www.elsevier.com/locate/fishres

Decreasing uncertainty in catch rate analyses using Delta-AdaBoost: An alternative approach in catch and bycatch analyses with high percentage of zeros

Yan Li, Yan Jiao, Qing He

Department of Fisheries and Wildlife Sciences, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061-0321, USA

Article history: Received 5 July 2010; received in revised form 27 October 2010; accepted 9 November 2010

Keywords: Delta model; AdaBoost; Tweedie distribution; Catch rate; Zero catch

Abstract

The gillnet data of walleye (Sander vitreus), yellow perch (Perca flavescens), and white perch (Morone americana), collected by a fishery-independent survey (Lake Erie Partnership Index Fishing Survey, PIS) from 1989 to 2008, contained 75–83% zero observations. The AdaBoost algorithm was applied to the analysis of these data for each species. Three- and five-fold cross-validation was conducted to evaluate the performance of each candidate model. The performance of the delta model consisting of one generalized additive model and one AdaBoost model (Delta-AdaBoost) was compared with that of five candidate models: the delta model comprising two generalized linear models (Delta-GLM), the delta model comprising two generalized linear models with polynomial terms up to degree 3 (Delta-GLM-Poly), the delta model comprising two generalized additive models (Delta-GAM), the generalized linear model with Tweedie distribution (GLM-Tweedie), and the generalized additive model with Tweedie distribution (GAM-Tweedie). For predicting the presence/absence of fish species, the error rate of the AdaBoost model was compared with that of conventional generalized linear and additive models assuming a binomial distribution.
Results from 3- and 5-fold cross-validation indicated that the Delta-AdaBoost model yielded, on average, the smallest training error (0.431–0.433 for walleye, 0.528–0.519 for yellow perch and 0.251 for white perch) and test error (0.435–0.436 for walleye, 0.524 for yellow perch and 0.254–0.255 for white perch), followed by the Delta-GLM-Poly model for yellow perch and white perch, and the Delta-GAM model for walleye. In predicting the presence/absence of fish species, the AdaBoost model had the lowest error rate compared with generalized linear and additive models. We suggest the AdaBoost algorithm as an alternative for dealing with the high percentage of zero observations in catch and bycatch analyses in fisheries studies.

© 2010 Elsevier B.V. All rights reserved.

Corresponding author at: Department of Fisheries and Wildlife Sciences, Virginia Polytechnic Institute and State University, 100 Cheatham Hall, Blacksburg, VA 24061-0321, USA. Tel.: +1 540 8088373. E-mail address: yanli08@vt.edu (Y. Li).

1. Introduction

Catch and bycatch rate estimations play an indispensable role in fish stock assessment and management (Gunderson, 1993; Helser and Hayes, 1995; Maunder and Punt, 2004). Various methods have been developed to estimate the catch and bycatch rates for a specific fishery. The commonly used methods include the ratio method, which determines catch rates relative to a standard value (Beverton and Holt, 1957); the generalized linear model, which incorporates multiple variables to describe the environmental and fishing effects (Gavaris, 1980; Kimura, 1981); and the generalized additive model, which captures the nonlinear relationship between the catch/bycatch rate and explanatory variables through a smooth function (Bigelow et al., 1999; Damalas et al., 2007). However, these methods have difficulty dealing with highly skewed data that include a large number of zeros.
Such data are frequently encountered in the catch analyses of rare species and in bycatch analyses (Maunder and Punt, 2004; Ortiz et al., 2000). The presence of zeros may invalidate the normality assumptions that are usually made, and may cause computational difficulties. Ignoring a considerable proportion of zeros may result in a loss of information that reflects the spatial or temporal distribution characteristics of fish stocks. Two types of approaches have been applied in previous studies to deal with zeros in fishery data analyses. One approach is to add a small constant to each zero observation of the response variable, followed by a generalized linear or additive model analysis (Maunder and Punt, 2004; Ortiz et al., 2000; Shono, 2008). However, the estimation results are sensitive to the choice of the constant (Maunder and Punt, 2004; Ortiz et al., 2000). The other approach is to use the delta model and the Tweedie distribution model. In the delta model, the positive values are fitted by a generalized linear or additive model, and the probabilities of observing zero values are fitted by a generalized linear or additive model with an assumption of binomial

0165-7836/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.fishres.2010.11.008