The Gambler’s Ruin Problem, Genetic Algorithms, and the Sizing of Populations

George Harik
Illinois Genetic Algorithms Laboratory
University of Illinois
Urbana, IL 61801 USA
gharik@illigal.ge.uiuc.edu

Erick Cantú-Paz
Illinois Genetic Algorithms Laboratory
University of Illinois
Urbana, IL 61801 USA
cantupaz@illigal.ge.uiuc.edu

David E. Goldberg
Illinois Genetic Algorithms Laboratory
University of Illinois
Urbana, IL 61801 USA
deg@illigal.ge.uiuc.edu

Brad L. Miller
I2 Technologies
Boston, MA 02139 USA
bmiller@technologist.com

Abstract

This paper presents a model to predict the convergence quality of genetic algorithms based on the size of the population. The model is based on an analogy between selection in GAs and one-dimensional random walks. Using the solution to a classic random walk problem, the gambler’s ruin, the model naturally incorporates previous knowledge about the initial supply of building blocks (BBs) and correct selection of the best BB over its competitors. The result is an equation that relates the size of the population to the desired quality of the solution, as well as to the problem size and difficulty. The accuracy of the model is verified with experiments using additively decomposable functions of varying difficulty. The paper demonstrates how to adjust the model to account for noise present in the fitness evaluation and for different tournament sizes.

Keywords

Population size, noise, decision making, building block supply.

1 Introduction

The question of how to choose an adequate population size for a particular domain is difficult and has puzzled practitioners for a long time. If the population is too small, the genetic algorithm (GA) is unlikely to find a good solution for the problem at hand. It may therefore appear reasonable that, to find solutions of high quality, the size of the population must be increased as much as possible.
However, if the population is too large, the GA will waste time processing unnecessary individuals, and this may result in unacceptably slow performance. The problem consists of finding a population size that is large enough to permit a correct exploration of the search space without wasting computational resources. The goal of this study is to provide a practical answer to the problem of finding suitable population sizes for particular domains.

© 1999 by the Massachusetts Institute of Technology. Evolutionary Computation 7(3): 231-253

Hard questions are better approached using a divide-and-conquer strategy, and the population sizing issue is no exception. This paper identifies two factors that depend on the population size and influence the quality of the solutions that the GA may reach: the