The Gambler’s Ruin Problem, Genetic Algorithms, and the Sizing of Populations

George Harik
Illinois Genetic Algorithms Laboratory
University of Illinois
Urbana, IL 61801 USA
gharik@illigal.ge.uiuc.edu

Erick Cantú-Paz
Illinois Genetic Algorithms Laboratory
University of Illinois
Urbana, IL 61801 USA
cantupaz@illigal.ge.uiuc.edu

David E. Goldberg
Illinois Genetic Algorithms Laboratory
University of Illinois
Urbana, IL 61801 USA
deg@illigal.ge.uiuc.edu

Brad L. Miller
I2 Technologies
Boston, MA 02139 USA
bmiller@technologist.com

Abstract

This paper presents a model to predict the convergence quality of genetic algorithms based on the size of the population. The model is based on an analogy between selection in GAs and one-dimensional random walks. Using the solution to a classic random walk problem, the gambler’s ruin, the model naturally incorporates previous knowledge about the initial supply of building blocks (BBs) and correct selection of the best BB over its competitors. The result is an equation that relates the size of the population with the desired quality of the solution, as well as the problem size and difficulty. The accuracy of the model is verified with experiments using additively decomposable functions of varying difficulty. The paper demonstrates how to adjust the model to account for noise present in the fitness evaluation and for different tournament sizes.

Keywords

Population size, noise, decision making, building block supply.

1 Introduction

The question of how to choose an adequate population size for a particular domain is difficult and has puzzled practitioners for a long time. If the population is too small, it is not likely that the genetic algorithm (GA) will find a good solution for the problem at hand. Therefore, it may appear reasonable that to find solutions of high quality, the size of the populations must be increased as much as possible.
However, if the population is too large, the GA will waste time processing unnecessary individuals, and this may result in unacceptably slow performance. The problem consists of finding a population size that is large enough to permit a correct exploration of the search space without wasting computational resources. The goal of this study is to provide a practical answer to the problem of finding suitable population sizes for particular domains.

© 1999 by the Massachusetts Institute of Technology. Evolutionary Computation 7(3): 231-253

Hard questions are better approached using a divide-and-conquer strategy, and the population sizing issue is no exception. This paper identifies two factors that depend on the population size and influence the quality of the solutions that the GA may reach: the