Binary Differential Evolution Strategies

A.P. Engelbrecht, Member, IEEE, and G. Pampará

Abstract— Differential evolution has been shown to be a very powerful, yet simple, population-based optimization approach. The nature of its reproduction operator limits its application to continuous-valued search spaces. However, a simple discretization procedure can be used to convert floating-point solution vectors into discrete-valued vectors. This paper considers three approaches in which differential evolution can be used to solve problems with binary-valued parameters. The first approach is based on a homomorphous mapping [1], while the second approach interprets the floating-point solution vector as a vector of probabilities, used to decide on the appropriate binary value. The third approach normalizes solution vectors and then discretizes these normalized vectors to form a bitstring. Empirical results are provided to illustrate the efficiency of these approaches in comparison with particle swarm optimizers.

I. INTRODUCTION

Differential evolution (DE) is a stochastic, population-based search strategy developed by Storn and Price [2], [3]. Its reproduction operator consists of a mutation step to create a trial vector, which is then used by the crossover operator to produce one offspring. Mutation step sizes are calculated as weighted differences between randomly selected individuals. It is this reliance on difference vectors that makes DE applicable to optimization problems with continuous-valued parameters. The standard DE algorithms cannot be applied as is to solve problems with binary-valued parameters.

Although DE was developed for optimizing continuous-valued parameters, discretization methods have been applied to the floating-point solution vectors to transform these vectors into discrete-valued vectors. Such a procedure has been used for solving integer and mixed-integer programming problems [4]–[9].
The discretization process is quite simple: each floating-point value of a solution vector is rounded to the nearest integer. For problems where an ordering exists among the values of a parameter, the index number in the ordered sequence is used as the discretized value [8].

This paper presents and evaluates three approaches to using DE to optimize binary-valued parameters. The angle modulated DE (AMDE) [1] uses the standard DE to evolve a bitstring generating function. The binary DE (binDE) treats each floating-point component of a solution vector as a probability of producing either bit 0 or bit 1. Lastly, the normalization DE (normDE) first normalizes each solution vector such that all components are in the range [0, 1], and then produces a bitstring by using bit 0 if the normalized component is less than 0.5; otherwise, bit 1 is used.

A.P. Engelbrecht and G. Pampará are both with the Department of Computer Science, University of Pretoria, South Africa (email: {engel,gpampara}@cs.up.ac.za).

The rest of this paper is organized as follows: A short overview of DE is given in Section II. The AMDE is described in Section III, while the binDE is presented in Section IV. The normalization approach is briefly described in Section V. Results are presented and discussed in Section VI.

II. DIFFERENTIAL EVOLUTION

Contrary to more common EAs, the reproduction operator used in DE does not depend on some probability density function by which elements of the individual are perturbed. Instead, DE uses an arithmetic operator which alters the internal representation of individuals to generate deviations. The generated deviated vector, also known as a trial vector, is evaluated, and if its fitness is better than that of the main parent, then the newly generated individual replaces its main parent.

For each individual, $x_i(t)$, of the population at generation $t$, generate a trial vector, $x'_i(t)$, as follows: Let $x_i(t)$ be the main parent.
Then, select randomly from the population three other individuals, $x_{i_1}(t)$, $x_{i_2}(t)$ and $x_{i_3}(t)$, with $i_1 \neq i_2 \neq i_3 \neq i$, and $i_1, i_2, i_3 \sim U(1,\ldots,s)$, where $s$ is the population size. Select a random number, $r \sim U(1,\ldots,n_x)$, where $n_x$ is the number of genes (or parameters to be optimized) of a single chromosome. Then for all parameters, $j = 1,\ldots,n_x$, if $U(0,1) < P_r$, or if $j = r$, let

$$x'_{ij}(t) = x_{i_3 j}(t) + F \times \left( x_{i_1 j}(t) - x_{i_2 j}(t) \right) \tag{1}$$

Otherwise, let

$$x'_{ij}(t) = x_{ij}(t) \tag{2}$$

If $f(x'_i(t))$ is better than $f(x_i(t))$, then the latter is replaced with the offspring. In the above, $P_r$ is the probability of reproduction (with $P_r \in [0, 1]$), $F$ is a scaling factor (with $F \in (0, \infty)$), and $x'_{ij}(t)$ and $x_{ij}(t)$ respectively indicate the $j$-th parameter of the offspring and the main parent.

It is important to note that the three individuals used in equation (1) are randomly selected with no bias towards more fit individuals. Each individual has an equal chance of being selected.

Price and Storn proposed a number of different DE strategies [10], [11], based on the individual being perturbed, and the number of weighted difference vectors used in equation (1). The strategy described above is denoted as DE/rand/1, meaning that the vector to be perturbed is randomly selected, and that only one difference vector is included. Other strategies include:

• DE/best/1, where the individual to be perturbed is selected as the best performing individual, $\hat{x}$ of the

1-4244-1340-0/07/$25.00 © 2007 IEEE
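As an illustration, one generation of the DE/rand/1 procedure of equations (1) and (2) might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: "better" is taken to mean a lower fitness value (minimization), the whole population is updated synchronously from the previous generation, indices are 0-based, and the function name `de_rand_1_step`, the default values `F=0.5` and `P_r=0.9`, and the `sphere` test function are hypothetical choices made only for this example.

```python
import random


def de_rand_1_step(population, fitness, F=0.5, P_r=0.9, rng=random):
    """One DE/rand/1 generation for minimization (illustrative sketch)."""
    s = len(population)        # population size (must be at least 4)
    n_x = len(population[0])   # number of parameters per individual
    next_population = []
    for i, parent in enumerate(population):
        # Three mutually distinct individuals, all different from the parent.
        i1, i2, i3 = rng.sample([k for k in range(s) if k != i], 3)
        # Component that is always taken from equation (1).
        r = rng.randrange(n_x)
        # Equation (2): by default, each component is copied from the parent.
        trial = list(parent)
        for j in range(n_x):
            if rng.random() < P_r or j == r:
                # Equation (1): base vector plus one scaled difference vector.
                trial[j] = population[i3][j] + F * (
                    population[i1][j] - population[i2][j]
                )
        # The offspring replaces the parent only if it is strictly better.
        if fitness(trial) < fitness(parent):
            next_population.append(trial)
        else:
            next_population.append(parent)
    return next_population


def sphere(x):
    """Hypothetical test function: sum of squares, minimum at the origin."""
    return sum(v * v for v in x)


# Usage: evolve a small random population for a few generations.
rng = random.Random(1)
pop = [[rng.uniform(-5.0, 5.0) for _ in range(4)] for _ in range(10)]
for _ in range(50):
    pop = de_rand_1_step(pop, sphere, rng=rng)
best = min(pop, key=sphere)
```

Because an offspring replaces its main parent only when it is strictly better, the fitness at each population index can never worsen from one generation to the next in this sketch.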