How Learning Can Affect the Course of Evolution in Dynamic Environments

Reiji SUZUKI, Takaya ARITA
Graduate School of Human Informatics, Nagoya University
Furo-cho, Chikusa-ku, Nagoya 464-8601, Japan
E-mail: {reiji, ari}@info.human.nagoya-u.ac.jp

Abstract

The Baldwin effect describes an interaction between learning and evolution: it suggests that individual lifetime learning can influence the course of evolution without the Lamarckian mechanism. We consider the Baldwin effect in dynamic environments, especially those in which there is no explicit optimal solution across generations and fitness depends only on interactions among agents. We adopt the iterated Prisoner’s Dilemma as a dynamic environment and introduce phenotypic plasticity to strategies by using a meta-learning rule termed “Meta-Pavlov”. In this simulation, the Baldwin effect was observed as follows: First, strategies with enough plasticity spread, which caused a shift from a defective population to a cooperative one. Second, these strategies were replaced by the strategy [x00x], which has a modest amount of plasticity.

Keywords: Baldwin effect, learning and evolution, iterated Prisoner’s Dilemma, artificial life.

1 Introduction

There has been much discussion of the interactions between learning and evolution. The Baldwin effect [1] is one such interaction; it suggests that individual lifetime learning can influence the course of evolution without the Lamarckian mechanism. This effect has recently attracted the attention not only of biologists but also of computer scientists, following the evolutionary simulation of Hinton and Nowlan [2]. Since Hinton and Nowlan, many studies have been conducted, most of which have discussed the effect on the assumption that environments are static and the optimal solution is fixed.
However, as we see in the real world, learning is likely to be more effective, and more heavily used, in dynamic environments, because the flexibility that plasticity provides is itself advantageous for adapting to a changing world. It is therefore important to examine how learning can affect the course of evolution in dynamic environments.

Our objective is to clarify the function and the mechanism of the Baldwin effect in dynamic environments, focusing on the balance between the benefit and the cost of learning, whereas most studies of the Baldwin effect have addressed static environments. Dynamic environments can typically be divided into two types: those in which the optimal solution changes as the environment changes, and those in which each agent’s fitness is determined by interactions with other agents.

For the former type of environment, Anderson [3] quantitatively analyzed how learning affects the evolutionary process in a dynamic environment whose optimal solution changes through generations. Sasaki and Tokoro [4] studied the relationship between learning and evolution using a simple model in which individuals learn to distinguish poison from food by modifying the connection weights of a neural network. These studies emphasized the importance of learning in dynamic environments.

We adopt the iterated Prisoner’s Dilemma (IPD) as an environment of the latter type, in which there is no explicit optimal solution through generations and the fitness of agents depends only on interactions among them. This paper describes the Baldwin effect briefly, explains our evolutionary model, and discusses how this effect was observed in the evolutionary experiments.

2 Background

The Baldwin effect explains interactions between learning and evolution by paying attention to the balance between the benefit and the cost of learning.
The Baldwin effect consists of the following two steps (Turney, Whitley and Anderson [5]): In the first step, lifetime learning (phenotypic plasticity) gives individual agents chances to change their phenotypes. If the learned traits are useful to agents and increase their fitness, they will spread in the next generation. In the second step, if the environment is sufficiently stable, the evolutionary path finds innate traits that can replace learned traits, because of the cost of learning.
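
The two steps can be illustrated with a toy calculation. The sketch below is not the model used in this paper (which is based on the IPD and Meta-Pavlov); it is a deliberately simplified, hypothetical setting in which a genotype has some innately wrong loci and some plastic loci that are set by random guessing during the lifetime. The function name `expected_fitness`, the parameters `trials` and `bonus`, and all numerical values are assumptions chosen only for illustration.

```python
def expected_fitness(n_plastic, n_wrong, trials=1000, bonus=19.0):
    """Expected fitness of one toy genotype (illustrative assumption only).

    n_plastic: number of loci set by lifetime learning; each trial is one
               random guess of all plastic loci at once
    n_wrong:   number of loci fixed innately to the wrong value
    """
    if n_wrong > 0:
        # A wrong innate locus cannot be repaired by learning: baseline fitness.
        return 1.0
    # Probability that a single guess sets every plastic locus correctly.
    p = 0.5 ** n_plastic
    # Succeeding after t failed guesses leaves (trials - t) trials of payoff,
    # so slower learning (larger t) is the cost of learning.
    expected_remaining = sum(
        (1.0 - p) ** t * p * (trials - t) for t in range(trials)
    )
    return 1.0 + bonus * expected_remaining / trials


# Compare a genotype with a wrong innate locus, two plastic genotypes,
# and a fully innate genotype.
for label, (n_plastic, n_wrong) in [
    ("5 plastic, 1 wrong", (5, 1)),
    ("5 plastic, 0 wrong", (5, 0)),
    ("2 plastic, 0 wrong", (2, 0)),
    ("0 plastic, 0 wrong", (0, 0)),
]:
    print(label, expected_fitness(n_plastic, n_wrong))
```

Under these assumptions, a plastic genotype without wrong loci outperforms one that has a wrong innate locus, which corresponds to the benefit of learning in the first step. Among genotypes without wrong loci, fitness rises as plastic loci are replaced by correct innate ones, which corresponds to the cost of learning that drives the second step.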