Evolution of metamemory ability by artificial neural networks with neuromodulation

Yusuke Yamato¹, Reiji Suzuki¹ and Takaya Arita¹
¹Graduate School of Informatics, Nagoya University, yamato@alife.cs.i.nagoya-u.ac.jp

Introduction

Humans know whether, or how well, certain knowledge exists in their own memory. This subjective monitoring and control of one's memory, metamemory, has been studied widely as a type of metacognition in cognitive psychology. This study takes a constructive approach and aims to evolve artificial neural networks that have a metamemory function. For this purpose, we use evolved plastic artificial neural networks (EPANNs) (Soltoggio et al., 2018). Specifically, we use neuromodulation (Fig. 1), which has been recognized as an essential element in cognitive and behavioral processes, playing an important role in, for example, facilitating the evolution of learning, adapting to dynamic environments, and acquiring mental representations.

Using EPANNs, we showed in one of our evolutionary experiments that evolved neural networks clearly had a capacity for metamemory (Sudo et al., 2014), in the sense that they satisfied a measure based on a type of delayed matching-to-sample task (DMTS) (Hampton, 2001) that was developed to ask whether monkeys have metamemory. However, metamemory is not so simple (Call, 2010): it is extremely difficult to conclude that a monkey subject can monitor her memory just by observing her behavior. That difficulty stems partly from the difficulty of defining metamemory in the first place. In principle, unlike with living subjects, we can analyze and understand all the mechanisms and processes involved in artificial neural networks evolved in simulation.
We take the previous evolutionary experiments (Sudo et al., 2014) as a starting point, critically analyze the evolved networks, and then refine the measure to exclude the evolution of networks whose mechanism or process seems different from that of metamemory. Our study scheme is based on repeating three steps: evolutionary experiments, analysis of the evolved networks, and refinement of the measure.

Methodology

Fig. 2 shows an overview of the task (Sudo et al., 2014), which is composed of 4 phases. In the study phase, an agent receives a target pattern composed of 5 binary digits. The delay phase follows, in which the agent receives 00000 as a distractor pattern on several occasions. Then, with a probability of 2/3, the choice phase starts, during which the agent receives a signal indicating that it is in that phase. One output from the agent is interpreted as the intention to decline the trial. If the agent declines, it receives a small reward (0.3) and the trial ends. With the remaining probability of 1/3, the choice phase is skipped and the test is compulsory. In the test phase, the agent receives all patterns one by one in random order. An output is interpreted as a response for each pattern. If the response matches the target pattern memorized in the study phase, the agent receives a large reward (1.0); otherwise, it receives no reward.

Figure 1: Metamemory.
Figure 2: The delayed match-to-sample task.

The neural network of an agent has 7 inputs and 2 outputs. The topology of the networks evolves while the number of neurons (including standard and modulatory neurons but excluding input neurons) is kept at 16 or fewer. Modulatory neurons differ from standard neurons in that they affect the connections of standard neurons by changing their learning rate. We used an evolution strategy (ES) (Bäck et al., 1997) to evolve the topology and connection weights of the neural networks, which is basically the same as the one used in Soltoggio et al.
(2008). We defined the following three measures of metamemory one by one, following the iterations of the study scheme: the one used in the escape response paradigm, the one which