Emergent Policy Discovery for Visual Reinforcement Learning through Tangled Program Graphs: A Tutorial

Stephen Kelly 1, Robert J. Smith 1, and Malcolm I. Heywood 1

1 Faculty of Computer Science, Dalhousie University, Halifax, NS, Canada

Article appears at GPTP XVI under Springer copyright 2019.∗
https://link.springer.com/chapter/10.1007/978-3-030-04735-1_3

Abstract

Tangled Program Graphs (TPG) represents a framework by which multiple programs can be organized to cooperate and decompose a task with minimal a priori information. TPG agents begin with the least complexity and incrementally coevolve to discover a complexity befitting the nature of the task. Previous research has demonstrated the TPG framework under visual reinforcement learning tasks from the Atari Learning Environment and the VizDoom first-person shooter game, producing results competitive with those from deep learning. However, unlike deep learning, the emergent constructive properties of TPG result in solutions that are orders of magnitude simpler; thus, execution never needs specialized hardware support. In this work, our goal is to provide a tutorial overview demonstrating how the emergent properties of TPG have been achieved, as well as to provide specific examples of decompositions discovered under the VizDoom task.

1 Introduction

Visual reinforcement learning represents the direct application of reinforcement learning algorithms to frame (pixel) data from camera or video sources. The learning agent is therefore able to interact with the environment more directly than previously possible, i.e. there are no a priori decisions made regarding what features are useful/important, potentially reducing sources of bias.
To date, such approaches have been dominated by results from deep learning that have successfully minimized the amount of pre-processing necessary for the source images (typically no more than image cropping

∗ This version of the article includes colour illustrations/screen captures that could not be included in the book chapter published by Springer.