The relationship between dynamic programming and active inference: the discrete, finite-horizon case

Lancelot Da Costa  l.da-costa@imperial.ac.uk
Department of Mathematics, Imperial College London, London, SW7 2BU, UK

Noor Sajid  noor.sajid.18@ucl.ac.uk
Wellcome Centre for Human Neuroimaging, University College London, London, WC1N 3AR, UK

Thomas Parr  thomas.parr.12@ucl.ac.uk
Wellcome Centre for Human Neuroimaging, University College London, London, WC1N 3AR, UK

Karl Friston  k.friston@ucl.ac.uk
Wellcome Centre for Human Neuroimaging, University College London, London, WC1N 3AR, UK

Ryan Smith  rsmith@laureateinstitute.org
Laureate Institute for Brain Research, Tulsa, OK 74136, United States

Abstract

Active inference is a normative framework for generating behaviour based upon the free energy principle, a theory of self-organisation. This framework has been successfully used to solve reinforcement learning and stochastic control problems, yet the formal relation between active inference and reward maximisation has not been fully explicated. In this paper, we consider the relation between active inference and dynamic programming under the Bellman equation, which underlies many approaches to reinforcement learning and control. We show that, on partially observable Markov decision processes, dynamic programming is a limiting case of active inference. In active inference, agents select actions to minimise expected free energy. In the absence of ambiguity about states, this reduces to matching expected states with a target distribution encoding the agent's preferences. When target states correspond to rewarding states, this maximises expected reward, as in reinforcement learning. When states are ambiguous, active inference agents will choose actions that simultaneously minimise ambiguity. This allows active inference agents to supplement their reward-maximising (or exploitative) behaviour with novelty-seeking (or exploratory) behaviour.
This clarifies the connection between active inference and reinforcement learning, and how both frameworks may benefit from each other.

Keywords: Active inference, reward maximisation, reinforcement learning, approximate Bayesian inference, stochastic optimal control.

Contents

1 Introduction
2 Dynamic programming on finite horizon MDPs
  2.1 Basic definitions
  2.2 Bellman optimal state-action policies

arXiv:2009.08111v3 [cs.AI] 22 Sep 2020