Towards MCTS for Creative Domains

Cameron Browne
Computational Creativity Group
Imperial College London
180 Queens Gate, SW7 2RH, UK
camb@doc.ic.ac.uk

Abstract

Monte Carlo Tree Search (MCTS) has recently demonstrated considerable success for computer Go and other difficult AI problems. We present a general MCTS model that extends its application from searching for optimal actions in games and combinatorial optimisation tasks to the search for optimal sequences and embedded subtrees. The primary application of this extended MCTS model will be for creative domains, as it maps naturally to a range of procedural content generation tasks for which Markovian or evolutionary approaches would typically be used.

Introduction

Ludi is a system for automatically generating and evaluating board games modelled as rule trees (Browne, 2008). New artefacts are created by evolving existing rule trees and measuring the results for quality through self-play. Although this process proved successful, producing a game of notable quality that is now commercially published (Andres, 2009), it also highlighted some problems with the evolutionary approach for game design:

Wastage: Thousands of bad games were generated for every good one.

Focus: Creativity only became evident when introns (flawed rules) were allowed to proliferate and breed.

Bias: The choice of initial population biased the output; if the initial games were not themselves well-formed, they were unlikely to produce any playable children at all.

Due to the random nature of crossover and mutation, there is no guarantee that the evolutionary process will converge to an optimal result. Might there be a better way?

Monte Carlo Tree Search (MCTS) has revolutionised computer Go and is now a cornerstone of the strongest AI players (Coulom, 2006). It works by running large numbers of random simulations and systematically building a search tree from the results.
MCTS has produced world champion AI players for Go, Hex, and General Game Playing, and unofficial world champions for a number of other games.

An attractive feature of MCTS is its generality. It can be applied to almost any domain that can be phrased in terms of states and actions that apply to those states, and it has been applied to optimisation tasks other than move planning in games, such as workforce scheduling, power grid control, economic modelling, and so on. MCTS is also:

Aheuristic: No heuristic domain knowledge is required.

Asymmetric: The search adapts to fit the search space.

Convergent: The search converges to optimal solutions.

MCTS systematically explores a given search space by preferring high-reward choices while guaranteeing the (eventual) exploration of low-reward options, and requires only a fitness function for completed artefacts in order to operate. This makes it an attractive proposition for procedural content generation in creative domains; however, such problems tend to be more complex than simple {state, action} pairs. They are typically modelled as sequences, grammars, rule systems, expression trees, and so on, which are outside the scope of the standard MCTS algorithm.

We propose a generalisation of the MCTS algorithm and its extension from the search for optimal actions to the search for optimal sequences and subtrees. This should have direct applicability to procedural content generation in game design and other creative domains, where it might augment or even provide an alternative to existing methods for creating new high-quality artefacts.

MCTS

Figure 1, from Chaslot et al. (2006), shows the four basic steps of the MCTS algorithm. Each node represents a state s and each edge represents an action a that leads to an updated state s'. Each node maintains a record of its estimated value, number of visits, and a list of child actions.
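The per-node record described above can be sketched as follows. This is a minimal illustration, not taken from any particular MCTS implementation; the class and method names, and the assumption that a state object exposes an actions() method listing its legal actions, are all hypothetical.

```python
class Node:
    """One node of an MCTS search tree (illustrative sketch)."""

    def __init__(self, state, action=None, parent=None):
        self.state = state        # state s represented by this node
        self.action = action      # action a on the edge from the parent
        self.parent = parent
        self.children = []        # child nodes expanded so far
        # child actions not yet expanded; assumes state.actions() exists
        self.untried = list(state.actions())
        self.visits = 0           # number of times this node was visited
        self.reward = 0.0         # sum of simulation rewards backed up here

    def mean_value(self):
        # estimated (mean) value of the node, as used during selection
        return self.reward / self.visits if self.visits else 0.0
```

The running mean is kept as a (sum, count) pair rather than a stored average, so backpropagation only needs to increment visits and add the simulation result.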
The algorithm repeats the following process: starting at the root node R, descend through the tree (choosing the optimal action at each step) until a leaf node L is reached. Then expand the tree by adding a new node N, complete the game by random simulation, and backpropagate the result up the list of selected nodes.

UCB

The key to the algorithm's success lies in the method it uses to select optimal actions from among the lists of those available during tree descent. A variation of the Upper Confidence Bounds (UCB) method (Auer et al., 2002) is typically used to select the child node that maximises:

    $\bar{X}_i + \sqrt{2 \ln n / n_i}$

where $\bar{X}_i$ is the estimated (mean) value of child i, $n_i$ is the number of times child i has been visited, and n is the number of times the node itself has been visited.

Proceedings of the Second International Conference on Computational Creativity 96
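As a concrete illustration, the UCB selection rule can be sketched in Python as follows. The function names are illustrative; this uses the UCB1 constant of 2 from Auer et al. (2002), whereas MCTS implementations typically make the exploration weight a tunable constant. Unvisited children are given priority by assigning them an infinite score.

```python
import math

def ucb1(child_reward, child_visits, parent_visits):
    """UCB1 score: mean value plus an exploration term that
    shrinks as the child is visited more often."""
    if child_visits == 0:
        return math.inf  # always try unvisited children first
    mean = child_reward / child_visits
    explore = math.sqrt(2.0 * math.log(parent_visits) / child_visits)
    return mean + explore

def select_child(children, parent_visits):
    """children: list of (total_reward, visits) pairs.
    Returns the index of the child maximising the UCB1 score."""
    scores = [ucb1(r, v, parent_visits) for r, v in children]
    return scores.index(max(scores))
```

Note how the rule trades off exploitation against exploration: a child with a modest mean but few visits can outscore a well-explored child with a higher mean, which is what guarantees the eventual exploration of low-reward options.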