Nested Monte-Carlo Search

Tristan Cazenave
LAMSADE, Université Paris-Dauphine
Paris, France
cazenave@lamsade.dauphine.fr

Abstract

Many problems have a huge state space and no good heuristic to order moves so as to guide the search toward the best positions. Random games can be used to score positions and evaluate their interest. Random games can also be improved using random games to choose a move to try at each step of a game. Nested Monte-Carlo Search addresses the problem of guiding the search toward better states when there is no available heuristic. It uses nested levels of random games in order to guide the search. The algorithm is studied theoretically on simple abstract problems and applied successfully to three different games: Morpion Solitaire, SameGame and 16x16 Sudoku.

1 Introduction

When there is no available heuristic, it can be useful to perform random playouts in order to evaluate the interest of developing a position. Moreover, it is important to optimize moves at all stages of a game and not only near the root.

Nested Monte-Carlo Search uses random playouts at its base level. A search at a given level uses searches at the lower level to decide which move to play in its game. Since a complete game is performed at each level, the moves are optimized at all stages.

The point of this paper is to show that random moves can be successfully used at the base level of a nested search algorithm, and that memorizing the best sequence is very useful in that case. A theoretical analysis of the algorithm as well as its successful application to three different games with huge state spaces are presented.

The outline of this paper is as follows: the next section presents related work, section 3 presents the Nested Monte-Carlo Search algorithm, section 4 analyzes the algorithm on two simple abstract problems, section 5 gives experimental results for three different games.
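To make the nesting idea of this introduction concrete, the following is a minimal Python sketch, not the pseudocode of the paper. The toy game (choose DEPTH bits, score equal to the number of 1s) and the names legal_moves, score, nested and rng are assumptions introduced only for illustration; only play(position, move) and the notion of a sample function come from the text.

```python
import random

DEPTH = 12  # hypothetical toy game: choose 12 bits, score = number of 1s


def legal_moves(position):
    # No moves are left once DEPTH bits have been chosen.
    return [] if len(position) >= DEPTH else [0, 1]


def play(position, move):
    # Plays the move in the position and returns the resulting position.
    return position + (move,)


def score(position):
    return sum(position)


def sample(position, rng):
    # Base level: play uniformly random moves until the game ends.
    while legal_moves(position):
        position = play(position, rng.choice(legal_moves(position)))
    return score(position), position


def nested(position, level, rng):
    # A search at a given level tries each move once with a lower-level
    # search and memorizes the best complete game found so far.
    if level == 0:
        return sample(position, rng)
    best_score, best_final = -1, None
    while legal_moves(position):
        for move in legal_moves(position):
            s, final = nested(play(position, move), level - 1, rng)
            if s > best_score:
                best_score, best_final = s, final
        # Follow the memorized best sequence, even when the searches of
        # this step all gave worse results than it.
        position = play(position, best_final[len(position)])
    return score(position), position


if __name__ == "__main__":
    rng = random.Random(0)
    for level in range(3):
        print(level, nested((), level, rng)[0])
```

In this sketch a position is the tuple of moves played so far, so the memorized best game always extends the current position and its next move can be read off by index. A real application would replace the toy game functions with those of the problem at hand.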
2 Related work

The simplest Monte-Carlo search algorithm is Iterative Sampling, which consists in playing random games until a solution is found or the search time has elapsed.

Rollouts were successfully used by Tesauro and Galperin to improve their Backgammon program [Tesauro and Galperin, 1996].

Nested rollouts combined with a heuristic to choose the next move at the base level were used by Yan et al. to improve their Klondike solitaire program [Yan et al., 2005]. Nested rollouts have been used with heuristics that change with the stage of the game of Thoughtful Solitaire, a version of Klondike Solitaire in which the locations of all cards are known [Bjarnason et al., 2007]. These algorithms use a base heuristic which is improved with nested rollouts, whereas our algorithm uses random moves at the base level.

A related algorithm is Reflexive Monte-Carlo search [Cazenave, 2007], which has been used to find long sequences at Morpion Solitaire. The idea of Reflexive Monte-Carlo search has some similarity with the nested rollouts idea: it consists in playing random playouts at the base level, and in playing a few games at the lower level of a search in order to find the best move at the current level of the search. Games at the meta level give better results than games at the lower level. In Reflexive Monte-Carlo search, a fixed number of games is played at each level of the search before deciding the move to play. In Nested Monte-Carlo Search, by contrast, each possible move is tried only once before each lower level search.

The use of Monte-Carlo methods in games has recently been very successful for the game of Go [Gelly and Silver, 2007].

3 The algorithm

Nested Monte-Carlo Search combines nested calls with randomness in the playouts and memorization of the best sequence of moves.

In nested rollouts the rollouts are based on a heuristic. This implies that nested rollouts always improve on rollouts and on simply following the heuristic.
When the base level does not use a heuristic but random moves, it is possible that a nested search gives worse results than a lower level search. It is then useful to memorize the best sequence found so far in order to follow it when the randomized searches give worse results than the best sequence.

The basic sample function just plays a random game from a given position. It uses the function play(position, move), which plays the move in the position and returns the resulting position: