The Neighbourhood MCMC sampler for learning Bayesian networks

Salem A. Alyami a,b, A. K. M. Azad a, and Jonathan M. Keith a
a School of Mathematical Sciences, Monash University, Australia
b Al Imam Mohammad Ibn Saud Islamic University (IMSIU), Saudi Arabia

ABSTRACT
Getting stuck in local maxima is a problem that arises when learning Bayesian network (BN) structures. In this paper, we study a recently proposed Markov chain Monte Carlo (MCMC) sampler, called the Neighbourhood sampler (NS), and examine how efficiently it samples BNs when local maxima are present. We assume that a posterior distribution f(N, E | D) has been defined, where D represents the data relevant to the inference, and N and E are the sets of nodes and directed edges, respectively. We illustrate the new approach by sampling from such a distribution and inferring BNs. The simulations conducted in this paper show that the new learning approach substantially avoids getting stuck in local modes of the distribution, and achieves a more rapid rate of convergence than other common algorithms, e.g. the Metropolis-Hastings MCMC sampler.

Keywords: Directed acyclic graph, structure inference, local maxima, graph space

1. INTRODUCTION
Bayesian networks (BNs) are directed acyclic graphs (DAGs) used as a probabilistic method to visually represent directed causal relationships among variables, learned from a dataset. Nodes of the graph represent random variables, and directed edges represent causal relationships. Sampling algorithms in the spaces of BNs are computationally intensive because the number of DAGs increases dramatically with the number of nodes. For example, there are 543 and 3 781 503 possible BNs in graph spaces with 4 and 6 nodes, respectively. Learning a BN typically involves two conceptually different elements: structure learning and parameter learning. Structure learning involves inferring which variables interact and the causal directions of those interactions, i.e.
it is inferring the set of edges connecting a set of candidate nodes. For a fixed structure, parameter learning involves quantitatively estimating the probabilistic dependencies among variables. In practice, structure and parameter learning may be performed simultaneously. In this paper, both types of learning are explicitly considered.

BN structures have been widely learned using score-based algorithms. This category of algorithms aims to maximise a pre-assigned score for each BN using a heuristic search algorithm. One of the most widely studied classes of heuristic search methods is greedy algorithms (GAs). 1 GAs typically update a given BN by adding, deleting or reversing a particular directed edge at each step. Among the most widely used greedy algorithms are the Hill-Climbing (HC) algorithm and the Tabu Search (TS) algorithm. 2, 3 The HC algorithm starts with an arbitrary BN and then iteratively applies a local search to its neighbours, in the hope of finding a neighbouring network with a better score. It repeats this process until no further improvement can be obtained. The TS algorithm also runs a local search similar to HC; however, it deliberately enhances the local search by relaxing its acceptance function, i.e. when the search gets stuck at a local minimum and no improving move is available, worsening moves can be accepted. The TS algorithm also maintains a memory structure that records all visited solutions. If a particular BN has been visited previously without improving the score, it is marked as "tabu" and not considered again. However, heuristic search remains problematic when the immediate neighbours of a network offer no better solution.

Constraint-based algorithms are another main class. They analyse the probabilistic relations entailed by the Markov property of BNs using conditional independence tests, and then construct a BN that satisfies the corresponding d-separation statements.
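The rapid growth in the size of the graph space quoted in the introduction (543 DAGs on 4 nodes, 3 781 503 on 6) can be verified with Robinson's recursion for counting labelled DAGs, a(n) = Σ_{k=1}^{n} (−1)^{k+1} C(n,k) 2^{k(n−k)} a(n−k) with a(0) = 1. The following sketch is illustrative only and is not part of the authors' method:

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def num_dags(n: int) -> int:
    """Count labelled DAGs on n nodes via Robinson's recursion:
    a(n) = sum_{k=1}^{n} (-1)^(k+1) * C(n, k) * 2^(k(n-k)) * a(n-k),
    where k is the number of nodes with no incoming edge."""
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
        for k in range(1, n + 1)
    )

print(num_dags(4))  # 543
print(num_dags(6))  # 3781503
```

The super-exponential growth (over 4 × 10^18 DAGs already at 10 nodes) is what makes exhaustive enumeration infeasible and motivates sampling approaches such as MCMC.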
One common algorithm in this category is the Grow-Shrink (GS) approach. 4 It constructs a BN by identifying the Markov blanket of each node and then connecting nodes accordingly; restricting attention to Markov blankets avoids producing overly dense networks.

Corresponding author: Salem A. Alyami, E-mail: salem.alyami@monash.edu, Telephone: 1 505 123 1234

First International Workshop on Pattern Recognition, edited by Xudong Jiang, Guojian Chen, Genci Capi, Chiharu Ishii, Proc. of SPIE Vol. 10011, 100111K © 2016 SPIE · CCC code: 0277-786X/16/$18 · doi: 10.1117/12.2242708
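The single-edge moves used by greedy searches such as HC and TS (add, delete, or reverse one directed edge, subject to acyclicity) can be sketched as below. The function names and the edge-set representation are illustrative assumptions, not the authors' implementation:

```python
from itertools import permutations

def is_acyclic(nodes, edges):
    """Kahn-style check: repeatedly remove in-degree-0 nodes;
    the graph is a DAG iff every node gets removed."""
    indeg = {v: 0 for v in nodes}
    for (_, v) in edges:
        indeg[v] += 1
    frontier = [v for v in nodes if indeg[v] == 0]
    removed = 0
    while frontier:
        u = frontier.pop()
        removed += 1
        for (a, b) in edges:
            if a == u:
                indeg[b] -= 1
                if indeg[b] == 0:
                    frontier.append(b)
    return removed == len(nodes)

def neighbours(nodes, edges):
    """All DAGs reachable from `edges` by one add/delete/reverse move."""
    edges = set(edges)
    result = []
    for u, v in permutations(nodes, 2):
        if (u, v) in edges:
            deleted = edges - {(u, v)}
            result.append(deleted)            # delete: always acyclic
            reversed_ = deleted | {(v, u)}
            if is_acyclic(nodes, reversed_):
                result.append(reversed_)      # reverse, if still a DAG
        elif (v, u) not in edges:
            added = edges | {(u, v)}
            if is_acyclic(nodes, added):
                result.append(added)          # add, if still a DAG
    return result

print(len(neighbours(['A', 'B', 'C'], set())))  # 6 single-edge additions
```

A hill-climbing step would score each neighbour and move to the best one; the local-maximum failure mode described above occurs precisely when no element of this neighbourhood improves the score.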