Constraints in Gaussian Graphical Models

BOHAO YAO and ROBIN J. EVANS
Department of Statistics, University of Oxford.
E-mail: bohao.yao@stats.ox.ac.uk; evans@stats.ox.ac.uk

In this paper, we consider the problem of finding the constraints in bow-free acyclic directed mixed graphs (ADMGs). ADMGs are a generalisation of directed acyclic graphs (DAGs) that allow for certain latent variables. We first show that the minimal generators of the ideal $I(G)$ containing all the constraints of a Gaussian ADMG $G$ correspond precisely to the pairs of non-adjacent vertices in $G$. The proof of this theorem naturally leads to an efficient algorithm that fits a bow-free Gaussian ADMG by maximum likelihood. In particular, this allows us to test the goodness of fit of a given data set to a bow-free ADMG.

Keywords: Graphical model, ADMG, BAP, SEM, model selection, fitting.

1. Introduction

Graphical models provide a powerful formalism for dealing with uncertainty in probabilistic modelling and inference, by encoding independence constraints into graphical representations. One popular graphical model for representing a joint probability distribution is the directed acyclic graph (DAG), in which each vertex corresponds to a random variable and each arrow represents a 'direct effect'. The use of DAGs as a language for describing causal models has a long history in statistics, beginning with the seminal works of Wright [24, 25], with an emphasis on genetics. These models were later applied to econometrics [12] and the social sciences [3]. Today, DAGs are widely used in machine learning, bioinformatics and many other applications [14].

An important parametric subclass of DAG models is the class of linear structural equation models (SEMs); in fact, Wright's work was originally carried out within the SEM framework. In the Gaussian case, given a DAG $G = (V, E)$, the linear SEM is given by
\[
X_i = \sum_{j \in \mathrm{pa}(i)} \lambda_{ji} X_j + \epsilon_i, \qquad i \in V,
\]
where $\mathrm{pa}(i)$ denotes the set of parents of the vertex $i$, each $\lambda_{ji}$ is an unknown regression coefficient obtained from regressing $X_i$ on $X_j$, and the $\epsilon_i$ are independent Gaussian random variables with mean zero. The random vector $X$ that solves the above SEM follows a Gaussian distribution with mean zero and a structured covariance matrix. Each SEM naturally corresponds to the set of covariance matrices that can arise in this model, which we denote $M(G)$. In particular, for a DAG $G$, the set of conditional independences yields an implicit description of $M(G)$ [10, 13]. The conditional independences can be found graphically using the concept of d-separation [16]; a small worked example is given at the end of this section. We will provide a more rigorous definition of $M(G)$ in Section 2.

The popularity of DAGs stems from their well-understood theory, and several structure learning algorithms use observed conditional independences to find all compatible DAGs [20]. Often, however, we might not be able to observe all relevant variables. The resulting marginal distribution over the observed variables might then satisfy additional constraints resulting from the marginalisation, as we discuss below.
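Before turning to latent variables, we give a small worked illustration of the constraints described above; the error-variance notation $\omega_i = \mathrm{Var}(\epsilon_i)$ is introduced here purely for this example. Consider the DAG $G$ on $V = \{1, 2, 3\}$ with edges $1 \to 2 \to 3$, so that the SEM reads
\[
X_1 = \epsilon_1, \qquad X_2 = \lambda_{12} X_1 + \epsilon_2, \qquad X_3 = \lambda_{23} X_2 + \epsilon_3.
\]
A direct computation gives the entries of a covariance matrix $\Sigma \in M(G)$:
\[
\sigma_{11} = \omega_1, \qquad
\sigma_{12} = \lambda_{12}\,\omega_1, \qquad
\sigma_{22} = \lambda_{12}^2\,\omega_1 + \omega_2,
\]
\[
\sigma_{13} = \lambda_{12}\lambda_{23}\,\omega_1, \qquad
\sigma_{23} = \lambda_{23}\bigl(\lambda_{12}^2\,\omega_1 + \omega_2\bigr), \qquad
\sigma_{33} = \lambda_{23}^2\bigl(\lambda_{12}^2\,\omega_1 + \omega_2\bigr) + \omega_3.
\]
The only d-separation in $G$ is $X_1 \perp\!\!\!\perp X_3 \mid X_2$, since the path $1 \to 2 \to 3$ is blocked by conditioning on the non-collider $2$; it corresponds to the polynomial constraint
\[
\sigma_{13}\sigma_{22} - \sigma_{12}\sigma_{23} = 0,
\]
which indeed holds above, as both products equal $\lambda_{12}\lambda_{23}\omega_1(\lambda_{12}^2\omega_1 + \omega_2)$. Note that $(1, 3)$ is also the only pair of non-adjacent vertices in $G$, in line with the correspondence between minimal generators of $I(G)$ and non-adjacent pairs announced in the abstract (a DAG is a bow-free ADMG with no bidirected edges).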