arXiv:2111.02306v1 [stat.ME] 3 Nov 2021

A Causality-based Graphical Test to obtain an Optimal Blocking Set for Randomized Experiments

Abhishek K. Umrawal
Purdue University
West Lafayette, 47906
aumrawal@purdue.edu

Abstract

Randomized experiments are often performed to study the causal effects of interest. Blocking is a technique to precisely estimate the causal effects when the experimental material is not homogeneous. We formalize the problem of obtaining a statistically optimal set of covariates to be used to create blocks while performing a randomized experiment. We provide a graphical test to obtain such a set for a general semi-Markovian causal model. We also propose and provide ideas towards solving a more general problem of obtaining an optimal blocking set that considers both the statistical and economic costs of blocking.

1 Introduction

Studying the causal effect of one or more variables on other variables is of common interest in the social sciences, computer science, and statistics. However, a common mistake is to confuse a causal effect with an associational effect. For instance, if high levels of bad cholesterol and the presence of heart disease are observed at the same time, it does not follow that the heart disease is caused by the high levels of bad cholesterol. The question is then how to determine whether one variable causes another at all. If it does, what are the direction (positive or negative) and the magnitude of the causal effect? Fisher (1992) provided the framework of randomized experiments to study the causal effect: the variable whose causal effect is to be studied, also known as the treatment or cause, is randomized over the available experimental material (humans, rats, agricultural plots, etc.), and changes in the variable on which the causal effect is to be studied, also known as the response or effect, are recorded.
A statistical comparison of the values of the response with and without the treatment can therefore be carried out to study the existence, direction, and magnitude of the cause-effect relationship of interest.

Randomized experiments rest on three basic principles: randomization, replication, and local control. Randomization states that the assignment of the treatment must be random. Replication states that the treatment should be given to multiple, but homogeneous, units, i.e., there are multiple observations of the effect variable both with and without the treatment. Hence, as long as the entire experimental material is homogeneous (for instance, the fertility of all the agricultural plots is the same, or the responsiveness of all the humans to the drug is the same), a ‘good’ randomized experiment can be carried out using the first two principles, randomization and replication, which gives rise to the completely randomized design (CRD). However, when the entire experimental material is not homogeneous, i.e., some attributes of the experimental units, also known as covariates, differ from unit to unit, the causal effect may be influenced by the covariates causing the non-homogeneity, such as fertility or responsiveness.

Accepted for presentation at Causal Inference Challenges in Sequential Decision Making: Bridging Theory and Practice (CSDNeurIPS) workshop. 35th Conference on Neural Information Processing Systems (NeurIPS 2021), Sydney, Australia.

The remedy