Mining Plausible Hypotheses from the Literature via Meta-Analysis

Jooyong Yi
Ulsan National Institute of Science and Technology, South Korea
jooyongyi@acm.org

Vladimir Ivanov, Giancarlo Succi
Innopolis University, Russia
{v.ivanov, g.succi}@innopolis.ru

Abstract: Meta-analysis is highly advocated in many fields of empirical research, such as medicine and psychology, because it can synthesize quantitative evidence of effects from the literature on the basis of statistical analysis. However, the adoption of meta-analysis in software engineering still suffers from inertia, even though many software engineering researchers have long argued the need for it. As an attempt to move beyond this impasse, this paper explores a different use of meta-analysis. Our proposition is that meta-analysis is useful for mining hypotheses because their plausibility is backed by evidence accumulated in the literature, and thus researchers could focus their effort on the areas that are of particular need. We assess our proposition by conducting a lightweight case study on the literature of defect prediction. We found that three out of the five hypotheses we extracted from our meta-analysis were indeed investigated in separate papers, indicating the usefulness of our approach. We also identify two uninvestigated hypotheses whose validity we plan to investigate in the future.

I. INTRODUCTION

In this paper, we provide a new perspective on an existing idea that we believe is worthy of much more attention. The existing idea we advocate here is meta-analysis. Meta-analysis refers to the statistical analysis and synthesis of results from a series of primary studies, i.e., individual studies under investigation [3], [10]. A common use of meta-analysis is to synthesize consolidated evidence from a body of primary studies that often contain conflicting results.
Meta-analysis has been advocated and widely used in many fields of empirical research, such as medicine [10], psychology [1], education [5], and business [13], enabling evidence-based practice and decision-making. In software engineering, despite the community-wide attention to the Empirical and Evidence-Based Software Engineering movement initiated by Victor Basili, Barry Boehm, and Dieter Rombach in the 1980s [2], the generalization of findings across experiments in general, and the adoption of meta-analysis specifically, remain limited to a fraction of papers such as [22], [25].

Further adoption of meta-analysis in software engineering research has been called for by many researchers, for example in [6], [14], [20], [24], to name a few. Our work is in line with those pieces of work, but also moves beyond them by advocating a new use of meta-analysis: mining hypotheses whose plausibility is backed by evidence accumulated in the literature. Setting up a proper hypothesis is a critical first step of empirical research, and performing a hypothesis test typically takes a substantial amount of researchers' time. It is therefore our proposition that using meta-analysis to mine hypotheses would significantly advance the discipline, since researchers could focus their effort on the areas that are of particular need.

How can hypotheses be mined from the literature via meta-analysis? As will be shown with a concrete example in this paper, we make use of two standard mechanisms of meta-analysis: a forest plot and subgroup analysis (an example is shown in Fig. 1). With a forest plot, one can see whether or not similar results are observed consistently across multiple studies. If that is the case, the observed consistency can form a plausible hypothesis. Otherwise, subgroup analysis can be performed to see which factors contribute to the inconsistent observations, and a hypothesis can be made accordingly for each of those factors.
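The two mechanisms just described can be illustrated with a minimal Python sketch. It is not the procedure used in this paper: it assumes a fixed-effect, inverse-variance model for the summary effect and Higgins' I² (derived from Cochran's Q) as the measure of consistency across studies. The function names, the dictionary fields (`effect`, `var`, `learner`), and all study values are hypothetical and serve only as an illustration.

```python
import math


def pooled_effect(effects, variances):
    """Fixed-effect inverse-variance summary.

    Returns (weighted mean, lower 95% bound, upper 95% bound).
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    mean = sum(w * e for w, e in zip(weights, effects)) / total
    se = math.sqrt(1.0 / total)  # standard error of the weighted mean
    return mean, mean - 1.96 * se, mean + 1.96 * se


def i_squared(effects, variances):
    """Higgins' I^2 in percent, computed from Cochran's Q.

    Values near 0 suggest consistent results; large values suggest
    heterogeneity that subgroup analysis might help explain.
    """
    weights = [1.0 / v for v in variances]
    mean, _, _ = pooled_effect(effects, variances)
    q = sum(w * (e - mean) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    if df <= 0 or q <= 0:
        return 0.0
    return max(0.0, (q - df) / q * 100.0)


def subgroup_summaries(studies, key):
    """Pool the studies separately for each value of a candidate factor."""
    groups = {}
    for study in studies:
        groups.setdefault(study[key], []).append(study)
    return {
        name: pooled_effect(
            [s["effect"] for s in members],
            [s["var"] for s in members],
        )
        for name, members in groups.items()
    }


# Hypothetical primary studies: effect size, its variance, and one factor.
STUDIES = [
    {"effect": 0.10, "var": 0.02, "learner": "A"},
    {"effect": 0.15, "var": 0.03, "learner": "A"},
    {"effect": 0.60, "var": 0.02, "learner": "B"},
    {"effect": 0.70, "var": 0.04, "learner": "B"},
]

if __name__ == "__main__":
    effects = [s["effect"] for s in STUDIES]
    variances = [s["var"] for s in STUDIES]
    print("summary effect:", pooled_effect(effects, variances))
    print("I^2 (%):", i_squared(effects, variances))
    print("by learner:", subgroup_summaries(STUDIES, "learner"))
```

Under these assumptions, a low I² over all studies would point to a consistent effect that can be stated as a hypothesis directly, whereas a high I² combined with clearly separated subgroup summaries would suggest one hypothesis per factor value.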
In other fields such as medicine, meta-analysis has already been used to suggest new hypotheses [7]. Our key contribution in this paper is to provide an early assessment of using meta-analysis to generate evidence-backed hypotheses in the context of software engineering. We assess our proposition by (1) extracting five hypotheses from the literature on defect prediction via meta-analysis, and (2) subsequently checking whether those hypotheses were indeed tested in separate papers in the literature. The rationale of the second step is that a hypothesis having been tested in the literature indicates that it is worthy of consideration. We found that two hypotheses were indeed studied in recent papers, one hypothesis was partially investigated, and two hypotheses remain to be studied.

2019 IEEE/ACM 41st International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER). 978-1-7281-1758-4/19/$31.00 ©2019 IEEE. DOI 10.1109/ICSE-NIER.2019.00017

II. BACKGROUND

A common way to perform a meta-analysis is to construct a forest plot such as Fig. 1. Information that can be shown with a forest plot includes the following.

Individual Studies: In Fig. 1, each square represents the mean effect size of the corresponding study, with its size proportional to its weight, which reflects the precision of the study. Meanwhile, each horizontal line crossing a square represents the 95% confidence interval of the corresponding study.

Summary Effect: In Fig. 1, the diamond at the bottom represents the summary effect, that is, the weighted mean of