Citation: Hardy, N. "A Bias Recognized Is a Bias Sterilized": The Effects of a Bias in Forecast Evaluation. Mathematics 2022, 10, 171. https://doi.org/10.3390/math10020171

Academic Editors: Javier Perote, Andrés Mora-Valencia and Trino-Manuel Ñíguez

Received: 16 November 2021; Accepted: 1 January 2022; Published: 6 January 2022

Copyright: © 2022 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

"A Bias Recognized Is a Bias Sterilized": The Effects of a Bias in Forecast Evaluation

Nicolas Hardy 1,2

1 Facultad de Economía y Empresa, Universidad Diego Portales, Santiago 8370179, Chile; nicolas.hardy@udp.cl
2 School of Business, Universidad Adolfo Ibáñez, Santiago 8380629, Chile

Abstract: Are traditional tests of forecast evaluation well behaved when the competing (nested) model is biased? No, they are not. In this paper, we show analytically and via simulations that, under the null hypothesis of no encompassing, a bias in the nested model may severely distort the size properties of traditional out-of-sample tests in economic forecasting. Not surprisingly, these size distortions depend on the magnitude of the bias and the persistence of the additional predictors. We consider two different cases: (i) there is both in-sample and out-of-sample bias in the nested model; (ii) the bias is present exclusively out-of-sample. To address the former case, we propose a modified encompassing test (MENC-NEW) robust to a bias in the null model. Akin to the ENC-NEW statistic, the asymptotic distribution of our test is a functional of stochastic integrals of quadratic Brownian motions.
While this distribution is not pivotal, we can easily estimate the nuisance parameters. To address the second case, we derive the new asymptotic distribution of the ENC-NEW, showing that critical values may differ remarkably. Our Monte Carlo simulations reveal that the MENC-NEW (and the ENC-NEW with adjusted critical values) is reasonably well sized even when the ENC-NEW (with standard critical values) exhibits rejection rates three times higher than the nominal size.

Keywords: forecasting; random walk; out-of-sample; bias; prediction; mean square prediction error

1. Introduction

"Fortunately for serious minds, a bias recognized is a bias sterilized." Benjamin Haydon

Diebold and Mariano's (1995) [1] and West's (1996) [2] seminal papers are typically pinpointed as the Big Bang of the forecast evaluation literature in economics and finance. Even though both papers propose asymptotically normal tests for forecast evaluation, the proper environment of each paper is different. The former considers the case of comparing forecasts (i.e., the forecasts are assumed to be given), while the latter focuses on the case of comparing models (i.e., the forecasts are constructed through estimated parametric models). Put simply, the key contribution of West's asymptotic theory is that it accounts for parameter uncertainty.

While the asymptotic theory of [2] is quite general and allows for a variety of estimation techniques and loss functions, it is not universal. One of the key assumptions in West's theory is a full rank condition on the long-run variance of the loss function when parameters are set at their true values. One of the most iconic cases in which this condition is not fulfilled is the comparison of two competing nested models: under the null hypothesis of no encompassing, both models are identical, and standard tests of forecast evaluation become degenerate.
As pointed out by Clark and West (2006) [3] and West (2006) [4], for the case of MSPE, this degeneracy is not only important in a theoretical sense but also in an empirical one: "[ ... ] use of standard critical values usually results in very poorly sized tests, with far too few rejections. As well, the usual statistic has very poor power." [4] (p. 119). (A note of caution here: we are not arguing that this degeneracy necessarily implies that tests become undersized. Some of the simulations in [5] suggest both types of size distortions: sometimes tests are undersized, sometimes they are oversized.)
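To make this degeneracy concrete, the following is a minimal Monte Carlo sketch (not from the paper; all function names, the AR(1) persistence of 0.95, and the sample split R = P = 100 are illustrative assumptions) of a Diebold-Mariano-type MSPE comparison between a no-change forecast and a nested model with one irrelevant predictor, evaluated against standard normal critical values:

```python
import numpy as np

rng = np.random.default_rng(42)

def nested_dm_stat(y, x, R, P):
    """DM-type statistic: no-change forecast (model 1) vs. the nested model
    y_{t+1} = beta * x_t + e_{t+1}, with beta re-estimated on a rolling
    window of size R; P is the number of out-of-sample forecasts."""
    d = np.empty(P)
    for i in range(P):
        xs, ys = x[i:i + R], y[i + 1:i + R + 1]
        beta = xs @ ys / (xs @ xs)            # rolling OLS slope (no intercept)
        e1 = y[R + i + 1]                     # model 1 forecasts zero
        e2 = y[R + i + 1] - beta * x[R + i]   # model 2 forecast error
        d[i] = e1**2 - e2**2                  # squared-error loss differential
    return np.sqrt(P) * d.mean() / d.std(ddof=1)

# Monte Carlo under the null: y is iid noise, x is an irrelevant AR(1) predictor,
# so both models are identical at the true parameter values (beta = 0).
R, P, reps = 100, 100, 2000
rejections = 0
for _ in range(reps):
    T = R + P + 1
    y = rng.standard_normal(T)
    x = np.empty(T)
    x[0] = rng.standard_normal()
    for t in range(1, T):                     # persistent predictor, rho = 0.95
        x[t] = 0.95 * x[t - 1] + rng.standard_normal()
    if nested_dm_stat(y, x, R, P) > 1.645:    # one-sided 5% normal critical value
        rejections += 1
print(f"empirical rejection rate: {rejections / reps:.3f}")
```

Because estimating the redundant parameter only adds noise to model 2's forecasts, the loss differential has a negative mean under the null, shifting the statistic to the left of its nominal N(0,1) benchmark; the empirical rejection rate ends up far below 5%, in line with the "far too few rejections" quoted above.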