Population statistics cannot be used for reliable individual prediction

Richard Kennaway
School of Information Systems
University of East Anglia, Norwich NR4 7TJ, U.K.
9 December 1997

Abstract

It is well known that predictions about individuals from statistical data about the population are in general unreliable. However, the size of the problem is not commonly realised, and predictions about individuals are nevertheless often made in practice. For a number of ways of predicting information about one variable from another with which it is correlated, we compute the reliability of such predictions, given the correlation. Assuming a bivariate normal distribution, we demonstrate that unless the correlation is at least 0.99, not even the sign of a variable can be predicted with 95% reliability in an individual case. The other prediction methods we consider do no better. We do not expect our results to be substantially different for other distributions or statistical analyses. Correlations as high as 0.99 are almost unheard of in areas where correlations are routinely calculated. Where reliable prediction of one variable from another is required, measurement of correlations is irrelevant, except to show when it cannot be done.

An empirical study of correlations reported in the sociological literature ([McP71]) found that only 1% of the correlations reported in the papers studied were over 0.4, and only two out of 281 correlations exceeded 0.6. In that area, a correlation of 0.8 is generally considered high, and a correlation of 0.2 is publishable as demonstrating a connection between two variables. We consider what such correlations imply for the task of reliably and/or accurately predicting the value of one variable from the other. We demonstrate that it is impossible to predict even the sign of the variable relative to its mean with 95% reliability unless the correlation is at least 0.99.
For lesser correlations, such a prediction will do better than chance on average, and for some purposes this is all that is required. However, such correlations are useless for making reliable predictions in individual cases. Correlations of 0.99 or more are virtually unheard of in almost every discipline where statistical methods are commonly employed.
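The sign-prediction claim can be checked numerically in its simplest form. The sketch below makes two assumptions: that (X, Y) is a standard bivariate normal pair with correlation ρ, and that the prediction rule is simply "Y has the same sign as X". The paper's own prediction methods may be formulated differently. For this rule the success rate has a closed form, Sheppard's orthant formula P = 1/2 + arcsin(ρ)/π, and the code cross-checks it against a Monte Carlo simulation. The function names are illustrative and do not come from the paper.

```python
import math
import random


def sign_agreement_probability(rho: float) -> float:
    """P(sign(X) == sign(Y)) for a standard bivariate normal with correlation rho.

    Sheppard's orthant formula: P(XY > 0) = 1/2 + arcsin(rho) / pi.
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    return 0.5 + math.asin(rho) / math.pi


def correlation_needed(reliability: float) -> float:
    """Smallest rho at which 'predict sign(Y) = sign(X)' succeeds with this probability.

    Obtained by inverting Sheppard's formula: rho = sin(pi * (reliability - 1/2)).
    """
    if not 0.5 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0.5, 1]")
    return math.sin(math.pi * (reliability - 0.5))


def simulated_agreement(rho: float, n: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the same probability, as a cross-check."""
    rng = random.Random(seed)
    noise_scale = math.sqrt(1.0 - rho * rho)
    hits = 0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # Y = rho*X + sqrt(1 - rho^2)*Z has unit variance and corr(X, Y) = rho.
        y = rho * x + noise_scale * rng.gauss(0.0, 1.0)
        hits += (x > 0) == (y > 0)  # count cases where the predicted sign is right
    return hits / n


if __name__ == "__main__":
    for rho in (0.2, 0.4, 0.6, 0.8, 0.99):
        print(f"rho={rho:.2f}  exact={sign_agreement_probability(rho):.3f}  "
              f"simulated={simulated_agreement(rho):.3f}")
    print(f"rho needed for 95% sign agreement: {correlation_needed(0.95):.4f}")
```

Inverting the formula, 95% success requires ρ = sin(0.45π) ≈ 0.988, in line with the figure of 0.99 given above. At ρ = 0.8, which that literature regards as high, the rule is right only about 80% of the time. At ρ = 0.2 it is right about 56% of the time: better than chance, but of little use for any one individual.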