DRAFT

Fooled by Correlation: Common Misinterpretations in Social "Science"

Nassim Nicholas Taleb
March 2019

Abstract: We present consequential mistakes in uses of correlation in social science research: 1) use of subsampling, since (absolute) correlation is severely subadditive; 2) misinterpretation of the informational value of correlation owing to nonlinearities; 3) misapplication of correlation and PCA/factor analysis when the relationship between variables is nonlinear; 4) how to embody sampling error of the input variable; 5) intransitivity of correlation; 6) other similar problems, mostly focused on psychometrics (IQ testing is infected by the "dead man bias"); 7) how fat tails cause $R^2$ to be fake. We compare to the more robust entropy approaches.

CONTENTS

I Correlation is subadditive (in absolute value)
    I-A Intuition via one-dimensional representations
    I-B Mutual Information is Additive
    I-C Example of Quadrants
II Rescaling: A 50% correlation doesn't mean what you think it means
    II-A Variance method
        II-A1 Drawback
        II-A2 Adjusted variance method
    II-B The ϕ function
    II-C Mutual Information
    II-D PCA with Mutual Information
III Embedding Measurement Error
IV Transitivity of Correlations
V Nonlinearities and other defects in "IQ" studies and psychometrics in general
    V-A Using a detector of disease as a detector of health
        V-A1 Sigmoidal functions
    V-B ReLu type functions (ramp payoffs)
    V-C Dead man bias
    V-D State dependent correlation (Proof that psychometrics fail in their use of the "g")
VI Statistical Testing of Differences Between Variables
VII Fat Tailed Residuals in Linear Regression Models
Appendix
    1 Mean Deviation vs Standard Deviation
    2 Relative Standard Deviation Error
    3 Relative Mean Deviation Error
    4 Finalmente, the Asymptotic Relative Efficiency For a Gaussian
    A Effect of Fatter Tails on the "efficiency" of STD vs MD
References

I. CORRELATION IS SUBADDITIVE (IN ABSOLUTE VALUE)

[Figure 1: both axes run from -2 to 2; the quadrants are labeled 0.525618 and 0.181141, each value appearing twice.]

Fig. 1. Total correlation is .75, but quadrant correlations are .52 (second and fourth quadrants) and .18 (first and third). If in turn we make the "quadrants" smaller, say the 2nd one into Q = (0, 2), (0, 2), correlation will be even lower, .38 (next figure).
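The quadrant effect described in the caption can be checked with a small simulation. The following is a minimal sketch, not the author's code: it assumes a standard bivariate Gaussian with correlation .75, and the function name `quadrant_correlations`, the sample size, and the seed are illustrative choices. Quadrants are identified by the signs of x and y rather than by number, since the caption's quadrant numbering is not spelled out.

```python
import numpy as np


def quadrant_correlations(rho=0.75, n=200_000, seed=0):
    """Simulate a standard bivariate Gaussian and compare the total
    correlation with the correlation measured inside sub-regions.

    rho, n and seed are illustrative assumptions, not values taken
    from the paper beyond the total correlation of .75.
    """
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

    # Correlation over the full sample.
    total = np.corrcoef(x, y)[0, 1]

    # Boolean masks selecting each quadrant by the signs of x and y,
    # plus the smaller box (0, 2) x (0, 2) mentioned in the caption.
    masks = {
        "x>0, y>0": (x > 0) & (y > 0),
        "x<0, y<0": (x < 0) & (y < 0),
        "x>0, y<0": (x > 0) & (y < 0),
        "x<0, y>0": (x < 0) & (y > 0),
        "box (0,2)x(0,2)": (x > 0) & (x < 2) & (y > 0) & (y < 2),
    }
    by_region = {
        name: np.corrcoef(x[m], y[m])[0, 1] for name, m in masks.items()
    }
    return total, by_region


total, by_region = quadrant_correlations()
print(f"total: {total:.3f}")
for name, r in by_region.items():
    print(f"{name}: {r:.3f}")
```

Under these assumptions, every sub-region shows a correlation well below the total, and the box inside the positive quadrant shows a lower correlation than the quadrant that contains it, which is the subadditivity the section describes.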