DRAFT

On Heritability in the Presence of a Stochastic Environment

Nassim Nicholas Taleb

June 2023

INTRODUCTION AND BACKGROUND

Twin studies in genetics are designed in a way that shows more heredity than there is in reality, owing to a statistical artifact [1], [2]. The twin-study estimate of heredity is based on taking the correlation between two identical twins minus that between two fraternal ones (assumed to share half their genes). The use of fraternal twins as a control is assumed to extract the "environmental" factors. The convention is to [ref] apply Falconer's formula:

$$h^2 = 2(\rho_i - \rho_f), \tag{1}$$

where $\rho$ is a Pearson correlation coefficient, with subscript $f$ for fraternal and $i$ for identical twins.

Let's forget the rationale behind how $h^2$ was obtained for now and focus on $\rho$. Assume further that $\rho$ arises from a linear dependence with Gaussian errors, etc.

The problem is that correlation is conditional, and in accounts it is typically presented as unconditional [3]. The math is entirely different.

Example: Consider the heritability of diabetes under the following two conditions.

• Environment A: Minneapolis. The correlation between the identical twins is .80, that between the fraternal twins is .40.

• Environment B: The Kalahari Desert. Both correlations are the same, hence the Falconer $h^2$ is going to be 0.

Now add environments C, D, etc. What would $h^2$ be?

The unconditional correlation is clearly much lower than the conditional one, either because the environment might change (you might end up spending some time in a desert) or because you might do something about the exposure, say join a Tuesday and Thursday evening cycling club.

Note that the same applies to intelligence. Intelligence in Cambridge, Mass. has different attributes than intelligence in a more organic environment where tasks are different, say during an urban guerrilla operation, unless one buys the g theory of domain-general intelligence and accepts that such a theory can work under nonlinearity of payoff.
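The two-environment diabetes example above can be made concrete with a small numerical sketch. It is an illustration under stated assumptions, not the author's computation: both environments are given equal weight, zero means and unit variances, and the common correlation in Environment B is set to 0.30 (the text only says the two correlations are equal). The helper names `mixture_correlation` and `falconer_h2` are hypothetical. The pooled correlation is computed from the standard moment identities for a finite mixture.

```python
import math


def mixture_correlation(weights, mu1, mu2, sigma1, sigma2, rho):
    """Pearson correlation of (X1, X2) pooled across regimes.

    Regime k has probability weights[k], means mu1[k] and mu2[k], standard
    deviations sigma1[k] and sigma2[k], and within-regime correlation rho[k].
    """
    if abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("regime weights must sum to 1")
    # Pooled means.
    m1 = sum(w * a for w, a in zip(weights, mu1))
    m2 = sum(w * b for w, b in zip(weights, mu2))
    # Pooled E[X1 X2]: each regime contributes mu1*mu2 + rho*sigma1*sigma2.
    cross = sum(
        w * (a * b + r * s1 * s2)
        for w, a, b, s1, s2, r in zip(weights, mu1, mu2, sigma1, sigma2, rho)
    )
    # Pooled variances: E[X^2] minus the squared pooled mean.
    var1 = sum(w * (a * a + s * s) for w, a, s in zip(weights, mu1, sigma1)) - m1 ** 2
    var2 = sum(w * (b * b + s * s) for w, b, s in zip(weights, mu2, sigma2)) - m2 ** 2
    return (cross - m1 * m2) / math.sqrt(var1 * var2)


def falconer_h2(rho_identical, rho_fraternal):
    """Falconer's formula: h^2 = 2 (rho_i - rho_f)."""
    return 2.0 * (rho_identical - rho_fraternal)


# Assumed inputs: equal time in each environment, standardized traits.
weights = [0.5, 0.5]
zeros, ones = [0.0, 0.0], [1.0, 1.0]
rho_identical = [0.80, 0.30]  # Environment A from the text; B value is assumed
rho_fraternal = [0.40, 0.30]  # Environment A from the text; B value is assumed

h2_env_a = falconer_h2(rho_identical[0], rho_fraternal[0])
h2_env_b = falconer_h2(rho_identical[1], rho_fraternal[1])

pooled_identical = mixture_correlation(weights, zeros, zeros, ones, ones, rho_identical)
pooled_fraternal = mixture_correlation(weights, zeros, zeros, ones, ones, rho_fraternal)
h2_pooled = falconer_h2(pooled_identical, pooled_fraternal)

print(f"h2 in A: {h2_env_a:.2f}, in B: {h2_env_b:.2f}, pooled: {h2_pooled:.2f}")
```

With these made-up inputs the pooled $h^2$ comes out near 0.4, half of what Environment A alone reports. Changing the weights moves the figure, and letting the regime means differ adds a between-environment covariance term to the pooled correlation.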
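Now, for some formalism, consider a continuous mixture of distributions $\mathcal{D}_\omega$. Take the vector $X = (X_1, X_2)$ (two-dimensional to simplify for now), with $X \sim \mathcal{D}_\omega$ with proportion (or probability) $g(\omega)$. Writing $t = (t_1, t_2)$ and reading $tX$ as an inner product, the unconditional characteristic function is

$$\chi_t \triangleq \mathbb{E}_\omega\left(\mathbb{E}_X\left(e^{itX(\omega)}\right)\right),$$

and the conditional one is

$$\chi_t(\omega) \triangleq \mathbb{E}_X\left(e^{itX(\omega)}\right).$$

Let $g$ be a distribution of continuous states indexed by $\omega$:

$$\chi_t = \int_0^\infty g(\omega)\,\mathbb{E}_X\left(e^{itX(\omega)}\right)d\omega, \qquad \text{with } \int_0^\infty g(\omega)\,d\omega = 1$$

(we can discretize later).

What is the correlation in that situation? For all distributions (though the meaning is weakened outside the Gaussian), it is defined as

$$\rho_{(X_1,X_2)}\ \text{(unconditional)} \triangleq \frac{\mathbb{E}\left((X_1-\mu_1)(X_2-\mu_2)\right)}{\sqrt{\mathbb{E}(X_1-\mu_1)^2\,\mathbb{E}(X_2-\mu_2)^2}}, \tag{2}$$

which can be calculated using the log characteristic function as

$$\rho_{(X_1,X_2)} = \left.\frac{(-i)^2\,\dfrac{\partial^2 \log \chi_t}{\partial t_1\,\partial t_2}}{\sqrt{\dfrac{\partial^2 \log \chi_t}{\partial t_1^2}\,\dfrac{\partial^2 \log \chi_t}{\partial t_2^2}}}\;\right|_{t_1=0,\,t_2=0}.$$

We consider the Gaussian case:

$$\chi(t_1,t_2) = \int_0^\infty g(\omega)\exp\left(i\left(t_1\mu_1(\omega)+t_2\mu_2(\omega)\right) - t_1 t_2\,\rho(\omega)\sigma_2(\omega)\sigma_1(\omega) - \tfrac{1}{2}t_1^2\sigma_1(\omega)^2 - \tfrac{1}{2}t_2^2\sigma_2(\omega)^2\right)d\omega. \tag{3}$$

Then

$$\rho = \frac{-\displaystyle\int_0^\infty -g(\omega)\left(\mu_1(\omega)\mu_2(\omega)+\rho(\omega)\sigma_1(\omega)\sigma_2(\omega)\right)d\omega + \int_0^\infty i\,g(\omega)\mu_1(\omega)\,d\omega \int_0^\infty i\,g(\omega)\mu_2(\omega)\,d\omega}{\sqrt{\left(\displaystyle\int_0^\infty -g(\omega)\left(\mu_1(\omega)^2+\sigma_1(\omega)^2\right)d\omega - \left(\int_0^\infty i\,g(\omega)\mu_1(\omega)\,d\omega\right)^2\right)\times\left(\displaystyle\int_0^\infty -g(\omega)\left(\mu_2(\omega)^2+\sigma_2(\omega)^2\right)d\omega - \left(\int_0^\infty i\,g(\omega)\mu_2(\omega)\,d\omega\right)^2\right)}}. \tag{4}$$

Once the factors of $i$ are multiplied out, the numerator is the mixture covariance, $\int_0^\infty g(\omega)\left(\mu_1(\omega)\mu_2(\omega)+\rho(\omega)\sigma_1(\omega)\sigma_2(\omega)\right)d\omega - \int_0^\infty g(\omega)\mu_1(\omega)\,d\omega\int_0^\infty g(\omega)\mu_2(\omega)\,d\omega$, and each factor under the square root is minus the corresponding mixture variance, so their product is the product of the two variances.

Discrete regimes

Let's simplify the world and discretize, applying the above to $n$ regimes for a bivariate Gaussian, with regimes indexed by $i$ and probabilities (weights) $\omega_i$, $\sum_i \omega_i = 1$:

$$X = (X_1, X_2) \sim \mathcal{N}\left[\{\mu_{1,i},\mu_{2,i}\},\ \{\sigma_{1,i},\sigma_{2,i}\},\ \rho_i\right] \quad \text{w.p. } \omega_i.$$

And the characteristic function: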