EXPENDITURE MODELING WITH A MIXTURE OF LOGNORMAL DISTRIBUTIONS

Stanislav Kolenikov*, University of North Carolina at Chapel Hill and Centre for Economic and Financial Research, Moscow, and Serguei A. Aivazian, Central Economics and Mathematics Institute, Russian Academy of Sciences

* Corresponding author. Mail address: 117 New West, Cameron Ave, UNC CB#3260, Chapel Hill, NC 27599-3260, US. E-mail: skolenik@unc.edu

Keywords: Finite Mixture, Income Distribution, Missing Data, Maximum Likelihood, Parametric Bootstrap, RLMS.
AMS subject classification: primary 62P20.
JEL classification numbers: I32, C13, C15, D31, P29.

Motivation

An adequate evaluation of the success of market reforms in transition economies necessarily includes an assessment of the social cost of reform, including welfare redistribution. The main sources of information on the distribution of income, expenditures and wealth are population surveys [1,2]. Various distortions and deficiencies of the available survey micro data complicate this assessment. Because of wage arrears and the high share of informal economic activities, including home production, the welfare of a household is better represented by its (per capita) expenditures than by its officially reported income. Besides, survey participation rates tend to differ across welfare groups.

One manifestation of these deficiencies is a huge discrepancy between the mean income found from macroeconomic statistics and the mean income found from survey data. For the period analyzed in this paper, the macroeconomic mean income for Q4 1998 reported in [3] is 1211 rub., while the sample mean from the raw data [2] is 913 rub.
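The size of this gap is easy to quantify: 1211 - 913 = 298 rub., so the survey mean falls short of the macroeconomic mean by about 24.6%. The short Python sketch below also shows, under entirely hypothetical parameters, how differential survey participation alone can push a sample mean below the population mean, and how weighting respondents by the inverse of their response probability can undo that. The lognormal parameters, the step-shaped response probability and the function names are illustrative assumptions, not estimates from the RLMS data and not the authors' procedure.

```python
import math
import random

MACRO_MEAN, SURVEY_MEAN = 1211.0, 913.0  # Q4 1998 figures quoted in the text, rub.


def relative_shortfall(macro, survey):
    """Share of the macroeconomic mean by which the survey mean falls short."""
    return (macro - survey) / macro


def nonresponse_demo(n=200_000, mu=6.5, sigma=0.8, seed=7):
    """Toy illustration with HYPOTHETICAL parameters: lognormal expenditures and a
    response probability of 0.9 below twice the median, 0.4 above it.
    Returns (population mean, naive respondent mean, weighted respondent mean)."""
    rng = random.Random(seed)
    cutoff = 2.0 * math.exp(mu)  # twice the lognormal median exp(mu)
    pop_sum = 0.0
    resp_sum, resp_n = 0.0, 0
    wx_sum, w_sum = 0.0, 0.0
    for _ in range(n):
        x = rng.lognormvariate(mu, sigma)
        pop_sum += x
        p = 0.9 if x < cutoff else 0.4  # hypothetical response propensity
        if rng.random() < p:            # household agrees to participate
            resp_sum += x
            resp_n += 1
            wx_sum += x / p             # inverse-propensity weight 1/p
            w_sum += 1.0 / p
    return pop_sum / n, resp_sum / resp_n, wx_sum / w_sum


if __name__ == "__main__":
    print(round(relative_shortfall(MACRO_MEAN, SURVEY_MEAN), 3))
    print([round(v, 1) for v in nonresponse_demo()])
```

In this toy setting the naive respondent mean falls well below the population mean, while the weighted mean is close to it. In practice the response probabilities are unknown and have to be modeled, which is the purpose of hypothesis H2.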
The distributional model currently used by the Russian statistical authority, Goskomstat (the State Committee on Statistics), is the lognormal distribution [4], for which the location parameter (mean or mode) is estimated from macroeconomic trade statistics and the variance parameter is estimated from sample income data. We propose several refinements to this model. The first is to use expenditure information, which seems to represent the financial situation of a household better than income does. The second is to approximate the shape of the expenditure distribution by a univariate mixture of lognormal components. Such a model can be estimated by maximum likelihood from survey data, with special attention paid to choosing the appropriate number of mixture components. Third, we introduce weights to account for the propensity to avoid disclosing income information. Finally, having estimated this model, we use a parametric bootstrap to reconstruct observations in the range of very high expenditures not covered by the sample. The estimates of the expenditure distribution obtained this way are used to construct popular inequality and poverty indices. The results suggest that unadjusted estimates of income inequality and poverty (including the officially reported poverty rates and values of the Gini index) might be seriously biased downwards.

Assumptions and Hypotheses

The following assumptions and hypotheses are used throughout the analysis.

Hypothesis H1 states that the distribution of per capita expenditures of Russian households can be adequately described by a mixture of lognormal laws. This hypothesis can be verified by goodness-of-fit criteria such as the Pearson χ² or Kolmogorov-Smirnov tests, although the latter is known to have low power when parameters have to be estimated from the data.
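The estimation strategy outlined above (a lognormal mixture fitted by maximum likelihood, a data-driven number of components, parametric draws from the fitted model, and inequality indices computed from them) can be illustrated with a minimal Python sketch. It relies on the fact that if x follows a mixture of lognormals, log(x) follows a normal mixture with the same weights, so the mixture is fitted by EM on log-expenditures. EM and the BIC criterion for the number of components are our illustrative choices; the text only says that the model is estimated by maximum likelihood with attention to the number of components. The sketch ignores the nonresponse weights, and all function names, starting values, the variance floor and the simulated two-group data are hypothetical.

```python
import math
import random
import statistics


def normal_pdf(y, m, s):
    """Normal density; a lognormal component is normal on the log scale."""
    return math.exp(-0.5 * ((y - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))


def mixture_loglik(y, weights, mus, sigmas):
    """Log-likelihood of log-data y under a normal mixture."""
    return sum(
        math.log(max(sum(w * normal_pdf(v, m, s) for w, m, s in zip(weights, mus, sigmas)), 1e-300))
        for v in y
    )


def fit_lognormal_mixture(x, k, iters=100, tol=1e-6):
    """EM fit of a k-component lognormal mixture, run as a normal mixture on log(x).

    The returned log-likelihood is for log(x); it differs from that of x only by
    the constant -sum(log x), so comparisons across k are unaffected."""
    y = [math.log(v) for v in x]
    n = len(y)
    ys = sorted(y)
    mus = [ys[int((j + 0.5) * n / k)] for j in range(k)]  # start at spread-out quantiles
    sigmas = [statistics.pstdev(y)] * k
    weights = [1.0 / k] * k
    prev = -math.inf
    for _ in range(iters):
        # E-step: posterior probability that observation i comes from component j
        resp, ll = [], 0.0
        for v in y:
            dens = [w * normal_pdf(v, m, s) for w, m, s in zip(weights, mus, sigmas)]
            tot = max(sum(dens), 1e-300)
            ll += math.log(tot)
            resp.append([d / tot for d in dens])
        if abs(ll - prev) < tol:
            break
        prev = ll
        # M-step: responsibility-weighted shares, means and standard deviations
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            mus[j] = sum(r[j] * v for r, v in zip(resp, y)) / nj
            var = sum(r[j] * (v - mus[j]) ** 2 for r, v in zip(resp, y)) / nj
            sigmas[j] = math.sqrt(max(var, 1e-4))  # floor guards against degenerate components
    return weights, mus, sigmas, mixture_loglik(y, weights, mus, sigmas)


def choose_k(x, k_max=3):
    """Fit k = 1..k_max components and keep the fit with the smallest BIC."""
    n = len(x)
    best = None
    for k in range(1, k_max + 1):
        fit = fit_lognormal_mixture(x, k)
        bic = -2.0 * fit[3] + (3 * k - 1) * math.log(n)  # k means, k sds, k-1 free weights
        if best is None or bic < best[0]:
            best = (bic, k, fit)
    return best[1], best[2]


def sample_mixture(weights, mus, sigmas, n, rng):
    """Parametric draws from a lognormal mixture (one bootstrap-style replicate)."""
    out = []
    for _ in range(n):
        j = rng.choices(range(len(weights)), weights=weights)[0]
        out.append(rng.lognormvariate(mus[j], sigmas[j]))
    return out


def gini(x):
    """Gini index from the sorted-data formula."""
    xs = sorted(x)
    n = len(xs)
    return 2.0 * sum((i + 1) * v for i, v in enumerate(xs)) / (n * sum(xs)) - (n + 1) / n


if __name__ == "__main__":
    rng = random.Random(42)
    # hypothetical two-group population; group medians differ by a factor of about exp(2.3), i.e. ~10
    data = sample_mixture([0.7, 0.3], [6.0, 8.3], [0.4, 0.4], 2000, rng)
    k, (w, m, s, ll) = choose_k(data)
    print(k, [round(v, 2) for v in m])
    replicate = sample_mixture(w, m, s, 5000, rng)  # draw from the fitted model
    print(round(gini(replicate), 3))
```

Working on the log scale keeps each M-step in closed form; a production analysis would also need survey weights and several random restarts of EM, which this sketch omits.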
A justification for such a discrete mixture is that contemporary Russian society is believed to be stratified into several income groups, including "old economy" workers, "new economy" workers, and entrepreneurs, with incomes differing by about an order of magnitude between groups. Assuming that the distribution within each group is lognormal, the discrete mixture is a reasonable approximation as long as the groups are well separated. Hypothesis H1 thus provides a flexible, not-so-parametric approach to density estimation.

Hypothesis H2 states that the probability of a household refusing to participate in the official budget survey is a function of its social, economic, and geographical characteristics. This hypothesis was suggested by E. B. Frolova, Head of the Living Standards Department of Goskomstat, on the basis of field experience. Since we use panel data (see the description in the Data section below), we can find both fi-