Acta Mathematicae Applicatae Sinica, English Series Vol. 25, No. 3 (2009) 445–456
DOI: 10.1007/s10255-008-8815-1
http://www.ApplMath.com.cn
© The Editorial Office of AMAS & Springer-Verlag 2009

Approximating Conditional Density Functions Using Dimension Reduction

Jian-qing Fan^1, Liang Peng^2, Qi-wei Yao^3,*, Wen-yang Zhang^4

1 Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08540, USA (E-mail: jqfan@princeton.edu)
2 School of Mathematics, Georgia Institute of Technology, Atlanta, GA 30332-0160, USA (E-mail: peng@math.gatech.edu)
3 Department of Statistics, London School of Economics, London WC2A 2AE, UK (E-mail: q.yao@lse.ac.uk)
4 Department of Mathematical Sciences, University of Bath, Bath BA2 7AY, UK (E-mail: wz217@maths.bath.ac.uk)

Abstract  We propose to approximate the conditional density function of a random variable Y given a dependent random d-vector X by that of Y given θ^τ X, where the unit vector θ is selected so that the average Kullback-Leibler discrepancy between the two conditional density functions is minimized. Our approach is nonparametric as far as the estimation of the conditional density functions is concerned. We show that this nonparametric estimator is asymptotically adaptive to the unknown index θ, in the sense that its first-order asymptotic mean squared error is the same as if θ were known. The proposed method is illustrated using both simulated and real-data examples.
Keywords  conditional density function; dimension reduction; Kullback-Leibler discrepancy; local linear regression; nonparametric regression; Shannon's entropy

2000 MR Subject Classification  62E17; 62G05; 62G20

1 Introduction

Estimating a conditional probability density function features in many statistical problems, including prediction and forecasting [15,16,18,20], measuring the initial-value sensitivity in a stochastic nonlinear dynamical system [11,29], asset pricing [7], and inference on conditional heteroscedasticity [4]. In most of these problems we are interested in estimating the conditional density function p(y|x) of a scalar random variable Y given a random d-vector X = x. Double smoothing in both the y- and x-directions is required in order to estimate p(y|x) in a purely nonparametric manner. Even for d as small as 2 or 3, such an estimator may suffer from poor accuracy due to the so-called "curse of dimensionality".

In this paper we propose a new and pragmatic approach. Instead of estimating p(y|x) directly, we suggest approximating it by a conditional density function g(y|θ^τ x) ≡ g_θ(y|θ^τ x) of Y given θ^τ X = θ^τ x, where θ is independent of y and x, and is selected such that the (average) Kullback-Leibler discrepancy measure

    E{log p(Y|X)} − E{log g(Y|θ^τ X)}                    (1.1)

Manuscript received February 3, 2009.
1 Fan's work was partially supported by US National Science Foundation grant DMS-0704337 and National Natural Science Foundation of China grant No. 10628104.
3 Yao's work was partially supported by EPSRC research grant EP/C549058/1.
* Corresponding author.
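To make the idea behind (1.1) concrete: since E{log p(Y|X)} does not depend on θ, minimizing the discrepancy is equivalent to choosing the unit vector θ that maximizes E{log g(Y|θ^τ X)}, estimated from data. The toy sketch below illustrates this with a simple Nadaraya-Watson-type kernel conditional density estimate and a crude random search over unit vectors; it is NOT the paper's local linear estimation procedure, and all names (`estimate_theta`, the bandwidths `h_u`, `h_y`, the candidate count) are illustrative assumptions.

```python
import numpy as np

def kernel_cond_density(u, y, u0, y0, h_u, h_y):
    """Kernel estimate of the conditional density of Y at y0 given
    the index value u0, using Gaussian kernels (illustrative only)."""
    ku = np.exp(-0.5 * ((u - u0) / h_u) ** 2)
    ky = np.exp(-0.5 * ((y - y0) / h_y) ** 2) / (h_y * np.sqrt(2 * np.pi))
    return np.sum(ku * ky) / np.sum(ku)

def avg_log_density(theta, X, Y, h_u=0.3, h_y=0.3):
    """Leave-one-out estimate of E{log g(Y | theta^t X)}; maximizing
    this over unit vectors theta mimics minimizing (1.1)."""
    u, n, ll = X @ theta, len(Y), 0.0
    for i in range(n):
        mask = np.arange(n) != i  # leave-one-out to avoid a degenerate fit
        g = kernel_cond_density(u[mask], Y[mask], u[i], Y[i], h_u, h_y)
        ll += np.log(max(g, 1e-300))
    return ll / n

def estimate_theta(X, Y, n_candidates=100, seed=0):
    """Crude random search over the unit sphere for the index theta."""
    rng = np.random.default_rng(seed)
    best_theta, best_ll = None, -np.inf
    for _ in range(n_candidates):
        theta = rng.standard_normal(X.shape[1])
        theta /= np.linalg.norm(theta)
        ll = avg_log_density(theta, X, Y)
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta

# Toy data: Y depends on X only through the index (X1 + X2)/sqrt(2),
# so the true single-index approximation is exact here.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 2))
true_theta = np.array([1.0, 1.0]) / np.sqrt(2)
Y = np.sin(X @ true_theta) + 0.2 * rng.standard_normal(200)
theta_hat = estimate_theta(X, Y)
# theta is identified only up to sign, so compare |theta_hat . true_theta|
print(round(abs(theta_hat @ true_theta), 2))
```

In practice the paper replaces both the crude kernel estimate and the random search with local linear estimation; the sketch only shows why a single scalar index θ^τ X sidesteps the curse of dimensionality that full d-dimensional smoothing incurs.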