Acta Mathematicae Applicatae Sinica, English Series
Vol. 25, No. 3 (2009) 445–456
DOI: 10.1007/s10255-008-8815-1
http://www.ApplMath.com.cn
© The Editorial Office of AMAS & Springer-Verlag 2009
Approximating Conditional Density Functions Using
Dimension Reduction
Jian-qing Fan¹, Liang Peng², Qi-wei Yao³,*, Wen-yang Zhang⁴

¹ Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08540, USA (E-mail: jqfan@princeton.edu)
² School of Mathematics, Georgia Institute of Technology, Atlanta, GA 30332-0160, USA (E-mail: peng@math.gatech.edu)
³ Department of Statistics, London School of Economics, London WC2A 2AE, UK (E-mail: q.yao@lse.ac.uk)
⁴ Department of Mathematical Sciences, University of Bath, Bath BA2 7AY, UK (E-mail: wz217@maths.bath.ac.uk)
Abstract We propose to approximate the conditional density function of a random variable Y given a dependent random d-vector X by that of Y given θ^τ X, where the unit vector θ is selected such that the average Kullback-Leibler discrepancy between the two conditional density functions attains its minimum. Our approach is nonparametric as far as the estimation of the conditional density functions is concerned. We show that this nonparametric estimator is asymptotically adaptive to the unknown index θ, in the sense that its first-order asymptotic mean squared error is the same as that when θ is known. The proposed method is illustrated using both simulated and real-data examples.
Keywords Conditional density function; dimension reduction; Kullback-Leibler discrepancy; local linear regression; nonparametric regression; Shannon's entropy
2000 MR Subject Classification 62E17; 62G05; 62G20
1 Introduction
Estimating a conditional probability density function features in many statistical problems, including prediction and forecasting [15,16,18,20], measuring the initial-value sensitivity in a stochastic nonlinear dynamical system [11,29], asset pricing [7], and inference on conditional heteroscedasticity [4]. In most of these problems we are interested in estimating the conditional density function p(y|x) of a scalar random variable Y given a random d-vector X = x. Double smoothing in both the y- and x-directions is required in order to estimate p(y|x) in a purely nonparametric manner. Even for d as small as 2 or 3, such an estimator may suffer from poor accuracy due to the so-called "curse of dimensionality".
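To make the double-smoothing idea concrete, the following is a minimal illustrative sketch (not the estimator developed in this paper) of a fully nonparametric kernel estimate of p(y|x): Gaussian kernel weights in the x-direction combined with a kernel density estimate in the y-direction. The function name, bandwidths, and the simulated model are assumptions introduced only for illustration.

```python
import numpy as np

def cond_density_kernel(y0, x0, Y, X, hy=0.3, hx=0.5):
    """Double-smoothing kernel estimate of p(y0 | X = x0):
    Gaussian product-kernel weights in the x-direction, and a
    Gaussian kernel density in the y-direction."""
    dx = (X - x0) / hx                       # (n, d) scaled distances in x
    wx = np.exp(-0.5 * np.sum(dx**2, axis=1))  # product-kernel weights
    ky = np.exp(-0.5 * ((Y - y0) / hy) ** 2) / (hy * np.sqrt(2 * np.pi))
    return np.sum(wx * ky) / np.sum(wx)

# Simulated example: Y | X = x is N(x_1, 0.25), so p(0 | x = 0) ~ 0.80.
rng = np.random.default_rng(0)
n, d = 2000, 2
X = rng.normal(size=(n, d))
Y = X[:, 0] + 0.5 * rng.normal(size=n)
est = cond_density_kernel(0.0, np.zeros(d), Y, X)
```

The smoothing in all d x-directions is exactly what degrades as d grows: for fixed bandwidths, ever fewer observations carry non-negligible weight wx, which is the curse of dimensionality referred to above.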
In this paper we propose a new and pragmatic approach. Instead of estimating p(y|x) directly, we approximate it by a conditional density function g(y|θ^τ x) ≡ g_θ(y|θ^τ x) of Y given θ^τ X = θ^τ x, where θ is independent of y and x, and is selected such that the (average) Kullback-Leibler discrepancy measure

    E{log p(Y|X)} − E{log g(Y|θ^τ X)}                    (1.1)
Manuscript received February 3, 2009.
¹ Fan's work was partially supported by US National Science Foundation grant DMS-0704337 and National Natural Science Foundation of China grant No. 10628104.
³ Yao's work was partially supported by EPSRC research grant EP/C549058/1.
* Corresponding author.