The Influence Function of Stahel-Donoho Type Methods for Robust PCA

M. Debruyne¹ and M. Hubert¹

¹ K.U.Leuven, Department of Mathematics, W. De Croylaan 54, B-3001 Leuven, Belgium

Keywords: influence function, covariance, Stahel-Donoho, PCA, PLS

1 Stahel-Donoho

Consider a $p$-dimensional sample $X = (x_1, \ldots, x_n)$ of size $n$. In this paper we concentrate on Stahel-Donoho type estimators of covariance. By this we mean estimators based on the Stahel-Donoho outlyingness $r(x_i, X)$, defined as follows (Stahel, 1981; Donoho, 1982):

$$ r(x_i, X) = \sup_{a \in \mathbb{R}^p} \frac{|a^t x_i - m(a^t X)|}{s(a^t X)} $$

where $m(\cdot)$ and $s(\cdot)$ are univariate robust estimators of location and scale.

To obtain robust estimates of the covariance matrix, we want to concentrate on the data points with small outlyingness. We consider two options. The first approach downweights all observations according to their outlyingness. We call this estimator weighted Stahel-Donoho (SD$_w$) from now on. Several choices for the weighting function have been proposed; see Maronna and Yohai (1995), Zuo et al. (2004) and Gervini (2002). A second approach was proposed in Hubert et al. (2005): a proportion $0 < \alpha < 1/2$ is chosen, and only the $(1 - \alpha)n$ observations with smallest outlyingness are used in the estimation. We call this estimator Stahel-Donoho with smallest outlyingness (SD$_{so}$) from now on.

In this paper we derive the influence function of the SD$_{so}$ estimator of covariance. This allows us to consider the following topics:

- Visualising the effects of outliers by plotting the influence function in the two- or three-dimensional case.
- Gaining insight into the robustness of the estimator by calculating gross error sensitivities for general $\alpha$ and $p$.
- Calculating asymptotic efficiencies for several values of $\alpha$ and $p$.

All these results will be compared to the corresponding results for SD$_w$ and MCD, obtained by Gervini (2002) and Croux and Haesbroeck (1999).
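To make the two ingredients of this section concrete, the following is a minimal Python sketch of the outlyingness and of an estimator of the "smallest outlyingness" kind. It rests on several assumptions that are not part of the text above: the supremum over all directions is approximated by a maximum over randomly drawn unit directions, the median and the MAD are used for $m(\cdot)$ and $s(\cdot)$, and the covariance is taken as the plain sample covariance of the retained points, without any consistency factor. The function names `sd_outlyingness` and `sd_so_covariance` and their parameters are hypothetical.

```python
import numpy as np


def sd_outlyingness(X, n_dirs=500, seed=0):
    """Approximate r(x_i, X) for every row of X.

    Assumptions: the sup over directions is replaced by a max over
    n_dirs random unit directions; m = median and s = MAD.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Random directions, normalised to unit length.
    A = rng.standard_normal((n_dirs, p))
    A /= np.linalg.norm(A, axis=1, keepdims=True)
    proj = X @ A.T                                # shape (n, n_dirs)
    med = np.median(proj, axis=0)                 # m(a^t X) per direction
    mad = np.median(np.abs(proj - med), axis=0)   # s(a^t X) per direction
    # Directions with zero scale are ignored (ratio becomes 0).
    mad = np.where(mad > 0, mad, np.inf)
    return np.max(np.abs(proj - med) / mad, axis=1)


def sd_so_covariance(X, alpha=0.25, **kwargs):
    """Sample covariance of the ceil((1 - alpha) n) least outlying points.

    Assumption: no consistency or reweighting step is applied.
    Returns the covariance matrix and the indices of the retained rows.
    """
    n = X.shape[0]
    h = int(np.ceil((1 - alpha) * n))
    r = sd_outlyingness(X, **kwargs)
    keep = np.argsort(r)[:h]
    return np.cov(X[keep], rowvar=False), keep
```

With more random directions the maximum gets closer to the supremum in the definition, at a cost that grows linearly in `n_dirs`; an exact computation is not attempted here.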
2 Applications in PCA and PLS

Principal component analysis (PCA) is a very popular technique for analyzing multivariate data. It consists of finding orthogonal directions that maximize the variance captured in the data. These directions can be computed as the eigenvectors of an estimate of the covariance matrix. Classical PCA uses the sample covariance matrix for this purpose, so outliers can have a very damaging effect. Recently, ROBPCA, a robust PCA algorithm, was proposed by Hubert et al. (2004). In this method, the SD$_{so}$ estimator of covariance plays a crucial role.

A widely used technique for high-dimensional regression is partial least squares (PLS). A robust version of this method was introduced by Hubert and Vanden Branden (2003). The SD$_{so}$ estimator also appears in this RSIMPLS algorithm.
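The eigenvector step described above, and the sensitivity of classical PCA to outliers, can be illustrated with a short Python sketch. This is not ROBPCA or RSIMPLS; it only shows how principal directions follow from a covariance estimate, so that any covariance matrix, classical or robust, can be plugged in. The helper name `pca_from_covariance` and the simulated data are assumptions made for illustration.

```python
import numpy as np


def pca_from_covariance(cov, k=None):
    """Return the k largest eigenvalues of cov and their eigenvectors.

    Eigenvectors are the columns of the second return value, ordered by
    decreasing eigenvalue. cov is assumed to be symmetric.
    """
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    if k is not None:
        order = order[:k]
    return eigvals[order], eigvecs[:, order]


# Illustrative data (made up): most variance lies along the first axis.
rng = np.random.default_rng(0)
clean = rng.standard_normal((200, 2)) * np.array([3.0, 0.5])

# Hypothetical contamination: 20 points far away along the second axis.
outliers = np.column_stack(
    [rng.standard_normal(20), 30.0 + rng.standard_normal(20)]
)
contaminated = np.vstack([clean, outliers])

# First principal direction from the classical sample covariance.
_, v_clean = pca_from_covariance(np.cov(clean, rowvar=False), k=1)
_, v_cont = pca_from_covariance(np.cov(contaminated, rowvar=False), k=1)

# v_clean points (up to sign) along the first axis, while v_cont is
# pulled towards the second axis by the outliers.
print("clean data:       ", v_clean[:, 0])
print("contaminated data:", v_cont[:, 0])
```

Replacing `np.cov` with a robust covariance estimate in the last step is where an estimator such as SD$_{so}$ would enter.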