Principal component analysis for compositional data with outliers

Peter Filzmoser 1, Karel Hron 2 and Clemens Reimann 3

1 Department of Statistics and Probability Theory, Vienna University of Technology, Wiedner Hauptstr. 8-10, 1040 Vienna, Austria; e-mail: P.Filzmoser@tuwien.ac.at
2 Department of Mathematical Analysis and Applications of Mathematics, Palacký University Olomouc, Tomkova 40, CZ-77100 Olomouc, Czech Rep.; e-mail: hronk@seznam.cz
3 Geological Survey of Norway (NGU), N-7491 Trondheim, Norway; e-mail: Clemens.Reimann@ngu.no

SUMMARY

Compositional data (almost all data in geochemistry) are closed data, i.e. they sum up to a constant (e.g. 100 weight percent). Thus the correlation structure of compositional data is strongly biased, and the results of many multivariate techniques become doubtful without a proper transformation of the data. The centered logratio (clr) transformation is often used to open closed data. However, the clr-transformed data do not have full rank and therefore cannot be used for robust multivariate techniques such as principal component analysis (PCA). Here we propose to use the isometric logratio (ilr) transformation instead. The ilr transformation, however, has the disadvantage that the resulting new variables are no longer directly interpretable in terms of the original variables. We therefore propose a technique by which the scores and loadings of a robust PCA on ilr-transformed data can be back-transformed and interpreted. The procedure is demonstrated using a real data set from regional geochemistry and compared to results from non-transformed and non-robust versions of PCA. It turns out that the procedure using ilr-transformed data and robust PCA delivers superior results to all other approaches. The examples demonstrate that, due to the compositional nature of geochemical data, PCA should not be carried out without an appropriate transformation. Furthermore, a robust approach is preferable if the data set contains outliers.

KEY WORDS: robust statistics; compositional data; isometric logratio transformation; principal component analysis
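The rank argument in the summary can be illustrated with a minimal numerical sketch. It is not the implementation used in this paper: the simulated data, the helper names (closure, clr, ilr_basis) and the particular Helmert-type orthonormal basis V are assumptions made for illustration, and a classical covariance estimate stands in where a robust estimator would be used in practice. The sketch shows that the covariance matrix of clr-transformed D-part data has rank D-1 (it is singular), that the ilr coordinates have a full-rank covariance matrix, and that loadings obtained in ilr space can be mapped back to clr space by multiplication with V.

```python
import numpy as np

def closure(x):
    """Rescale each row so that its parts sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=1, keepdims=True)

def clr(x):
    """Centered logratio: log of each part minus the row mean of the logs."""
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)

def ilr_basis(D):
    """Hypothetical choice of basis: a D x (D-1) Helmert-type matrix V
    with orthonormal columns that each sum to zero."""
    V = np.zeros((D, D - 1))
    for j in range(1, D):
        V[:j, j - 1] = 1.0 / j
        V[j, j - 1] = -1.0
        V[:, j - 1] *= np.sqrt(j / (j + 1.0))
    return V

# Simulated 5-part compositions (illustrative data, not from the paper)
rng = np.random.default_rng(0)
D = 5
X = closure(rng.lognormal(size=(200, D)))

Y = clr(X)          # n x D, rows sum to zero
V = ilr_basis(D)
Z = Y @ V           # ilr coordinates, n x (D-1)

rank_clr = np.linalg.matrix_rank(np.cov(Y, rowvar=False))
rank_ilr = np.linalg.matrix_rank(np.cov(Z, rowvar=False))
print("rank of clr covariance:", rank_clr, "of", D)
print("rank of ilr covariance:", rank_ilr, "of", D - 1)

# PCA in ilr space. Assumption: classical covariance via np.cov;
# a robust covariance estimate would be substituted here.
Zc = Z - Z.mean(axis=0)
eigval, G_ilr = np.linalg.eigh(np.cov(Zc, rowvar=False))
order = np.argsort(eigval)[::-1]          # sort components by variance
eigval, G_ilr = eigval[order], G_ilr[:, order]
scores_ilr = Zc @ G_ilr

# Back-transform the loadings to clr space, where each row
# corresponds to one original part.
G_clr = V @ G_ilr
Yc = Y - Y.mean(axis=0)
scores_clr = Yc @ G_clr
print("scores agree:", np.allclose(scores_ilr, scores_clr))
```

Because the columns of V are orthonormal and the clr rows sum to zero, the scores computed in ilr space coincide with those obtained from the centered clr data and the back-transformed loadings, so only the loadings need to be mapped back for interpretation.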