Journal of Analytical Toxicology, Vol. 9, July/August 1985

Letters to the Editor

A Discussion of Principal Component Analysis

To the Editor:

Musumarra et al. (1) used principal component analysis (PCA) for system evaluation and substance identification. They investigated a data set consisting of RF values for 54 drugs measured in eight different TLC systems. Although we share their conclusion that PCA can be fruitful in analytical toxicology, we have some comments.

Their β2-β1 and θ2-θ1 plots seem to be the standard raw output plots of the SIMCA program package (2), which as such are not very well suited for publication. These plots are scaled differently along PC axes 1 and 2, which makes comparison of distances impossible. For a proper interpretation, the β2-β1 plot should be compressed by a factor of 6 in the PC 1 direction, and the θ2-θ1 plot by a factor of 2.3 in the PC 2 direction.

In fact, for the same PC model both plots can be combined, as shown in Figure 1 for the data of Musumarra et al. Such a plot is commonly called a PC plot or eigenvector plot. It is important that the origin (displaced to the RF mean values) is indicated. The data point projections (θ1, θ2) are represented as numbered points. The projections of the original axes (β1, β2) can, of course, also be represented as points in the plane, but more insight is obtained by connecting these points with the origin, which places the emphasis more on direction and less on location. As the values for β are almost always small compared to the values for θ, the β-vectors are elongated by an arbitrary factor to allow a better visual interpretation.

Now it can be seen that all TLC systems are highly covariant on the first PC axis. This axis can therefore be thought of as representing the general separating ability common to all TLC systems investigated, which is governed by general eluent characteristics such as polarity and "solvent strength."
PC 2 represents the more selective separating abilities of the TLC systems with respect to each other, due to specific molecular interactions.

Musumarra et al. applied autoscaling to the original values, i.e. division of each score y_ij by the total standard deviation s_i of all scores on variable i. The result is that all variables are initially given the same importance. In our opinion, this approach is optimal only when no prior information about the variables exists. One often knows, however, how well an individual observation can be reproduced, and it is clear that in TLC, within the same range of RF values, a higher reproducibility allows more distinctions between substances. Thus, when the analytical error is known, which is the case for the data of Musumarra et al., reproducibility scaling, i.e. division of all scores y_ij by the error value for system i, seems appropriate. The resulting scores all have the same measurement error (equal to 1).

[Figure 1: plot with axes PC 1 and PC 2; numbered data points and axis projections not recoverable]

Figure 1. PC plot of the autoscaled data. The projections of the original axes on the PC plane are elongated by an arbitrary factor of 6.
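
The difference between the two scalings, and the coordinates needed for a combined PC plot, can be illustrated with a minimal Python sketch. It is not the SIMCA procedure itself: it uses a plain singular value decomposition from NumPy, synthetic RF values in place of the data of Musumarra et al., and hypothetical per-system error values. The function names `scale_columns` and `pc_plot_coordinates` are introduced here for illustration only.

```python
import numpy as np


def scale_columns(Y, divisors):
    """Center each column on its mean, then divide it by its divisor."""
    Y = np.asarray(Y, dtype=float)
    return (Y - Y.mean(axis=0)) / np.asarray(divisors, dtype=float)


def pc_plot_coordinates(Z, n_components=2):
    """Return scores (one row per substance) and loadings (one row per system).

    The scores play the role of the theta values and the loadings the role
    of the beta values in the letter; both come from an SVD of the scaled,
    centered data matrix Z.
    """
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    return scores, loadings


# Synthetic stand-in data (assumption): 54 substances, 8 TLC systems that
# share one dominant common factor plus system-specific variation.
rng = np.random.default_rng(0)
n_drugs, n_systems = 54, 8
common = rng.uniform(0.1, 0.9, size=(n_drugs, 1))
Y = np.clip(common + rng.normal(0.0, 0.08, size=(n_drugs, n_systems)), 0.0, 1.0)

# Hypothetical analytical error of each system, in RF units (assumption).
errors = np.array([0.01, 0.02, 0.02, 0.03, 0.01, 0.04, 0.02, 0.03])

# Autoscaling: divide by the standard deviation of each variable.
Z_auto = scale_columns(Y, Y.std(axis=0, ddof=1))
# Reproducibility scaling: divide by the known error of each system.
Z_repro = scale_columns(Y, errors)

theta_auto, beta_auto = pc_plot_coordinates(Z_auto)
theta_repro, beta_repro = pc_plot_coordinates(Z_repro)

# After reproducibility scaling every system has unit measurement error.
scaled_errors = errors / errors
```

When such scores and loadings are drawn in one plot, both axes should use the same unit length (for example with matplotlib's `ax.set_aspect("equal")`) so that distances remain comparable, and the loadings may be multiplied by an arbitrary factor for visibility, as was done in Figure 1.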