Extracting Experimental Information from Large Matrixes. 1. A New Algorithm for the Application of Matrix Rank Analysis

Gábor Peintler,*,† István Nagypál,† Attila Jancsó,† Irving R. Epstein,*,‡ and Kenneth Kustin‡

Institute of Physical Chemistry, Attila József University, H-6701 Szeged, P.O. Box 105, Hungary, and Department of Chemistry, Brandeis University, P.O. Box 9110, Waltham, Massachusetts 02254-9110

Received: January 8, 1997; In Final Form: July 16, 1997 X

For many, especially complex, systems, modern spectroscopic measurements can be generated as large experimental data sets in matrix form. We report a new algorithm for the application of matrix rank analysis to extract significant experimental information from these large matrixes. The algorithm may be used to detect and remove erroneous rows and/or columns from the matrixes and to monitor the most significant experimental information along the rows and/or columns of the data sets. A new method for determining the number of absorbing species and a new concept for the treatment of experimental errors are presented. The algorithm is illustrated on real experimental examples.

Introduction

Matrix rank analysis (MRA) of spectroscopic data is a widely used method to determine the number of independent absorbing species (NIAS) either in chemically reacting or in equilibrium systems.¹⁻⁶ Its importance is increasing because of the widespread use of solid-state photodetectors in modern data acquisition systems. The large matrix of data produced by such detection systems can be a disadvantage, however, compromising MRA and causing the method to yield ambiguous results. In this paper we examine MRA and propose a new method for its reliable and unambiguous implementation on large matrixes.
MRA can be applied to any experimental data set, provided that the Beer-Lambert law (or any similar linear relation) is valid (eq 1), where the $A_{ij}$'s are the elements of the absorption matrix ($\mathbf{A}$), i.e., absorbances normalized to unit path length; $n$ is the number of absorbing species; $p$ is the number of samples; and $q$ is the number of wavelengths. The symbol $c_{ik}$ stands for the concentration of the $k$th absorbing species in the $i$th sample, and $\epsilon_{kj}$ is the molar absorption coefficient of that species at the $j$th wavelength. A "large matrix" here means that $p \gg n$ and/or $q \gg n$.

Wallace¹ and Ainsworth² pointed out that the rank of $\mathbf{A}$ gives the number of absorbing species. They also examined how the rank changes in closed systems due to stoichiometric constraints. Since then, three different algorithms have been developed for the determination of NIAS.

(1) The algorithm developed by Wallace and Katz³ and by Katakis⁴ is based on Gauss-Jordan elimination with full pivoting.⁷ The result of the calculation is a vector $\mathbf{P}$, the $i$th element of which is the largest value (in the sense of absolute value) of the residual of $\mathbf{A}$ after the $(i - 1)$th elimination step. The number of nonzero elements of the vector calculated this way gives the NIAS.

(2) The method developed by Hugus and El-Awady⁵ is based on the eigenvalues of $\mathbf{AA} = \mathbf{A}^{\mathrm{T}} \times \mathbf{A}$ (if $p \geq q$) or $\mathbf{AA} = \mathbf{A} \times \mathbf{A}^{\mathrm{T}}$ (if $p \leq q$).⁸ The determination of NIAS is therefore the same problem as solving the equation $\mathbf{AA}\,\mathbf{x} = \lambda \mathbf{x}$ for all $\lambda$'s and finding the nonzero eigenvalues.

(3) The third method, developed by Coleman et al.,⁶ is essentially a graphic, linearized representation of the first one. This nomographical technique is not as accurate as digital computation, and we do not analyze it further.

These three procedures are mathematically equivalent. Because of unavoidable experimental errors, however, the calculated rank is always larger than $n$.
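As a rough illustration of procedures (1) and (2), the following Python/NumPy sketch builds a synthetic absorbance matrix as the product of a concentration matrix and a matrix of molar absorption coefficients, then counts the nonzero pivots and eigenvalues with and without added noise. It is not a reproduction of the published algorithms. The function names (`pivot_residuals`, `gram_eigenvalues`, `count_significant`), the matrix sizes, the random concentrations and spectra, the assumed noise level of 0.003 AU, and the relative tolerance of 1e-9 are all hypothetical choices made for this example. The elimination is written as a sequence of rank-one updates, which is one possible way to obtain the residual after each full-pivoting step.

```python
import numpy as np


def pivot_residuals(A):
    """Full-pivoting elimination; P[i] is the largest |residual| after i steps."""
    R = np.array(A, dtype=float)          # residual matrix, updated step by step
    P = np.zeros(min(R.shape))
    for i in range(P.size):
        r, c = np.unravel_index(np.argmax(np.abs(R)), R.shape)
        pivot = R[r, c]
        P[i] = abs(pivot)
        if pivot == 0.0:                  # residual is exactly zero: stop early
            break
        # Rank-one update that eliminates the pivot row and pivot column.
        R = R - np.outer(R[:, c], R[r, :]) / pivot
        R[r, :] = 0.0                     # clear rounding debris on the pivot row
        R[:, c] = 0.0                     # ... and on the pivot column
    return P


def gram_eigenvalues(A):
    """Eigenvalues of A^T A (if p >= q) or A A^T (if p < q), largest first."""
    A = np.asarray(A, dtype=float)
    p, q = A.shape
    AA = A.T @ A if p >= q else A @ A.T   # the smaller of the two products
    return np.sort(np.linalg.eigvalsh(AA))[::-1]


def count_significant(values, rel_tol=1e-9):
    """Count values above an assumed tolerance relative to the largest value."""
    values = np.asarray(values, dtype=float)
    return int(np.sum(values > rel_tol * values.max()))


# Synthetic example (all numbers below are illustrative assumptions).
rng = np.random.default_rng(0)
n, p, q = 3, 40, 60                       # species, samples, wavelengths
C = rng.uniform(0.0, 1.0, size=(p, n))    # concentrations c_ik
E = rng.uniform(0.0, 1.0, size=(n, q))    # molar absorption coefficients eps_kj
A_exact = C @ E                           # A_ij = sum_k c_ik * eps_kj
A_noisy = A_exact + rng.normal(0.0, 0.003, size=A_exact.shape)

for label, M in (("noise-free", A_exact), ("noisy", A_noisy)):
    P = pivot_residuals(M)
    lam = gram_eigenvalues(M)
    print(label, "| nonzero pivots:", count_significant(P),
          "| nonzero eigenvalues:", count_significant(lam))
    print("  leading pivots:     ", np.round(P[:5], 4))
    print("  leading eigenvalues:", np.round(lam[:5], 6))
```

With the assumed tolerance, both counts should equal n = 3 for the noise-free matrix, whereas the noisy matrix yields more than three nonzero values. In the noisy case the first three values stand out only by their magnitude, so some criterion is needed to separate them from the contributions of experimental error.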
When MRA is applied, the real problem is to decide which elements of $\mathbf{P}$ or which eigenvalues of $\mathbf{AA}$ are small enough to be attributed to experimental error and discarded. Several statistical procedures have been developed to solve this problem.

One possibility is to calculate standard errors for either the elements of $\mathbf{P}$ or the eigenvalues of $\mathbf{AA}$. Both procedures require an initial estimate of the standard error of the measured data. The reproducibility of a measurement, which is 0.002-0.003 absorbance unit (AU) with a modern diode-array spectrometer, helps us to estimate it. The distribution of these errors is generally assumed to be Gaussian. Wallace and Katz³ calculated the propagation of errors⁹ in parallel with the process of elimination, which handles random errors adequately. Katakis⁴ took into account computational as well as experimental errors. Hugus and El-Awady⁵ introduced a relation between the standard errors of the eigenvalues and the original experimental errors. In each procedure, the appropriate NIAS can be estimated by comparing the values to their standard errors.

Hugus and El-Awady⁵ used the $\chi^2$ test.⁷ They also used the differences between the calculated and measured absorbances (residuals), counting as significant those values that were larger than 3 times the estimated error.

In connection with factor analysis, Malinowski and Howery¹⁰ summarized statistical criteria found in the literature. Since the goal of factor analysis is very close to that of MRA, these criteria can also be applied to MRA.

Despite these efforts, applying statistical criteria is still the most uncertain part of MRA. The conclusion from any error treatment is highly dependent on the accuracy of the initial error estimation.¹¹ Different statistical criteria may lead to different

$$A_{ij} = \sum_{k=1}^{n} c_{ik}\,\epsilon_{kj}, \qquad i \in \{1,\dots,p\},\; j \in \{1,\dots,q\} \tag{1}$$

† Attila József University.
‡ Brandeis University.
X Abstract published in Advance ACS Abstracts, September 15, 1997.

J. Phys. Chem. A 1997, 101, 8013-8020. S1089-5639(97)00136-9 CCC: $14.00 © 1997 American Chemical Society