Extracting Experimental Information from Large Matrixes. 1. A New Algorithm for the
Application of Matrix Rank Analysis
Gábor Peintler,*,† István Nagypál,† Attila Jancsó,† Irving R. Epstein,*,‡ and Kenneth Kustin‡
Institute of Physical Chemistry, Attila József University, H-6701 Szeged, P.O. Box 105, Hungary, and
Department of Chemistry, Brandeis University, P.O. Box 9110, Waltham, Massachusetts 02254-9110
Received: January 8, 1997; In Final Form: July 16, 1997
For many, especially complex, systems, modern spectroscopic measurements can be generated as large
experimental data sets in matrix form. We report a new algorithm for the application of matrix rank analysis
to extract significant experimental information from these large matrixes. The algorithm may be used to
detect and remove erroneous rows and/or columns from the matrixes and to monitor the most significant
experimental information along the rows and/or columns of the data sets. A new method for determining the
number of absorbing species and a new concept for the treatment of experimental errors are presented. The
algorithm is illustrated on real experimental examples.
Introduction
Matrix rank analysis (MRA) of spectroscopic data is a widely
used method to determine the number of independent absorbing
species (NIAS) either in chemically reacting or in equilibrium
systems [1-6]. Its importance is increasing because of the
widespread use of solid-state photodetectors in modern data
acquisition systems. The large matrix of data produced by such
detection systems can be a disadvantage, however, compromising
MRA and causing the method to yield ambiguous results.
In this paper we examine MRA and propose a new method for
its reliable and unambiguous implementation on large matrixes.
MRA can be applied to any experimental data set, provided
that the Beer-Lambert law (or any similar linear relation), eq 1,
is valid. Here the A_ij's are the elements of the absorption matrix (A),
absorbances normalized for unit length; n is the number of
absorbing species; p is the number of samples; and q is the
number of wavelengths. The symbol c_ik stands for the
concentration of the kth absorbing species in the ith sample, and
ε_kj is the molar absorption coefficient of that species at the jth
wavelength. A "large matrix" here means that p ≫ n and/or q ≫ n.
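As an illustration of eq 1, the following minimal Python sketch builds a noise-free absorption matrix as the product of a concentration matrix and a matrix of molar absorption coefficients. The numerical values, the use of NumPy, and the names `absorbance_matrix`, `C`, and `E` are assumptions made for this example; they are not taken from the experiments reported here.

```python
import numpy as np

def absorbance_matrix(C, E):
    """Return A with A[i, j] = sum_k C[i, k] * E[k, j], i.e. eq 1."""
    C = np.asarray(C, dtype=float)  # p samples x n species (concentrations)
    E = np.asarray(E, dtype=float)  # n species x q wavelengths (molar absorptivities)
    return C @ E

# Hypothetical system: n = 2 species, p = 5 samples, q = 6 wavelengths.
C = np.array([[1.0, 0.0],
              [0.8, 0.2],
              [0.5, 0.5],
              [0.2, 0.8],
              [0.0, 1.0]])
E = np.array([[0.9, 0.7, 0.4, 0.2, 0.1, 0.0],
              [0.0, 0.1, 0.3, 0.6, 0.8, 0.9]])

A = absorbance_matrix(C, E)
rank = int(np.linalg.matrix_rank(A))  # numerical rank of the error-free matrix
```

Because C and E each have rank n = 2 in this example, the error-free A also has rank 2, even though it has 5 rows and 6 columns.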
Wallace [1] and Ainsworth [2] pointed out that the rank of A gives
the number of absorbing species. They also examined how the
rank changes in closed systems due to stoichiometric constraints.
Since then, three different algorithms have been developed for
the determination of NIAS.
(1) The algorithm developed by Wallace and Katz [3] and by
Katakis [4] is based on Gauss-Jordan elimination with full
pivoting [7]. The result of the calculation is a vector P, the ith
element of which is the largest value (in the sense of absolute
value) of the residual of A after the (i - 1)th elimination step.
The number of nonzero elements of the vector calculated this
way gives the NIAS.
(2) The method developed by Hugus and El-Awady [5] is based
on the eigenvalues of AA = A^T × A (if p ≥ q) or AA = A × A^T
(if p ≤ q) [8]. The determination of NIAS is therefore the
same problem as solving the equation AAx = λx for all λ's
and finding the nonzero eigenvalues.
(3) The third method is essentially a graphic, linearized
representation of the first one, developed by Coleman et al. [6].
This nomographical technique is not as accurate as digital
computation, and we do not analyze it further.
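The first two procedures can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the code of the cited authors: the elimination is written as a rank-one deflation around the fully pivoted element, which leaves the same residual on the remaining rows and columns as an elimination step does. The function names, the test matrix, and the relative tolerance `tol` are hypothetical.

```python
import numpy as np

def pivot_vector(A):
    """P[i] = largest absolute element of the residual after i elimination steps."""
    R = np.array(A, dtype=float)          # working copy: the residual matrix
    steps = min(R.shape)
    P = np.zeros(steps)
    for i in range(steps):
        # Full pivoting: search the whole residual for the largest |element|.
        r, c = np.unravel_index(np.argmax(np.abs(R)), R.shape)
        pivot = R[r, c]
        P[i] = abs(pivot)
        if pivot == 0.0:
            break                          # residual is exactly zero
        # Eliminate the pivot row and column from the residual.
        R -= np.outer(R[:, c], R[r, :]) / pivot
        R[r, :] = 0.0                      # remove rounding debris
        R[:, c] = 0.0
    return P

def second_moment_eigenvalues(A):
    """Eigenvalues (descending) of A^T A if p >= q, otherwise of A A^T."""
    A = np.asarray(A, dtype=float)
    p, q = A.shape
    AA = A.T @ A if p >= q else A @ A.T
    return np.linalg.eigvalsh(AA)[::-1]    # eigvalsh returns ascending order

# Hypothetical error-free matrix of rank 2 (row 1 is twice row 2).
A = np.array([[2.0, 4.0, 6.0],
              [1.0, 2.0, 3.0],
              [3.0, 1.0, 2.0]])

tol = 1e-10                                # assumed relative threshold for "nonzero"
P = pivot_vector(A)
lam = second_moment_eigenvalues(A)
nias_from_P = int(np.sum(P > tol * P[0]))
nias_from_eig = int(np.sum(lam > tol * lam[0]))
```

For this error-free matrix both counts agree. With exact arithmetic no tolerance would be needed; the small `tol` only guards against floating-point rounding in the eigenvalue computation.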
These three procedures are mathematically equivalent. Because
of unavoidable experimental errors, however, the rank
calculated is always larger than n. When MRA is applied, the
real problem is to decide which elements of P or which
eigenvalues of AA are small enough to be discarded as
arising from experimental errors. Several statistical
procedures have been developed to solve this problem:
One possibility is a calculation of standard errors for either
elements of P or eigenvalues of AA. Both procedures require
an initial estimate of the standard error of the measured data.
The reproducibility of a measurement, which is 0.002-0.003
absorbance unit (AU) with a modern diode-array spectrometer,
helps us to estimate it. The distribution of these errors is
generally assumed to be Gaussian.
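The need for such an error estimate can be illustrated numerically. The sketch below adds Gaussian noise with a standard deviation of 0.002 AU to a hypothetical error-free matrix of two absorbing species and compares the numerical ranks. The matrices, the random seed, and the use of NumPy's `matrix_rank` and singular values are assumptions made for the illustration, not part of the procedures cited here.

```python
import numpy as np

rng = np.random.default_rng(0)             # fixed seed, for reproducibility only

# Hypothetical system: n = 2 species, p = 5 samples, q = 6 wavelengths.
C = np.array([[1.0, 0.0],
              [0.8, 0.2],
              [0.5, 0.5],
              [0.2, 0.8],
              [0.0, 1.0]])
E = np.array([[0.9, 0.7, 0.4, 0.2, 0.1, 0.0],
              [0.0, 0.1, 0.3, 0.6, 0.8, 0.9]])

sigma = 0.002                              # assumed standard error in AU
A_clean = C @ E
A_noisy = A_clean + rng.normal(0.0, sigma, size=A_clean.shape)

rank_clean = int(np.linalg.matrix_rank(A_clean))
rank_noisy = int(np.linalg.matrix_rank(A_noisy))

# Singular values in descending order; their squares are the eigenvalues of AA.
singular_values = np.linalg.svd(A_noisy, compute_uv=False)
```

In this example the noisy matrix has full numerical rank, min(p, q) = 5, whereas the error-free matrix has rank 2. The first two singular values are of order unity and the remaining three are comparable in size to sigma, so deciding where to cut requires a reliable value of sigma.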
Wallace and Katz [3] calculated the propagation of errors [9] in
parallel with the process of elimination, which handles random
errors adequately. Katakis [4] took into account computational
as well as experimental errors. Hugus and El-Awady [5] introduced
a relation between the standard errors of eigenvalues and
the original experimental errors. In each procedure, the appropriate
NIAS can be estimated by comparing the values to
their standard errors.
Hugus and El-Awady [5] used the χ² test [7]. They also used the
differences between the calculated and measured absorbances
(residuals). They counted as significant those values that were
larger than 3 times the estimated error.
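A residual criterion of this kind can be sketched as follows. The calculated absorbances are obtained here from a truncated singular value decomposition, which is a stand-in chosen for brevity and not the fitting procedure of Hugus and El-Awady. The data, the seed, the value of `sigma`, and the name `count_large_residuals` are likewise hypothetical.

```python
import numpy as np

def count_large_residuals(A, n_species, sigma, factor=3.0):
    """Count residuals |measured - calculated| larger than factor * sigma.

    The calculated matrix is the best rank-n_species approximation of A
    (truncated SVD), used here only as an illustrative model.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_calc = (U[:, :n_species] * s[:n_species]) @ Vt[:n_species, :]
    return int(np.sum(np.abs(A - A_calc) > factor * sigma))

# Hypothetical measurement: two absorbing species plus Gaussian noise.
rng = np.random.default_rng(0)
C = np.array([[1.0, 0.0],
              [0.8, 0.2],
              [0.5, 0.5],
              [0.2, 0.8],
              [0.0, 1.0]])
E = np.array([[0.9, 0.7, 0.4, 0.2, 0.1, 0.0],
              [0.0, 0.1, 0.3, 0.6, 0.8, 0.9]])
sigma = 0.002                              # assumed standard error in AU
A = C @ E + rng.normal(0.0, sigma, size=(5, 6))

# Number of significant residuals for each assumed number of species.
counts = {n: count_large_residuals(A, n, sigma) for n in (1, 2, 3)}
```

The smallest assumed number of species for which no residual exceeds 3 times sigma is then taken as the estimate of NIAS. The outcome depends directly on the value chosen for sigma.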
In connection with factor analysis, Malinowski and Howery [10]
summarized statistical criteria found in the literature. Since the
goal of factor analysis is very close to that of MRA, these criteria
can also be applied for MRA.
Despite these efforts, applying statistical criteria is still the
most uncertain part of MRA. The conclusion from any error
treatment is highly dependent on the accuracy of the initial error
estimation [11]. Different statistical criteria may lead to different
† Attila József University.
‡ Brandeis University.
Abstract published in Advance ACS Abstracts, September 15, 1997.
$$A_{ij} = \sum_{k=1}^{n} c_{ik}\,\epsilon_{kj}, \qquad i \in \{1,\ldots,p\},\; j \in \{1,\ldots,q\} \tag{1}$$
8013 J. Phys. Chem. A 1997, 101, 8013-8020
S1089-5639(97)00136-9 CCC: $14.00 © 1997 American Chemical Society