Optimal eigenvectors of spectral datasets:
sequential selection from one set vs a
collection from two sets
Morteza Maali Amiri and Seyed Hossein Amirshahi*
Department of Textile Engineering, Amirkabir University of Technology (Tehran Polytechnic),
No. 424 Hafez Avenue, Tehran, 15914, Iran
Email: hamirsha@aut.ac.ir
Received: 8 August 2015; Accepted: 5 July 2016
In this paper, the influence of spectral datasets and the method of selection of the corresponding feature
vectors on the compression and reconstruction of data is scrutinised. To fulfil this aim, two different sets of
reflectance data with the least spectral similarity are selected from different sets of spectral databases and
optimal eigenvectors are chosen using different strategies. Six and 12 arrangements of
eigenvectors obtained from different individual or combined databases are then used for the compression
of the reflectance spectra of the learning sets, as well as of those not used in the extraction of the eigenvectors.
The validity of the desired reduced subspaces is assessed by computing the spectral errors between the
actual and the reconstructed spectra of samples of learning sets. Moreover, the efficiencies of designed
compressed subspaces are evaluated through the numbers of out-of-range reconstructed spectra, as well as
the spectral and colorimetric deviations between the actual and compressed-reconstructed reflectance
spectra of samples of datasets that were not employed in the learning sequence. The results show that, in
restricted subspaces, i.e. the six-dimensional subspace, the most effective results are achieved when the
reduced subspace is created from a collection of two separate sets of eigenvectors of two different datasets
with the maximum degree of dissimilarity, and the reduced spaces that have been made from six
eigenvectors of individual datasets lead to higher errors.
Introduction
Compression and representation of reflectance spectra of
objects in reduced spaces have been the aim of several
studies, and different methodologies have been proposed
for the extraction of the most important features of such
high-dimensional data [1–6]. Because the reflectance
spectra of non-fluorescent surfaces are smooth functions
of wavelength across the visible spectrum, many
researchers have focused on the extraction of the hidden
patterns of such data. Cohen was
possibly the first to analyse the reflectance spectra of
some Munsell specimens using the technique of dimen-
sionality reduction based on the eigen-decomposition
method [7]. The extracted eigenvectors were then
arranged based on corresponding eigenvalues to make
the reduced subspace with orthonormal coordinates.
While the eigenvectors of spectral data exhibit both
positive and negative spectral behaviour, all-positive
bases, which are more physically plausible, have also
been recommended in the literature [8,9].
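The eigen-decomposition step described above can be sketched as follows. This is an illustrative reconstruction using synthetic smooth spectra, not the Munsell data analysed by Cohen; the wavelength sampling (31 points, 400–700 nm) and sample count are assumptions.

```python
import numpy as np

# Illustrative sketch only: eigen-decomposition of synthetic smooth
# "reflectance" spectra (not the Munsell data of the original study).
rng = np.random.default_rng(0)
wavelengths = np.linspace(400, 700, 31)              # 400-700 nm, 10 nm steps
phases = rng.uniform(0, 2 * np.pi, size=(100, 1))
spectra = 0.5 + 0.4 * np.sin(2 * np.pi * wavelengths / 600 + phases)

# Eigen-decomposition of the covariance matrix of the spectra
cov = np.cov(spectra, rowvar=False)                  # (31, 31)
eigvals, eigvecs = np.linalg.eigh(cov)               # ascending eigenvalues
order = np.argsort(eigvals)[::-1]                    # arrange by eigenvalue,
eigvals, eigvecs = eigvals[order], eigvecs[:, order] # largest first

# The sorted eigenvectors form an orthonormal basis for the reduced subspace
assert np.allclose(eigvecs.T @ eigvecs, np.eye(31))
```

Arranging the eigenvectors by descending eigenvalue is what makes truncation meaningful: the leading columns carry the largest share of the spectral variance.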
Indeed, the so-called eigen-decomposition method is the
most popular technique among spectral reduction meth-
ods, going under the general name of matrix diagonalisa-
tion. The principal component analysis (PCA) technique
provides the most significant eigenvectors based on the
Karhunen–Loève transformation algorithm. A set of
orthogonal basis vectors spans a subspace of the n-
dimensional vector space, and PCA extracts such a set of
basis vectors from the data [4].
The number of orthogonal basis vectors is chosen in a
manner to ensure an effective subspace. The dimensions
of the original vector space and the reduced subspace
determine the compression factor. As expected, the com-
pression causes a reconstruction error in the restored data.
In spectral compression practice, the error between the
original and the reconstructed spectra can be assessed
using different metrics, such as the percentage of
cumulative energy and the root mean square (RMS) error
between the actual and the reconstructed spectra. It is
worth mentioning that PCA is recognised as one of the
most powerful methods for minimising such inevitable
errors [4].
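A minimal compression-and-reconstruction loop of the kind described above might read as follows. This is a hypothetical sketch: the spectra are synthetic smooth curves standing in for a real dataset, and the subspace dimension k = 6 is assumed for illustration.

```python
import numpy as np

# Hypothetical sketch of PCA-based spectral compression on synthetic data.
rng = np.random.default_rng(1)
wl = np.linspace(400, 700, 31)                  # 31 wavelengths, 10 nm steps
freqs = rng.uniform(0.5, 2.0, size=(200, 1))    # low-frequency content
phases = rng.uniform(0, 2 * np.pi, size=(200, 1))
spectra = 0.5 + 0.3 * np.sin(freqs * wl / 100 + phases)   # (200, 31), smooth

mean = spectra.mean(axis=0)
_, _, vt = np.linalg.svd(spectra - mean, full_matrices=False)
k = 6                                           # reduced-subspace dimension
basis = vt[:k]                                  # k leading eigenvectors (rows)
coords = (spectra - mean) @ basis.T             # compressed coordinates (200, 6)
recon = coords @ basis + mean                   # reconstructed spectra (200, 31)

# Compression factor and per-sample RMS reconstruction error
compression_factor = spectra.shape[1] / k       # 31 values reduced to 6
rms = np.sqrt(np.mean((spectra - recon) ** 2, axis=1))
```

The SVD of the mean-centred data yields the same eigenvectors as diagonalising the covariance matrix; the RMS vector then quantifies the inevitable reconstruction error per sample.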
The PCA method has been widely applied in different
aspects of colour science, and several modifications
have been made to improve the performance of the
method in compression of spectral data and estimation of
valuable spectral information from colorimetric tristimu-
lus values [10–14]. One of the most challenging problems
associated with the application of PCA in the spectral
compression procedure is the answer to the question:
How many basis functions are required for an accurate
representation of spectral data? Several papers have
discussed the dimensionality of different spectral data-
bases, and the suitable number of principal components
based on the type of samples, the range of electromagnetic
wavelengths, and the range of tolerable error has been
recommended [3,10,15]. It is generally accepted that the
reflectance spectra of non-fluorescent surfaces could be
properly represented by employing 6–8 eigenvectors of
the employed dataset [15]. For Munsell chips, Maloney
suggested 5–7 eigenvectors, while Lenz et al. used six
eigenvectors and Parkkinen et al. employed eight eigen-
vectors [2,15,16]. Dannemiller used three or four
eigenvectors in the compression of reflectance spectra of
natural objects, while Chiao et al. concluded that three
eigenvectors suffice to obtain 98% of reflectance spectral
variance [17,18]. In this regard, Laamanen et al. [19]
stated that 20 eigenvectors are required to generate a set
of general basis vectors.
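The question of how many basis functions suffice is commonly answered by a cumulative-variance criterion. The helper below is not taken from any of the cited papers; it is an illustrative sketch, using synthetic spectra and the 98% level mentioned above as an example threshold.

```python
import numpy as np

# Hypothetical helper (not from the cited studies): smallest number of
# eigenvectors whose cumulative variance reaches a target fraction.
def n_components_for(spectra, target=0.98):
    centered = spectra - spectra.mean(axis=0)
    var = np.linalg.svd(centered, compute_uv=False) ** 2  # per-component variance
    frac = np.cumsum(var) / var.sum()                     # cumulative fraction
    return int(np.searchsorted(frac, target) + 1)

rng = np.random.default_rng(2)
wl = np.linspace(400, 700, 31)
spectra = 0.5 + 0.3 * np.sin(rng.uniform(0.5, 2.0, (150, 1)) * wl / 100
                             + rng.uniform(0, 2 * np.pi, (150, 1)))
k = n_components_for(spectra, target=0.98)   # components for 98% of variance
```

Raising the target fraction, widening the wavelength range, or using more heterogeneous samples all push the required number of components upwards, which is consistent with the spread of recommendations (3 to 20) reported in the literature.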
© 2016 The Authors. Coloration Technology © 2016 Society of Dyers and Colourists, Color. Technol., 132,1–8 1
doi: 10.1111/cote.12236