Optimal eigenvectors of spectral datasets: sequential selection from one set vs a collection from two sets

Morteza Maali Amiri and Seyed Hossein Amirshahi*
Department of Textile Engineering, Amirkabir University of Technology (Tehran Polytechnic), No. 424 Hafez Avenue, Tehran, 15914, Iran
Email: hamirsha@aut.ac.ir

Received: 8 August 2015; Accepted: 5 July 2016

In this paper, the influence of spectral datasets, and of the method used to select the corresponding feature vectors, on the compression and reconstruction of data is scrutinised. To fulfil this aim, two different sets of reflectance data with the least spectral similarity are selected from different spectral databases, and the optimal eigenvectors are chosen using different strategies. Six and 12 arrangements of eigenvectors obtained from individual or combined databases are then used to compress the reflectance spectra of the learning sets, as well as spectra that were not used in the extraction of the eigenvectors. The validity of the desired reduced subspaces is assessed by computing the spectral errors between the actual and the reconstructed spectra of the learning-set samples. Moreover, the efficiencies of the designed compressed subspaces are evaluated through the number of out-of-range reconstructed spectra, as well as the spectral and colorimetric deviations between the actual and compressed-reconstructed reflectance spectra of samples of datasets that were not employed in the learning sequence. The results show that in restricted subspaces, i.e. the six-dimensional subspace, the most effective results are achieved when the reduced subspace is created from a collection of two separate sets of eigenvectors of two different datasets with the maximum degree of dissimilarity, whereas reduced spaces made from six eigenvectors of individual datasets lead to higher errors.
Introduction

Compression and representation of the reflectance spectra of objects in reduced spaces have been the aim of several studies, and different methodologies have been proposed for the extraction of the most important features of such high-dimensional data [1–6]. Because the reflectances of non-fluorescent surfaces are smooth functions of wavelength across the visible spectrum, many researchers have focused on extracting the hidden patterns of such data. Cohen was possibly the first to analyse the reflectance spectra of some Munsell specimens using the technique of dimensionality reduction based on the eigen-decomposition method [7]. The extracted eigenvectors were then arranged according to their corresponding eigenvalues to form a reduced subspace with orthonormal coordinates. While the eigenvectors of spectral data exhibit positive–negative spectral behaviour, all-positive bases, which are more physically feasible, have also been recommended in the literature [8,9]. Indeed, the so-called eigen-decomposition method is the most popular of the spectral reduction methods, going under the general name of matrix diagonalisation. The principal component analysis (PCA) technique provides the most significant eigenvectors based on the Karhunen–Loève transformation algorithm. A set of orthogonal basis vectors can span a vector subspace of an n-dimensional vector space, and PCA extracts these basis vectors, thereby generating the basis of the subspace [4]. The number of orthogonal basis vectors is chosen so as to ensure an effective subspace. The dimensions of the original vector space and of the reduced subspace determine the compression factor. As expected, the compression causes a reconstruction error in the restored data.
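As a minimal sketch of the eigen-decomposition procedure described above, the following Python/NumPy fragment extracts six leading eigenvectors from a set of reflectance spectra, projects the spectra into the reduced subspace, and reconstructs them. Note that the dataset here is synthetic (smooth Gaussian-shaped "reflectances" sampled at 31 bands from 400 to 700 nm); the paper's actual databases, sample counts, and selection strategies are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
wavelengths = np.arange(400, 701, 10)  # 31 bands, 400-700 nm

# Synthetic smooth reflectance spectra (a stand-in for a real
# dataset such as Munsell chips; not the paper's data)
n_samples = 200
centers = rng.uniform(400, 700, (n_samples, 1))
widths = rng.uniform(40, 120, (n_samples, 1))
R = 0.1 + 0.8 * np.exp(-((wavelengths - centers) / widths) ** 2)

# PCA via eigen-decomposition of the mean-centred scatter matrix
mean = R.mean(axis=0)
X = R - mean
evals, evecs = np.linalg.eigh(X.T @ X)   # eigenvalues in ascending order
order = np.argsort(evals)[::-1]          # re-sort descending
basis = evecs[:, order[:6]]              # keep the six leading eigenvectors

# Compress (project onto the 6-dimensional subspace) and reconstruct
coeffs = X @ basis                       # 200 x 6 compressed representation
R_hat = coeffs @ basis.T + mean

rms = np.sqrt(np.mean((R - R_hat) ** 2))
print(round(rms, 4))
```

The compression factor here is 31/6; because the synthetic spectra are smooth, the six-dimensional reconstruction error is small, illustrating the trade-off discussed in the text.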
In spectral compression practice, the error between the original and reconstructed spectra can be judged through different views and metrics, such as the percentage of cumulative energy and the root mean square (RMS) error between the actual and the reconstructed spectra. It is worth mentioning that PCA is recognised as one of the most powerful methods for minimising such inevitable errors [4]. The PCA method has been widely implemented in different aspects of colour science, and several modifications have been made to improve its performance in the compression of spectral data and in the estimation of valuable spectral information from colorimetric tristimulus values [10–14]. One of the most challenging problems associated with the application of PCA to spectral compression is the question: how many basis functions are required for an accurate representation of spectral data? Several papers have discussed the dimensionality of different spectral databases, and suitable numbers of principal components have been recommended based on the type of samples, the range of electromagnetic wavelengths, and the range of tolerable error [3,10,15]. It is generally accepted that the reflectance spectra of non-fluorescent surfaces can be properly represented by employing 6–8 eigenvectors of the employed dataset [15]. For Munsell chips, Maloney suggested 5–7 eigenvectors, while Lenz et al. used six eigenvectors and Parkkinen et al. employed eight [2,15,16]. Dannemiller used three and/or four eigenvectors in the compression of reflectance spectra of natural objects; however, Chiao et al. concluded that three eigenvectors suffice to obtain 98% of the reflectance spectral variance [17,18]. In this regard, Laamanen et al. [19] stated that 20 eigenvectors are required to generate a set of general bases.

© 2016 The Authors. Coloration Technology © 2016 Society of Dyers and Colourists, Color. Technol., 132, 1–8. doi: 10.1111/cote.12236
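The cumulative-energy criterion mentioned above can be made concrete with a short sketch: the fraction of total variance captured by the first k eigenvectors is the sum of the k largest eigenvalues divided by the sum of all eigenvalues. Again, the spectra below are synthetic Gaussian bumps, not the databases analysed in the paper, so the printed percentages illustrate the method only.

```python
import numpy as np

rng = np.random.default_rng(1)
wl = np.arange(400, 701, 10)  # 31 bands, 400-700 nm

# Synthetic smooth spectra (hypothetical stand-in data)
centers = rng.uniform(400, 700, (150, 1))
widths = rng.uniform(40, 120, (150, 1))
R = 0.1 + 0.8 * np.exp(-((wl - centers) / widths) ** 2)

# Eigenvalues of the mean-centred scatter matrix, largest first
X = R - R.mean(axis=0)
evals = np.sort(np.linalg.eigvalsh(X.T @ X))[::-1]

# Cumulative energy: variance fraction captured by the k leading eigenvectors
cum_energy = np.cumsum(evals) / evals.sum()

for k in (3, 6, 8):
    print(k, round(100 * cum_energy[k - 1], 2))
```

Plotting or tabulating `cum_energy` against k is the usual way such dimensionality recommendations (three, 6–8, or 20 eigenvectors, depending on the dataset and tolerance) are justified.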