Optimal eigenvectors of spectral datasets: sequential selection from one set vs a collection from two sets

Morteza Maali Amiri and Seyed Hossein Amirshahi*
Department of Textile Engineering, Amirkabir University of Technology (Tehran Polytechnic), No. 424 Hafez Avenue, Tehran, 15914, Iran
Email: hamirsha@aut.ac.ir

Received: 8 August 2015; Accepted: 5 July 2016

In this paper, the influence of spectral datasets, and of the method used to select the corresponding feature vectors, on the compression and reconstruction of data is scrutinised. To fulfil this aim, two different sets of reflectance data with the least spectral similarity are selected from different spectral databases, and the optimal eigenvectors are chosen using different strategies. Six and 12 arrangements of eigenvectors obtained from individual or combined databases are then used to compress the reflectance spectra of the learning sets, as well as spectra that were not used in the extraction of the eigenvectors. The validity of the desired reduced subspaces is assessed by computing the spectral errors between the actual and the reconstructed spectra of the learning-set samples. Moreover, the efficiencies of the designed compressed subspaces are evaluated through the number of out-of-range reconstructed spectra, as well as the spectral and colorimetric deviations between the actual and compressed-reconstructed reflectance spectra of samples of datasets that were not employed in the learning sequence. The results show that in restricted subspaces, i.e. the six-dimensional subspace, the most effective results are achieved when the reduced subspace is created from a collection of two separate sets of eigenvectors of two different datasets with the maximum degree of dissimilarity, whereas reduced spaces made from six eigenvectors of individual datasets lead to higher errors.
Introduction

Compression and representation of the reflectance spectra of objects in reduced spaces have been the aim of several studies, and different methodologies have been proposed for the extraction of the most important features of such high-dimensional data [1–6]. Because the reflectances of non-fluorescent surfaces are smooth functions of wavelength across the visible spectrum, many researchers have focused on extracting the hidden patterns of such data. Cohen was possibly the first to analyse the reflectance spectra of some Munsell specimens using the technique of dimensionality reduction based on the eigen-decomposition method [7]. The extracted eigenvectors were then arranged according to their corresponding eigenvalues to form a reduced subspace with orthonormal coordinates. While the eigenvectors of spectral data exhibit positive–negative spectral behaviour, all-positive bases, which are more physically feasible, have also been recommended in the literature [8,9]. Indeed, the so-called eigen-decomposition method is the most popular of the spectral reduction methods, going under the general name of matrix diagonalisation. The principal component analysis (PCA) technique provides the most significant eigenvectors based on the Karhunen–Loève transformation algorithm. A set of orthogonal basis vectors can span a vector subspace of an n-dimensional vector space, and PCA extracts these basis vectors, thereby generating the basis of the subspace [4]. The number of orthogonal basis vectors is chosen so as to ensure an effective subspace. The dimensions of the original vector space and of the reduced subspace determine the compression factor. As expected, the compression causes a reconstruction error in the restored data.
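As a minimal sketch of the eigen-decomposition procedure described above, the following Python/NumPy fragment extracts six leading eigenvectors from a set of reflectance spectra, projects the spectra into the reduced subspace, and reconstructs them. Note that the dataset here is synthetic (smooth Gaussian-shaped "reflectances" sampled at 31 bands from 400 to 700 nm); the paper's actual databases, sample counts, and selection strategies are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
wavelengths = np.arange(400, 701, 10)  # 31 bands, 400-700 nm

# Synthetic smooth reflectance spectra (a stand-in for a real
# dataset such as Munsell chips; not the paper's data)
n_samples = 200
centers = rng.uniform(400, 700, (n_samples, 1))
widths = rng.uniform(40, 120, (n_samples, 1))
R = 0.1 + 0.8 * np.exp(-((wavelengths - centers) / widths) ** 2)

# PCA via eigen-decomposition of the mean-centred scatter matrix
mean = R.mean(axis=0)
X = R - mean
evals, evecs = np.linalg.eigh(X.T @ X)   # eigenvalues in ascending order
order = np.argsort(evals)[::-1]          # re-sort descending
basis = evecs[:, order[:6]]              # keep the six leading eigenvectors

# Compress (project onto the 6-dimensional subspace) and reconstruct
coeffs = X @ basis                       # 200 x 6 compressed representation
R_hat = coeffs @ basis.T + mean

rms = np.sqrt(np.mean((R - R_hat) ** 2))
print(round(rms, 4))
```

The compression factor here is 31/6; because the synthetic spectra are smooth, the six-dimensional reconstruction error is small, illustrating the trade-off discussed in the text.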
In spectral compression practice, the error between the original and reconstructed spectra can be judged through different views and metrics, such as the percentage of cumulative energy and the root mean square (RMS) error between the actual and the reconstructed spectra. It is worth mentioning that PCA is recognised as one of the most powerful methods for minimising such inevitable errors [4]. The PCA method has been widely implemented in different aspects of colour science, and several modifications have been made to improve its performance in the compression of spectral data and in the estimation of valuable spectral information from colorimetric tristimulus values [10–14]. One of the most challenging problems associated with the application of PCA to spectral compression is the question: how many basis functions are required for an accurate representation of spectral data? Several papers have discussed the dimensionality of different spectral databases, and suitable numbers of principal components have been recommended based on the type of samples, the range of electromagnetic wavelengths, and the range of tolerable error [3,10,15]. It is generally accepted that the reflectance spectra of non-fluorescent surfaces can be properly represented by employing 6–8 eigenvectors of the employed dataset [15]. For Munsell chips, Maloney suggested 5–7 eigenvectors, while Lenz et al. used six eigenvectors and Parkkinen et al. employed eight [2,15,16]. Dannemiller used three and/or four eigenvectors in the compression of reflectance spectra of natural objects; however, Chiao et al. concluded that three eigenvectors suffice to obtain 98% of the reflectance spectral variance [17,18]. In this regard, Laamanen et al. [19] stated that 20 eigenvectors are required to generate a set of general bases.

© 2016 The Authors. Coloration Technology © 2016 Society of Dyers and Colourists, Color. Technol., 132, 1–8. doi: 10.1111/cote.12236
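The cumulative-energy criterion mentioned above can be made concrete with a short sketch: the fraction of total variance captured by the first k eigenvectors is the sum of the k largest eigenvalues divided by the sum of all eigenvalues. Again, the spectra below are synthetic Gaussian bumps, not the databases analysed in the paper, so the printed percentages illustrate the method only.

```python
import numpy as np

rng = np.random.default_rng(1)
wl = np.arange(400, 701, 10)  # 31 bands, 400-700 nm

# Synthetic smooth spectra (hypothetical stand-in data)
centers = rng.uniform(400, 700, (150, 1))
widths = rng.uniform(40, 120, (150, 1))
R = 0.1 + 0.8 * np.exp(-((wl - centers) / widths) ** 2)

# Eigenvalues of the mean-centred scatter matrix, largest first
X = R - R.mean(axis=0)
evals = np.sort(np.linalg.eigvalsh(X.T @ X))[::-1]

# Cumulative energy: variance fraction captured by the k leading eigenvectors
cum_energy = np.cumsum(evals) / evals.sum()

for k in (3, 6, 8):
    print(k, round(100 * cum_energy[k - 1], 2))
```

Plotting or tabulating `cum_energy` against k is the usual way such dimensionality recommendations (three, 6–8, or 20 eigenvectors, depending on the dataset and tolerance) are justified.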