A computational study of spectral matching algorithms for identifying Raman spectra of polycyclic aromatic hydrocarbons

Xiaofeng Tan (a,b,*), Xiangling Chen (b) and Shuzhong Song (b,c)

To facilitate the experimental development of a Raman-based chemical sensor for identifying 16 carcinogenic polycyclic aromatic hydrocarbons (PAHs) in food, and the selection of an appropriate spectral matching algorithm for use with the sensor, computer simulations are carried out to evaluate the performance of several spectral matching algorithms in identifying Raman spectra of target PAHs in the presence of strong interference from co-existent PAH spectra. The algorithms studied are the Pearson correlation coefficient, the Euclidean distance, and the cosine distance in the spectral space and in a normalized principal component space (CD-NPCA). The simulations are performed with mixture Raman spectra synthesized from a reference Raman spectral library of the 16 PAHs in the 1000–1700 cm⁻¹ fingerprint spectral range; the library is calculated using density functional theory. Receiver operating characteristic curves are generated for each pair of target PAH and spectral matching algorithm to assess the performance of the algorithms. The study shows that CD-NPCA outperforms the other algorithms in speed and discriminating power for identifying the target spectra in the mixture spectra, owing to dimensionality reduction and an angular augmentation effect of the input spectral data. This study provides a cost-effective way of designing the Raman-based sensor for PAH detection and paves the way for future experimental development of such a sensor. Copyright © 2016 John Wiley & Sons, Ltd.

Additional supporting information may be found in the online version of this article at the publisher's web site.
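As a rough illustration of the three spectral-space measures named in the abstract, the following Python sketch scores a synthetic mixture spectrum against two reference spectra. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the Lorentzian band model, the band positions and heights, and the 30:70 mixing ratio are all hypothetical, and the Euclidean distance is taken here between unit-normalized spectra. CD-NPCA is not sketched because its construction is not specified in the abstract.

```python
import numpy as np


def pearson_correlation(a, b):
    """Pearson correlation coefficient of two spectra (1 means identical shape)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_c = a - a.mean()
    b_c = b - b.mean()
    return float(a_c @ b_c / (np.linalg.norm(a_c) * np.linalg.norm(b_c)))


def euclidean_distance(a, b):
    """Euclidean distance after scaling each spectrum to unit length (assumed here)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.linalg.norm(a / np.linalg.norm(a) - b / np.linalg.norm(b)))


def cosine_distance(a, b):
    """One minus the cosine of the angle between two spectra (0 means same direction)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(1.0 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def lorentzian_spectrum(shifts, centers, heights, width=5.0):
    """Hypothetical band model: a sum of Lorentzian bands with a common FWHM."""
    shifts = np.asarray(shifts, dtype=float)[:, None]
    centers = np.asarray(centers, dtype=float)[None, :]
    heights = np.asarray(heights, dtype=float)[None, :]
    half = width / 2.0
    return (heights * half**2 / ((shifts - centers) ** 2 + half**2)).sum(axis=1)


# Raman shift axis covering the 1000-1700 cm^-1 fingerprint range (1 cm^-1 steps).
shifts = np.linspace(1000.0, 1700.0, 701)

# Made-up band positions and heights; these are NOT real PAH spectra.
target = lorentzian_spectrum(shifts, [1240.0, 1390.0, 1620.0], [0.6, 1.0, 0.8])
interferent = lorentzian_spectrum(shifts, [1180.0, 1350.0, 1580.0], [0.9, 0.7, 1.0])

# Synthetic mixture in which the interferent dominates the target.
mixture = 0.3 * target + 0.7 * interferent

for name, reference in (("target", target), ("interferent", interferent)):
    print(
        name,
        round(pearson_correlation(mixture, reference), 3),
        round(euclidean_distance(mixture, reference), 3),
        round(cosine_distance(mixture, reference), 3),
    )
```

With these definitions all three measures ignore the overall intensity scale of a spectrum, so the scores respond only to changes in spectral shape such as those caused by interfering bands.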
Keywords: spectral matching algorithms; polycyclic aromatic hydrocarbons; surface-enhanced Raman spectroscopy; density functional theory; principal component analysis

Introduction

There has been significant interest in detecting and quantifying carcinogenic polycyclic aromatic hydrocarbons (PAHs) in food ever since European Union (EU) legislation introduced in early 2005 set a maximum level for benzo[a]pyrene,[1] which may be used as a marker for the occurrence and effect of carcinogenic PAHs in food. Since then, 16 genotoxic PAHs have been recommended by the EU as priority PAHs to monitor in food: benz[a]anthracene, benzo[b]fluoranthene, benzo[c]fluorene, benzo[j]fluoranthene, benzo[k]fluoranthene, benzo[ghi]perylene, benzo[a]pyrene, chrysene (CRY), cyclopenta[cd]pyrene, dibenz[a,h]anthracene, dibenzo[a,e]pyrene, dibenzo[a,h]pyrene, dibenzo[a,i]pyrene, dibenzo[a,l]pyrene, indeno[1,2,3-cd]pyrene, and 5-methylchrysene (5MC).

The two primary analytical methods currently used for the detection and quantification of PAHs in food are gas chromatography and high-performance liquid chromatography.[2–7] Although both methods are sufficiently sensitive to detect trace amounts of PAHs in food, both require laborious and time-consuming sample preparation and pre-concentration steps. These preprocessing steps require a great deal of effort and make both methods unsuitable for routine, rapid, in-place detection and quantification of PAHs in food. Moreover, both methods are expensive in terms of the cost of instrumentation and the overheads incurred by sample preparation. In recent years, Raman spectroscopic methods, in particular surface-enhanced Raman spectroscopy (SERS), have shown great promise as faster, cheaper, and more portable methods for detecting trace PAHs because of their very high sensitivity and selectivity and their freedom from fluorescence interference.
[8–26] It has been shown in some of these works that the limit of detection of some PAHs by SERS exceeds or approaches the levels set by the EU.

We recently started an effort to develop a SERS-based chemical sensor for detecting and quantifying the 16 EU PAHs in food. Two practical problems need to be solved before we carry out experimental development. The first problem is the lack of a complete experimental reference Raman spectral library of the 16 EU PAHs, despite some experimental work on Raman spectra of PAHs in the liquid and solid states.[26–29] The second problem is the selection of a fast and highly discriminative spectral matching algorithm for identifying target PAHs in the presence of strong interference from other PAH spectra, given that PAHs often co-exist in groups in food.

* Correspondence to: Xiaofeng Tan, Newton Scientific, Inc., 1 Bramble Way, Acton, MA 01720, USA. E-mail: x.tan@jhu.edu
a Newton Scientific Inc., 1 Bramble Way, Acton, MA, 01720, USA
b SIDA Science and Technology Innovation, Ltd., C5-305 666 Gaoxin Rd, District Donghu, Wuhan, Hubei, China
c Dalian Institute of Chemical Physics, 457 Zhongshan Rd, Dalian, Liaoning, China
Research article. J. Raman Spectrosc. 2017, 48, 113–118. Received: 12 April 2016; Revised: 7 June 2016; Accepted: 8 June 2016; Published online in Wiley Online Library: 8 July 2016 (wileyonlinelibrary.com). DOI 10.1002/jrs.4978

To address the aforementioned problems, we have carried out a computational effort to calculate the Raman spectra of the 16 EU PAHs and to evaluate the performance of several spectral matching algorithms in identifying target PAH spectra in the presence of strong