For Peer Review

Feature selection and rapid characterization of bloodstains on different substrates

Journal: Applied Spectroscopy
Manuscript ID: Draft
Manuscript Type: Submitted Manuscript
Date Submitted by the Author: n/a
Complete List of Authors:
Gautam, Rekha; Vanderbilt University, Biomedical Engineering
Peoples, Deandra; Vanderbilt University, Biomedical Engineering
Jansen, Kiana; Vanderbilt University, Biomedical Engineering
O'Connor, Maggie; Vanderbilt University, Biomedical Engineering
Thomas, Giju; Vanderbilt University, Biomedical Engineering
Vanga, Sandeep; Episode Solutions LLC
Pence, Isaac; Vanderbilt University, Biomedical Engineering; Massachusetts General Hospital
Mahadevan-Jansen, Anita; Vanderbilt University, Biomedical Engineering
Manuscript Keywords: Machine Learning, Forensic, Raman Spectroscopy, LASSO Regression

Abstract: Establishing the precise timeline of a crime can be challenging because it requires rapid, non-destructive analysis of body fluids encountered at crime scenes. Raman spectroscopy has demonstrated great potential in forensic science, as it provides direct information about structural and molecular changes without the need to process or extract samples. However, its current applicability is limited to pure body fluids, because signals from the substrate underlying these fluids greatly influence the current models used for age estimation. In this study, we used Raman spectroscopy to identify selective spectral markers that delineate bloodstain age in the presence of an interfering signal from the substrate. Least absolute shrinkage and selection operator (LASSO) regression was employed to guide the feature selection process so that bloodstain age could be predicted accurately despite interference from substrates.
Substrate-specific regression models guided by an automated feature selection algorithm yielded low predictive root-mean-squared errors (0.207, 0.204, 0.222) and high R² values (0.924, 0.926, 0.913) on test data consisting of blood spectra on floor-tile, facial-tissue, and linoleum substrates, respectively. This automated feature selection framework relies entirely on pure bloodstain spectra to train substrate-specific models for estimating the age of composite (blood on substrate) spectra. The model can thus be easily applied to any new composite spectra and is highly scalable to new environments. This study demonstrates that Raman spectroscopy coupled with LASSO can serve as a reliable and non-destructive technique for determining the age of bloodstains on any surface, aiding forensic investigations in real-world scenarios.

https://mc.manuscriptcentral.com/asp Applied Spectroscopy
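To illustrate the kind of workflow the abstract describes, the following is a minimal sketch of LASSO-guided feature selection followed by evaluation with root-mean-squared error and R². It is not the authors' implementation: the data are synthetic (six hypothetical "bands", of which only bands 0 and 3 carry signal), the penalty `alpha = 0.1` is an arbitrary assumption, and all function names are illustrative. The solver is a plain coordinate-descent LASSO written in pure Python so that the selection mechanism (soft-thresholding coefficients to exactly zero) is visible.

```python
import math
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso(X, y, alpha, n_iter=200):
    """Coordinate descent for (1/2n)*||y - b - Xw||^2 + alpha*||w||_1."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Center features and target so the intercept can be recovered afterwards.
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    residual = [yi - y_mean for yi in y]  # all weights start at zero
    w = [0.0] * p
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of feature j with the residual that excludes feature j.
            rho = sum(Xc[i][j] * (residual[i] + w[j] * Xc[i][j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * Xc[i][j]
                w[j] = new_w
    intercept = y_mean - sum(w[j] * x_mean[j] for j in range(p))
    return w, intercept


def predict(X, w, intercept):
    return [intercept + sum(wj * xj for wj, xj in zip(w, row)) for row in X]


def rmse(y_true, y_pred):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in data (assumption): 6 hypothetical bands, only 0 and 3 matter.
rng = random.Random(0)


def make_data(n):
    X = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(n)]
    y = [1.5 * row[0] - 2.0 * row[3] + rng.gauss(0.0, 0.05) for row in X]
    return X, y


X_train, y_train = make_data(80)
X_test, y_test = make_data(40)

weights, intercept = fit_lasso(X_train, y_train, alpha=0.1)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]  # surviving features

y_hat = predict(X_test, weights, intercept)
test_rmse = rmse(y_test, y_hat)
test_r2 = r_squared(y_test, y_hat)
print("selected bands:", selected)
print("test RMSE:", round(test_rmse, 3), "test R^2:", round(test_r2, 3))
```

In this toy setting the L1 penalty drives the coefficients of the uninformative bands to exactly zero, which is what makes LASSO usable as a feature selector. In practice the penalty strength would be chosen by cross-validation rather than fixed by hand.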