Determining the Composition of Bronze Alloys by means of High-dimensional Feature Selection and Artificial Neural Networks

Eleonora D’Andrea and Beatrice Lazzerini
Department of Information Engineering
University of Pisa
Largo Lucio Lazzarino 1, 56122 Pisa, Italy
eleonora.dandrea@for.unipi.it, b.lazzerini@iet.unipi.it

Vincenzo Palleschi and Stefano Pagnotta
Applied and Laser Spectroscopy Laboratory, Institute of Chemistry of Organometallic Compounds
Research Area of CNR
Via G. Moruzzi 1, 56124 Pisa, Italy
vincenzo.palleschi@cnr.it, stefanopagnotta@yahoo.it

Abstract—In this paper we exploit Artificial Neural Networks (ANNs) to model the functional relationship between Laser-Induced Breakdown Spectroscopy (LIBS) spectra and the corresponding composition of bronze alloys, expressed as the concentrations of the four elements that constitute the alloy. The typical approach to LIBS quantitative analysis uses calibration curves, suitably built from appropriate reference standards. More recently, statistical methods based on ANNs have been increasingly used. In particular, an ANN can be used for a preliminary exploration of the LIBS spectra in order to find the most significant areas of the spectrum, which are then used by another ANN dedicated to the calibration. In this paper we show that using ANNs to process LIBS spectra provides a viable, fast and robust method for LIBS quantitative analysis. Indeed, compared with current approaches, this method requires a relatively limited number of reference samples to train the network, and it can automatically analyze a large number of samples.

Keywords—artificial neural networks; feature selection; high-dimensional data; laser-induced breakdown spectroscopy.

I. INTRODUCTION

The problem of determining the composition of alloys, soils, and other materials measured by means of Laser-Induced Breakdown Spectroscopy (LIBS) quantitative analysis is frequently tackled in the literature with different techniques, e.g., Artificial Neural Networks (ANNs), statistical analysis, and the use of reference standards [1]-[4]. In this paper we propose an ANN-based methodological approach for determining the composition of a set of physical samples of modern bronze alloy previously measured by LIBS analysis.

LIBS analysis is a well-known spectroscopic technique for identifying the material and chemical composition of many kinds of samples (e.g., soils, rocks, metal alloys). It is an excellent alternative to other spectroscopic, mass spectrometric, or X-ray techniques, since it is non-destructive and rapid, can be applied to samples of arbitrary shape in the solid, liquid or gaseous state, and can be used in situ, thereby avoiding sampling and sample preparation. Moreover, quantitative LIBS analysis is well suited to the accurate compositional characterization of the analyzed materials, since it yields absolute concentration values for each chemical element [5].

The LIBS technique provides both qualitative and quantitative information about the composition of samples. During the LIBS analysis process (Fig. 1), a high-power laser beam is focused on the surface of the sample. The rapid rise in temperature of the locally heated region ablates a very small amount of mass (of the order of ng or pg) from the sample. The ablated mass interacts with the laser pulse and forms a high-temperature plasma on the surface of the sample. Subsequently, when the laser pulse ends, the plasma expands and cools, allowing the observation of the characteristic atomic emission lines of the elements.
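A LIBS spectrum is ultimately searched for the emission lines mentioned above. The Python sketch below is a rough illustration only and is not the method of this paper. It locates local maxima in a spectrum and matches them against a small table of reference wavelengths. Several elements of the example are hypothetical choices:

- the line table (approximate Cu I, Sn I, Pb I and Zn I wavelengths), whose elements are a common bronze set assumed here rather than necessarily the four analyzed in this work;
- the synthetic spectrum;
- the 0.1 nm tolerance and 10% relative-height threshold;
- the helper names `find_local_maxima` and `identify_lines`.

The simple local-maximum search stands in for more robust peak detection such as `scipy.signal.find_peaks`.

```python
import numpy as np

# Hypothetical, approximate wavelengths (nm) of a few strong neutral-atom lines.
# The element set (Cu, Sn, Pb, Zn) is an assumption for illustration; a real
# analysis would take line positions from a spectral database.
REFERENCE_LINES_NM = {
    "Cu I": [324.75, 327.40],
    "Sn I": [284.00, 317.50],
    "Pb I": [405.78],
    "Zn I": [481.05],
}


def find_local_maxima(intensities, min_rel_height=0.1):
    """Indices of local maxima at least min_rel_height times the strongest point."""
    norm = intensities / intensities.max()
    mid = norm[1:-1]
    # Strictly above the left neighbour, not below the right one, and above threshold.
    is_peak = (mid > norm[:-2]) & (mid >= norm[2:]) & (mid >= min_rel_height)
    return np.nonzero(is_peak)[0] + 1


def identify_lines(wavelengths, intensities, tolerance_nm=0.1, min_rel_height=0.1):
    """Return (element, reference_nm, observed_nm) for peaks near a tabulated line."""
    matches = []
    for i in find_local_maxima(intensities, min_rel_height):
        observed = float(wavelengths[i])
        for element, lines in REFERENCE_LINES_NM.items():
            for ref in lines:
                if abs(observed - ref) <= tolerance_nm:
                    matches.append((element, ref, observed))
    return matches


def gaussian(x, center, amplitude, sigma=0.05):
    """Gaussian line profile (sigma in nm), used only to build a synthetic spectrum."""
    return amplitude * np.exp(-0.5 * ((x - center) / sigma) ** 2)


# Synthetic spectrum on a 0.01 nm grid: flat baseline plus two Cu lines and one Sn line.
wl = np.linspace(270.0, 500.0, 23001)
spectrum = (0.02
            + gaussian(wl, 324.75, 1.0)
            + gaussian(wl, 327.40, 0.5)
            + gaussian(wl, 317.50, 0.6))

for element, ref, observed in identify_lines(wl, spectrum):
    print(f"{element}: reference {ref:.2f} nm, observed {observed:.2f} nm")
```

Run as-is, it reports the Sn I line at 317.50 nm followed by the two Cu I lines. Real spectra would need baseline subtraction and a noise-aware peak criterion before matching. Overlapping lines from different elements can also produce ambiguous matches.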
In more detail, the plasma emits light with discrete spectral peaks, which are collected by a spectrograph and analyzed to extract the chemical composition of the sample, since each chemical element is associated with a unique set of spectral peaks. The feasibility of LIBS as an analytical technique has been demonstrated by a number of applications on solid, liquid and gaseous samples [6]-[8]. However, the extensive literature accumulated in recent years has also highlighted the main problems of LIBS analysis, i.e., its limited sensitivity and, above all, its poor precision, which also affects the overall accuracy of the results [9].

The usual approach to LIBS quantitative analysis is based on calibration curves, suitably built using appropriate reference standards. More precisely, the calibration curves (emission line intensity vs. concentration of the corresponding element) are built from a few samples of known composition and are then used to determine the composition of unknown samples. The main drawbacks of this approach are the need for calibration samples similar to the unknown ones and for constant experimental conditions. An alternative method, which overcomes the first problem, is called Calibration-Free LIBS (CF-LIBS) [10]. In this case, the composition of the samples is determined by analyzing the LIBS spectrum together with the plasma temperature and the electron number density, under strict assumptions on the experimental conditions. The drawbacks are the long analysis time and the need for detection of one line of