Randomized eigen-spectrograms extraction for an effective fault diagnosis of bearings

Eugenio Brusa 1, Cristiana Delprete 1, and Luigi Gianpio Di Maggio 1

1 Department of Mechanical and Aerospace Engineering (DIMEAS), Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129 Torino, Italy

*Correspondence: luigi.dimaggio@polito.it

Abstract

The Intelligent Fault Diagnosis of rotating machinery poses some captivating challenges in light of the imminent big data era. Large amounts of data are expected to populate Internet of Things (IoT) diagnostic services. Consequently, today's deep learning strategies are evolving towards effective approaches, such as transfer learning, to uncover hidden patterns in extensive vibration data. However, this field is characterized by several open issues. Model interpretation remains buried beneath the foundations of data-driven science, so attention is needed to develop new opportunities for machine learning theories as well. This study proposes a diagnosis model based on intelligent spectrogram recognition via image processing. The novelty of the approach lies in the introduction of eigen-spectrograms and randomized linear algebra to fault diagnosis. The eigen-spectrograms hierarchically display the inherent structures underlying spectrogram images. Moreover, different combinations of eigen-spectrograms are expected to describe multiple machine health states. Randomized algebra and eigen-spectrograms enable the construction of a significant feature space, which also emerges as a viable means of exploring model interpretation. The computational efficiency of randomized approaches further places this methodology in the big data perspective and offers new ways of reading well-established statistical learning theories, such as the Support Vector Machine (SVM).
The combination of randomized algebra and SVM for spectrogram recognition proves to be extremely accurate and efficient compared with state-of-the-art results and transfer learning strategies.

Keywords Intelligent Fault Diagnosis · Machine Learning · Rolling Bearings · SVM · Structural Health Monitoring

1 Introduction

The growing complexity of industrial rotating systems has found key assets in predictive maintenance strategies and condition monitoring techniques, which enhance production performance by reducing maintenance costs, machine failures and repair downtimes [1]. Indeed, the problem of developing robust monitoring techniques for condition-based maintenance has gradually come alongside the crucial issue of building reliable models able to estimate the wear and fatigue of such complex systems for scheduled maintenance. In particular, Rolling Element Bearings (REB) are among the most critical components in industrial rotating machinery, since their durability is subject to wide statistical dispersion [2], which is one of the main reasons why time-based maintenance approaches are inadvisable. Moreover, REB performance is directly influenced by the interaction with the specific system in which the bearings are installed. Thus, Remaining Useful Life (RUL) assessment and machine health management based on the current condition [3, 4] are clearly more reliable than the wear and fatigue models used for scheduled maintenance.

Remarkable scientific efforts over the last decades have produced many signal processing tools for REB vibration analysis that rely on physics-based considerations. For instance, great attention has been paid, since the 1980s, to amplitude demodulation by means of envelope analysis [5, 6, 7, 8, 9], whose effectiveness as a diagnostic tool has been widely proven over the past decades [10, 11, 12, 13, 14, 15, 16, 17, 18].
No less research has been inspired by the consequent problem of choosing optimal demodulation bands in non-stationary signals [19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32].

In parallel to the physics-based paradigms, which have led science and engineering since their early days, the past decade has seen the rapid development of data-driven science in many engineering fields. This is also due to the strong push from computer companies, which have developed ad hoc, high-level programming libraries and cloud computing services for machine learning and deep learning tasks. In these approaches, the experts' knowledge of the physical phenomena underpinning human-inferred models is replaced by learning algorithms, which learn from training data and shape themselves according to the inputs they receive. Essentially, the supervised learning process optimizes the parameters of specific classifiers and regressors to minimize the errors between model results and real outcomes, as long as the latter are known. Nevertheless, minimizing errors is not enough in this field. Indeed, the risk of pursuing models that minimize errors on training data but overfit them is quite tangible when handling a large number of samples, especially in deep learning applications. For example, one of the undesirable manifestations of overfitting appears when models are unable to generalize the learned knowledge to new observations, and thus show lower accuracy once applied to test datasets. This is the well-known generalization problem affecting Artificial Intelligence (AI) models. However, several strategies, such as cross-validation and the adoption of metrics based on information criteria, have been developed to face this fundamental issue [33, 34, 35].

arXiv:2103.03608v1 [eess.SP] 5 Mar 2021
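The gap between training accuracy and accuracy on unseen samples, which cross-validation is meant to expose, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy one-feature dataset, the two mislabeled samples, and the 1-nearest-neighbour classifier, which is a deliberately overfitting stand-in and not the SVM-based model proposed in this study. Leave-one-out is used here as the simplest form of cross-validation.

```python
# Hypothetical toy data (not from the paper): one feature per sample,
# two classes split at x = 10, with two deliberately mislabeled samples.
def make_toy_data():
    xs = [float(i) for i in range(20)]
    ys = [0 if x < 10 else 1 for x in xs]
    ys[3] = 1   # injected label noise
    ys[15] = 0  # injected label noise
    return xs, ys


def predict_1nn(train_x, train_y, query):
    """Return the label of the training sample closest to `query`."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    return train_y[nearest]


def training_accuracy(xs, ys):
    """Accuracy when the model is scored on the samples it has memorized."""
    hits = sum(predict_1nn(xs, ys, x) == y for x, y in zip(xs, ys))
    return hits / len(xs)


def leave_one_out_accuracy(xs, ys):
    """Accuracy when each sample is predicted from all the other samples."""
    hits = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        hits += predict_1nn(rest_x, rest_y, xs[i]) == ys[i]
    return hits / len(xs)


xs, ys = make_toy_data()
train_acc = training_accuracy(xs, ys)
loo_acc = leave_one_out_accuracy(xs, ys)
print(f"training accuracy:      {train_acc:.2f}")
print(f"leave-one-out accuracy: {loo_acc:.2f}")
```

Because the 1-nearest-neighbour rule memorizes every training sample, including the mislabeled ones, its training accuracy is perfect, while the held-out estimate is lower. A model selected on training error alone would therefore look better than it is, which is the situation the strategies cited above are designed to detect.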