Semi-supervised Corrupted Face Classification via Graph Learning

Yisheng Zhong, Ao Li
{951208709@qq.com (Yisheng Zhong), dargonbuy@126.com (Ao Li)}
School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China

Abstract: Semi-supervised learning aims to train a model with both labeled and unlabeled data by exploring the relationships among them. Graph-based semi-supervised learning is a classical representative approach that learns the class indicator matrix by propagating similarity within a well-designed graph constructed from the data. However, face data often suffer from missing pixels or occlusion, which degrades graph learning performance and leads to poor semi-supervised classification results. To address this problem, a novel semi-supervised corrupted face classification method via graph learning is proposed, in which a dynamic graph is learned from the completed face data recovered from a low-rank subspace. In the proposed method, robust data representation and graph learning are performed alternately to obtain an overall optimal solution. Experimental results demonstrate that the proposed method outperforms the comparison methods in both classification accuracy and robustness.

Keywords: Semi-supervised face classification, Graph learning, Self-representation model, Low-rank constraint.

I. Introduction

Faced with massive high-dimensional data, how to analyze and process the data effectively has become a major problem in machine learning and related fields [1]. In recent years, learning the graph automatically has become an active research topic, and the adaptive neighbor method is one of the important approaches to it.
The adaptive neighbor method constructs a matrix in which each entry is the probability that one data point can serve as a neighbor of another, and this probability is used as the similarity between the two data points [2]. It does not require similarity measures that are sensitive to noise and outliers [3], so the resulting graph is accurate. However, when the original data contain noise or interference, the learned graph may be inaccurate or suboptimal and may fail to describe the true relationships among the data. To solve this problem, Zhao Kang [4] proposed a new robust graph learning scheme based on the adaptive neighbor method, which decomposes the original data into a low-rank matrix D ("clean data") and a sparse matrix E ("noise/error"). The adaptive neighbor method is then used to build the graph on the clean data D, so image disturbances are removed and the graph is learned at the same time. However, when the data are occluded or partially missing, the results of this method are unreliable, and the method may fail to produce a result at all.

The Laplacian Score (LS) method proposed in [5] builds on the maximum variance (MaxVar) criterion by additionally analyzing the local structure of the data. However, both methods consider only the characteristics of the data themselves, ignore the correlations among features, and therefore cannot guarantee an optimal feature subset. Inspired by the self-similarity of images, Zhu et al. [6] argued that images exhibit self-similarity not only in structure but also in their features, which can represent one another. They proposed an unsupervised feature selection method based on a regularized self-representation (RSR) model.
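As a brief aside, the adaptive-neighbor similarity described earlier in this introduction can be sketched in a few lines. The sketch below assumes the commonly used closed-form solution in which each point keeps its k nearest neighbors and assigns them probabilities that sum to one; the function name `adaptive_neighbor_graph`, the parameter `k`, and the use of squared Euclidean distance are illustrative assumptions, not details given in this paper.

```python
import numpy as np

def adaptive_neighbor_graph(X, k=5):
    """Sketch of an adaptive-neighbor similarity matrix (assumed closed form).

    X : array of shape (n_samples, n_features)
    k : number of neighbors kept per point (hypothetical parameter)
    Returns S of shape (n, n); row i holds the probabilities that each
    other point is a neighbor of point i.
    """
    n = X.shape[0]
    if n < k + 2:
        raise ValueError("need at least k + 2 samples")
    # Pairwise squared Euclidean distances.
    sq = np.sum(X ** 2, axis=1)
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    S = np.zeros((n, n))
    for i in range(n):
        d = D[i].copy()
        d[i] = np.inf                      # a point is not its own neighbor
        idx = np.argsort(d)[:k + 1]        # k nearest plus the (k+1)-th
        ds = d[idx]
        denom = k * ds[k] - np.sum(ds[:k])
        if denom > 1e-12:
            # Closer points receive larger probability; the (k+1)-th gets 0.
            S[i, idx[:k]] = (ds[k] - ds[:k]) / denom
        else:
            # All k+1 distances are equal: fall back to uniform weights.
            S[i, idx[:k]] = 1.0 / k
    return S
```

Each row of the returned matrix is a probability distribution over at most k neighbors, so no kernel bandwidth has to be tuned. In the robust scheme of [4], such a graph would be built on the recovered clean data D rather than on the raw input.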
The RSR method builds a self-representation model by assuming that each feature in the high-dimensional data can be expressed as a linear combination of the other features, and it removes insignificant features by imposing an $\ell_{2,1}$-norm constraint on the feature weight matrix W. We borrow the regularized self-representation approach to handle large-scale interference or missing regions in images.

The rest of this paper is organized as follows. Section II introduces graph learning and our multi-source robust graph learning technique. Section III gives the details of the algorithm. Section IV evaluates the method experimentally on the clustering task. Section V discusses semi-supervised applications, and Section VI compares the data recovery results.

EAI MOBIMEDIA 2020, August 27-28, Harbin, People's Republic of China
Copyright © 2020 EAI
DOI 10.4108/eai.27-8-2020.2296556
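The regularized self-representation idea mentioned in the introduction, where each feature is expressed through the other features and a row-sparse weight matrix W is encouraged, can be illustrated with a minimal sketch. The code below is not the formulation of [6] or of this paper: it assumes a squared Frobenius reconstruction loss, min_W ||X - XW||_F^2 + lam * ||W||_{2,1}, solved by iteratively reweighted least squares. The function name `rsr_feature_scores` and the parameters `lam`, `n_iter`, and `eps` are hypothetical.

```python
import numpy as np

def rsr_feature_scores(X, lam=1.0, n_iter=50, eps=1e-8):
    """Sketch of self-representation with an l2,1 penalty on W.

    X : array of shape (n_samples, n_features)
    Returns (W, scores), where scores[i] is the l2 norm of row i of W.
    A small score suggests that feature i contributes little to
    representing the other features.
    """
    d = X.shape[1]
    XtX = X.T @ X
    G = np.eye(d)                          # reweighting matrix, starts as identity
    for _ in range(n_iter):
        # Stationarity condition of the reweighted problem:
        # (X^T X + lam * G) W = X^T X
        W = np.linalg.solve(XtX + lam * G, XtX)
        row_norms = np.linalg.norm(W, axis=1)
        # G_ii = 1 / (2 ||w_i||_2); eps avoids division by zero.
        G = np.diag(1.0 / (2.0 * row_norms + eps))
    return W, np.linalg.norm(W, axis=1)
```

Features can then be ranked by their scores and the lowest-ranked ones discarded. The row-wise penalty is what drives entire rows of W toward zero, which is why it suits feature selection better than an element-wise penalty would.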