A Fast Clustering Based Evolutionary Algorithm for Super-Large-Scale Sparse Multi-Objective Optimization

Ye Tian, Yuandong Feng, Xingyi Zhang, Senior Member, IEEE, and Changyin Sun

Abstract—During the last three decades, evolutionary algorithms (EAs) have shown superiority in solving complex optimization problems, especially those with multiple objectives and non-differentiable landscapes. However, due to their stochastic search strategies, the performance of most EAs deteriorates drastically when handling a large number of decision variables. To tackle the curse of dimensionality, this work proposes an efficient EA for solving super-large-scale multi-objective optimization problems with sparse optimal solutions. The proposed algorithm estimates the sparse distribution of optimal solutions by optimizing a binary vector for each solution, and provides a fast clustering method to greatly reduce the dimensionality of the search space. More importantly, all the operations related to the decision variables consist of only a few matrix calculations, which can be directly accelerated by GPUs. While existing EAs are capable of handling fewer than 10 000 real variables, the proposed algorithm is verified to be effective in handling 1 000 000 real variables. Furthermore, since the proposed algorithm handles the large number of variables via accelerated matrix calculations, its runtime can be reduced to less than 10% of the runtime of existing EAs.

Index Terms—Evolutionary computation, fast clustering, sparse multi-objective optimization, super-large-scale optimization.

I. Introduction

Many scientific and engineering fields, such as artificial intelligence [1], data mining [2], software engineering [3], bioinformatics [4], and economics [5], involve complex optimization problems with multiple conflicting objectives and a large number of decision variables, which are collectively known as large-scale multi-objective optimization problems (LMOPs).
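The sparse encoding described in the abstract, where each solution carries a binary vector marking which variables are active, can be illustrated with a small NumPy sketch. The array shapes, sparsity level, and element-wise product below are illustrative assumptions, not the paper's exact implementation; they only show why variable-level operations reduce to matrix calculations that a GPU array library (e.g., CuPy or PyTorch with the same code shape) could accelerate directly.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 100_000  # number of decision variables (kept moderate for the sketch)
N = 8        # population size

# Each solution is encoded as a pair (dec, mask): a real-valued vector
# and a binary vector marking which variables are non-zero.
dec = rng.random((N, D))
mask = rng.random((N, D)) < 0.001  # roughly 0.1% of variables active

# The actual decision values are the element-wise product, so decoding
# the whole population is a single vectorized matrix operation.
pop = dec * mask

sparsity = mask.mean()
print(f"average fraction of non-zero variables: {sparsity:.4%}")
```

Because `pop` is produced by one array expression over the entire population, porting it to a GPU amounts to swapping the array backend rather than rewriting per-variable loops.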
These problems are generally NP-hard with complicated landscapes, and have global optima that are hard to obtain by exact methods; by contrast, multi-objective evolutionary algorithms (MOEAs) can find quasi-optimal solutions for LMOPs in polynomial time [6].

Since the first MOEA was suggested for solving LMOPs in 2013 [7], a number of MOEAs have been proposed to handle the high-dimensional search space using various techniques, including decision variable grouping, decision variable analysis, and decision space reduction. The decision variable grouping based MOEAs randomly divide the decision variables into several groups and optimize each group of decision variables alternately [7], [8], so that the LMOP can be split into small-scale problems and solved easily. Since the random grouping strategy may divide two interacting decision variables into different groups and drive the population into local optima, the decision variable analysis based MOEAs divide the decision variables according to their correlations to the other decision variables and the objective functions [9], [10], which can improve both population diversity and the probability of finding global optima. The decision space reduction based MOEAs facilitate the solving of LMOPs by reducing the dimensions of the decision space, with the assistance of problem transformation [11] and dimensionality reduction [12] techniques.

While conventional MOEAs are effective for problems with fewer than 100 variables [13], the MOEAs tailored for LMOPs have shown promising performance on problems with 1000 to 10 000 variables [10], [12], [14]. Nevertheless, they are not applicable to problems with many more variables, which are termed super-large-scale multi-objective optimization problems (SLMOPs) in this work.
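The random grouping strategy described above can be sketched in a few lines: shuffle the variable indices and split them into equally sized groups, each of which is then optimized alternately as a smaller subproblem. The group count of 100 is a hypothetical setting chosen to match the 300 000-variable example discussed later; this is a sketch of the general technique from [7], [8], not any specific algorithm's code.

```python
import numpy as np

rng = np.random.default_rng(1)

D = 300_000     # total decision variables of an SLMOP
n_groups = 100  # hypothetical number of groups

# Random grouping: shuffle all variable indices, then split the
# permutation into n_groups contiguous chunks of (roughly) equal size.
perm = rng.permutation(D)
groups = np.array_split(perm, n_groups)

print(len(groups), len(groups[0]))  # → 100 3000
```

Note that each resulting group still contains 3000 variables, i.e., each subproblem remains a large-scale problem in its own right, which motivates the discussion that follows.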
SLMOPs widely exist in many research fields, such as large-scale feature selection tasks with about 45 000 candidate features [15], deep neural network training tasks with more than 150 000 weights [16], and time-varying ratio error estimation tasks with up to 300 000 variables [14]. For decision variable grouping based MOEAs, if the 300 000 decision variables of an SLMOP are randomly divided into 100 groups, each group will contain 3000 decision variables and thus still constitute an LMOP; if the decision variables are divided into many more groups, the convergence speed will deteriorate severely as the large number of

Manuscript received August 19, 2021; revised September 3, 2021; accepted September 30, 2021. This work was supported in part by the National Key Research and Development Program of China (2018AAA0100100), the National Natural Science Foundation of China (61822301, 61876123, 61906001), the Collaborative Innovation Program of Universities in Anhui Province (GXXT-2020-051), the Hong Kong Scholars Program (XJ2019035), and the Anhui Provincial Natural Science Foundation (1908085QF271). Recommended by Associate Editor Shangce Gao. (Corresponding author: Xingyi Zhang.)

Citation: Y. Tian, Y. D. Feng, X. Y. Zhang, and C. Y. Sun, “A fast clustering based evolutionary algorithm for super-large-scale sparse multi-objective optimization,” IEEE/CAA J. Autom. Sinica, vol. 10, no. 4, pp. 1048–1063, Apr. 2023.

Y. Tian is with the Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei 230601, China (e-mail: field910921@gmail.com).

Y. D. Feng is with the School of Computer Science and Technology, Anhui University, Hefei 230601, China (e-mail: yuandongfeng@stu.ahu.edu.cn).

X. Y. Zhang is with the School of Artificial Intelligence, Anhui University, Hefei 230601, China (e-mail: xyzhanghust@gmail.com).

C. Y. Sun is with the School of Automation, Southeast University, Nanjing 210096, China (e-mail: cysun@seu.edu.cn).

Digital Object Identifier 10.1109/JAS.2022.105437