Stochastic Coordinate Descent for Nonsmooth Convex Optimization

Qi Deng (qdeng@cise.ufl.edu), Jeffrey Ho (jho.jeffrey@gmail.com), Anand Rangarajan* (anand@cise.ufl.edu)
Department of CISE, University of Florida

Abstract

Stochastic coordinate descent, due to its practicality and efficiency, is increasingly popular in the machine learning and signal processing communities, having proven successful in several large-scale optimization problems such as l1-regularized regression and the Support Vector Machine, to name a few. In this paper, we consider a composite problem where the nonsmoothness has a general structure compatible with a coordinate partition, and we solve the nonsmooth optimization problem using a sequence of smooth approximations. In particular, we extend Nesterov's estimate sequence technique by incorporating smooth approximation and coordinate randomization. By studying the effect of smooth approximation, we develop rules for selecting smooth approximations that not only guarantee the algorithm's convergence but also provide a better convergence rate than the subgradient black-box model. Specifically, we obtain a convergence rate of O(1/K) for nonsmooth convex functions and O(1/K^2) for strongly convex functions. The convergence analysis developed in this paper and the results, to the best of our knowledge, have not been shown previously for stochastic coordinate descent.

1 Introduction

In this paper, we are interested in applying stochastic coordinate descent to the optimization of a (convex) composite function f(x) = h(x) + g(x), where the objective f(x) contains two parts: h(x) is differentiable with Lipschitz continuous gradient, and g(x) is a general convex nonsmooth function (satisfying conditions specified later).
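As a concrete illustration of this composite structure (an example of ours, not one taken from the paper), consider the lasso objective: h(x) = 0.5*||Ax - b||^2 is smooth with Lipschitz gradient, and g(x) = λ||x||_1 is nonsmooth but separable across coordinates. The sketch below, with hypothetical helper names g_l1 and g_huber, shows one standard way g can be replaced by a smooth surrogate (a Huber-type approximation) whose accuracy improves as the smoothing parameter μ shrinks:

```python
import numpy as np

def g_l1(x, lam):
    """Nonsmooth part of the composite objective: lam * ||x||_1."""
    return lam * np.sum(np.abs(x))

def g_huber(x, lam, mu):
    """Huber-type smooth approximation of lam * ||x||_1.

    Each coordinate is quadratic (x^2 / (2*mu)) near the origin and linear
    (|x| - mu/2) outside, so the function is differentiable everywhere and
    satisfies g_huber <= g_l1 <= g_huber + lam * mu * len(x) / 2.
    """
    small = np.abs(x) <= mu
    per_coord = np.where(small, x**2 / (2 * mu), np.abs(x) - mu / 2)
    return lam * np.sum(per_coord)

# The approximation gap vanishes as mu -> 0, mirroring the role of the
# smoothing sequence mu_k in the paper's scheme.
x = np.array([0.001, -2.0, 0.5])
lam = 1.0
for mu in [1.0, 0.1, 0.01]:
    gap = g_l1(x, lam) - g_huber(x, lam, mu)
    print(f"mu = {mu}: g(x) - g_mu(x) = {gap:.5f}")
```

The per-coordinate form of the surrogate is what makes it compatible with a coordinate partition: each block of coordinates can be smoothed and updated independently.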
Although this problem has been investigated quite extensively in the literature, none of the earlier work has studied it from the perspective of stochastic coordinate descent. For the huge-scale optimization problems that are increasingly common in machine learning and other application domains, coordinate descent is often the only practical method, and accordingly, it has experienced a resurgence of interest recently, e.g., [Nes10]. In this context, we propose an accelerated coordinate-descent scheme for minimizing the convex composite function f(x) based on the idea of sequential coordinate smooth approximation. Specifically, we introduce a sequence g_{μ_k}(x) of increasingly accurate smooth approximations of g(x), where the smoothing parameter μ_k, indexed by the iteration counter k, is a nonnegative real number converging to zero as k → ∞. At each iteration, our scheme uses the smooth approximation f_{μ_k}(x) = h(x) + g_{μ_k}(x) as a surrogate, and with suitable assumptions on the form of g_{μ_k}, an accelerated coordinate descent scheme can be developed for the smooth approximation sequence f_{μ_k}(x). Furthermore, the sequential smoothing can be incorporated into Nesterov's estimate sequence framework for analyzing the convergence complexity, with the smoothing parameter μ_k playing the crucial role of balancing the degree of smoothing against the rate of convergence. We study different smoothing strategies in terms of using different sequences of μ_k, and with appropriately-chosen

* This work was partially supported by NSF IIS 1065081.