arXiv:1303.5259v1 [cs.CG] 21 Mar 2013

Efficient Sparseness-Enforcing Projections

Markus Thom¹ and Günther Palm²

Abstract. We propose a linear time and constant space algorithm for computing Euclidean projections onto sets on which a normalized sparseness measure attains a constant value. These non-convex target sets can be characterized as intersections of a simplex and a hypersphere. Some previous methods required the vector to be projected to be sorted, resulting in at least quasilinear time complexity and linear space complexity. We improve on this by adapting a linear time algorithm for projecting onto simplexes. Finally, we propose an efficient algorithm for computing the product of the gradient of the projection with an arbitrary vector.

1 Introduction

In a great variety of classical machine learning problems, sparse solutions are appealing because they provide more efficient representations than non-sparse solutions. Several formal sparseness measures have been proposed in the past and their properties have been thoroughly analyzed [1]. One remarkable sparseness measure is the normalized ratio of the $L_1$ norm and the $L_2$ norm of a vector, as originally proposed by [2]:

$$\sigma\colon \mathbb{R}^n \setminus \{0\} \to [0, 1],\quad x \mapsto \frac{\sqrt{n} - \|x\|_1 / \|x\|_2}{\sqrt{n} - 1}.$$

Here, higher values of $\sigma$ indicate sparser vectors. The extreme values of 0 and 1 are achieved for vectors where all entries are equal and for vectors where all but one entry vanish, respectively. Further, $\sigma$ is scale-invariant, that is, $\sigma(\alpha x) = \sigma(x)$ for all $\alpha \neq 0$ and all $x \in \mathbb{R}^n \setminus \{0\}$. The incorporation of explicit sparseness constraints into existing optimization problems, while still being able to efficiently compute solutions to them, was made possible by [2] through the proposition of an operator which computes the Euclidean projection onto sets on which $\sigma$ attains a desired value.
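A minimal NumPy sketch of the sparseness measure defined above may help to illustrate its behavior; the function name `sparseness` is our own choice for illustration and does not appear in the paper:

```python
import numpy as np

def sparseness(x):
    """Normalized sparseness measure of [2]:
    sigma(x) = (sqrt(n) - ||x||_1 / ||x||_2) / (sqrt(n) - 1) for nonzero x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    ratio = np.abs(x).sum() / np.linalg.norm(x)  # ||x||_1 / ||x||_2
    return (np.sqrt(n) - ratio) / (np.sqrt(n) - 1.0)
```

Consistent with the extremes stated above, a constant vector attains sparseness 0, a vector with a single nonzero entry attains sparseness 1, and rescaling by any nonzero factor leaves the value unchanged.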
In other words, given a target degree of sparseness $\sigma^* \in (0, 1)$ with respect to $\sigma$, numbers $\lambda_1, \lambda_2 > 0$ can be derived such that $\sigma \equiv \sigma^*$ on the non-convex set

$$D := \left\{\, s \in \mathbb{R}^n_{\geq 0} \;\middle|\; \|s\|_1 = \lambda_1 \text{ and } \|s\|_2 = \lambda_2 \,\right\}.$$

Clearly, either of $\lambda_1$ and $\lambda_2$ has to be fixed to a pre-defined value, for example by setting $\lambda_2 := 1$ to achieve normalized vectors, as only their ratio is important in the definition of $\sigma$. By restricting possible solutions of certain optimization problems to lie in $D$, projected gradient descent methods [3] can be used to achieve solutions that fulfill explicit sparseness constraints.

¹ driveU / Institute of Measurement, Control and Microtechnology, Ulm University, Ulm, Germany
² Institute of Neural Information Processing, Ulm University, Ulm, Germany
E-mail addresses: markus.thom@uni-ulm.de, guenther.palm@uni-ulm.de
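The relation between $\sigma^*$, $\lambda_1$, and $\lambda_2$ can be made explicit: solving $\sigma(s) = \sigma^*$ for $\|s\|_1$ with $\|s\|_2$ fixed gives $\lambda_1 = \lambda_2 \left(\sqrt{n} - \sigma^*(\sqrt{n} - 1)\right)$. A hedged sketch of this derivation follows; the helper name `target_l1_norm` is ours, not the paper's:

```python
import numpy as np

def target_l1_norm(sigma_star, n, l2=1.0):
    """L1 norm lambda_1 such that every nonnegative s in R^n with
    ||s||_1 = lambda_1 and ||s||_2 = l2 attains sparseness sigma_star,
    obtained by solving sigma(s) = sigma_star for ||s||_1."""
    return l2 * (np.sqrt(n) - sigma_star * (np.sqrt(n) - 1.0))
```

Two sanity checks follow from the extremes of $\sigma$: for $\sigma^* \to 1$ the formula yields $\lambda_1 = \lambda_2$ (a single nonzero entry, where the $L_1$ and $L_2$ norms coincide), and for $\sigma^* \to 0$ it yields $\lambda_1 = \sqrt{n}\,\lambda_2$ (all entries equal).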