On spectral windows in supervised learning from data ✩

Giorgio Gnecco, Marcello Sanguineti ∗

Department of Communications, Computer, and System Sciences (DIST), University of Genoa, Via Opera Pia 13, 16145 Genova, Italy

Article history: Received 10 June 2009; received in revised form 20 August 2010; accepted 24 August 2010; available online 20 September 2010. Communicated by P.M.B. Vitányi.

Keywords: Analysis of algorithms; Learning from data; Regularization; Suboptimal solutions; Empirical error functionals; Probabilistic estimates

Abstract. For Tikhonov regularization in supervised learning from data, the effect on the regularized solution of a joint perturbation of the regression function and the data is investigated. Spectral windows in the finite-sample and population cases are compared via probabilistic estimates of the differences between regularized solutions.

✩ The authors were partially supported by a PRIN grant from the Italian Ministry for University and Research, project "Adaptive State Estimation and Optimal Control".
∗ Corresponding author. E-mail addresses: giorgio.gnecco@dist.unige.it (G. Gnecco), marcello.gnecco@dist.unige.it (M. Sanguineti).

1. Introduction

For a nonempty set $X \subseteq \Re^d$ and a probability measure $\rho$ on $X \times \Re$, Statistical Learning Theory [27], a branch of Computational Learning Theory [19], models the supervised learning problem as the minimization of the expected error functional
$$\mathcal{E}(f) = \int_{X \times \Re} \bigl(f(x) - y\bigr)^2 \, d\rho,$$
where $f : X \to \Re$ belongs to a suitable space $\mathcal{H}$ of functions, called the hypothesis space. We assume that there exists $N > 0$ such that $y \in [-N, N]$ and that $\rho$ is nondegenerate (i.e., nonempty open subsets of $X \times [-N, N]$ have strictly positive measure) and has a nondegenerate marginal probability measure on $X$, denoted by $\nu$. Usually, $\rho$ is unknown and one has at one's disposal, for a positive integer $m$, a data sample
$$z = (\mathbf{x}, \mathbf{y}) = \bigl\{(x_i, y_i) \in X \times \Re,\ i = 1, \dots, m\bigr\},$$
where the pairs $(x_i, y_i)$, $i = 1, \dots, m$, are random variables, independent and identically distributed (i.i.d.) according to $\rho$. The information provided by the data sample can be exploited to minimize a suitable approximation of the expected error functional, instead of the expected error itself.

Typically, the problem of supervised learning from data is ill-posed [6], and regularization [26] can be used to cope with this drawback. A widespread regularization approach consists in minimizing over $\mathcal{H}$ the regularized empirical error functional
$$\mathcal{E}_z(f) + \gamma \Psi(f),$$
where $\mathcal{E}_z : \mathcal{H} \to \Re$, defined as
$$\mathcal{E}_z(f) = \frac{1}{m} \sum_{i=1}^{m} \bigl(f(x_i) - y_i\bigr)^2,$$
is called the empirical error functional, $\Psi : \mathcal{H} \to \Re$ is a functional called the stabilizer, and $\gamma > 0$ is a regularization parameter. The parameter $\gamma$ controls the trade-off between the following two requirements: (i) fitting the data sample (via the value $\mathcal{E}_z(f)$ of the empirical error associated with $f$) and (ii) penalizing solutions $f$ that yield a large value of the stabilizer $\Psi(f)$.
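The introduction leaves the hypothesis space $\mathcal{H}$ and the stabilizer $\Psi$ unspecified. As an illustration only, the following Python sketch assumes that $\mathcal{H}$ is a reproducing kernel Hilbert space with a Gaussian kernel $K$ and that $\Psi(f) = \|f\|_K^2$, a standard choice in the regularization literature; the kernel width, the synthetic regression function, and the noise level below are all hypothetical. Under these assumptions, the representer theorem reduces the minimization of $\mathcal{E}_z(f) + \gamma \|f\|_K^2$ to the finite-dimensional linear system $(K + m\gamma I)c = \mathbf{y}$, the minimizer being $f(x) = \sum_{i=1}^{m} c_i K(x_i, x)$.

```python
import numpy as np

def gaussian_gram(A, B, width):
    """Gram matrix K[i, j] = exp(-||A[i] - B[j]||^2 / (2 width^2))."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dists / (2.0 * width ** 2))

def empirical_error(f_values, y):
    """E_z(f) = (1/m) sum_i (f(x_i) - y_i)^2."""
    return np.mean((f_values - y) ** 2)

def regularized_solution(x, y, gamma, width):
    """Coefficients c of the minimizer of E_z(f) + gamma ||f||_K^2.

    Assumes H is the RKHS of a Gaussian kernel; by the representer
    theorem the minimizer has the form f = sum_i c_i K(x_i, .), where
    the coefficients solve (K + m gamma I) c = y.
    """
    m = len(y)
    K = gaussian_gram(x, x, width)
    return np.linalg.solve(K + m * gamma * np.eye(m), y)

# Hypothetical data sample: noisy values of sin(2 pi x) on X = [0, 1].
rng = np.random.default_rng(0)
m = 50
x = rng.uniform(0.0, 1.0, size=(m, 1))
y = np.sin(2.0 * np.pi * x[:, 0]) + 0.1 * rng.standard_normal(m)

gamma, width = 1e-3, 0.2
c = regularized_solution(x, y, gamma, width)
f_at_sample = gaussian_gram(x, x, width) @ c   # f(x_1), ..., f(x_m)
print("empirical error:", empirical_error(f_at_sample, y))
```

In this sketch, increasing $\gamma$ damps the components of the solution along the eigenvectors of the Gram matrix with small eigenvalues; loosely speaking, this is the finite-sample counterpart of the spectral windows whose finite-sample and population versions are compared in the paper.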