Information Processing Letters 110 (2010) 1031–1036

On spectral windows in supervised learning from data

Giorgio Gnecco, Marcello Sanguineti *

Department of Communications, Computer, and System Sciences (DIST), University of Genoa, Via Opera Pia 13, 16145 Genova, Italy

The authors were partially supported by a PRIN grant from the Italian Ministry for University and Research, project “Adaptive State Estimation and Optimal Control”.
* Corresponding author. E-mail addresses: giorgio.gnecco@dist.unige.it (G. Gnecco), marcello.gnecco@dist.unige.it (M. Sanguineti).

Article history: Received 10 June 2009; received in revised form 20 August 2010; accepted 24 August 2010; available online 20 September 2010. Communicated by P.M.B. Vitányi.

Keywords: Analysis of algorithms; Learning from data; Regularization; Suboptimal solutions; Empirical error functionals; Probabilistic estimates

Abstract. For Tikhonov regularization in supervised learning from data, the effect on the regularized solution of a joint perturbation of the regression function and the data is investigated. Spectral windows in the finite-sample and population cases are compared via probabilistic estimates of the differences between regularized solutions.

© 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.ipl.2010.08.011

1. Introduction

For a nonempty set $X \subseteq \Re^d$ and a probability measure $\rho$ on $X \times \Re$, Statistical Learning Theory [27], a branch of Computational Learning Theory [19], models the supervised learning problem as the minimization of the expected error functional
\[
\mathcal{E}(f) = \int_{X \times \Re} (f(x) - y)^2 \, d\rho ,
\]
where $f : X \to \Re$ belongs to a suitable space $H$ of functions, called the hypothesis space. We assume that there exists $N > 0$ such that $y \in [-N, N]$ and that $\rho$ is nondegenerate (i.e., nonempty open subsets of $X \times [-N, N]$ have strictly positive measure) and has a nondegenerate marginal probability measure on $X$, denoted by $\nu$. Usually, $\rho$ is unknown and, for a positive integer $m$, one has at one's disposal a data sample
\[
z = (\mathbf{x}, \mathbf{y}) = \{ (x_i, y_i) \in X \times \Re ,\ i = 1, \dots, m \},
\]
where the pairs $(x_i, y_i)$, $i = 1, \dots, m$, are random variables, independent and identically distributed (i.i.d.) according to $\rho$. The information provided by the data sample can be exploited to minimize a suitable approximation of the expected error functional, instead of the expected error itself.

Typically, the problem of supervised learning from data is ill-posed [6], and regularization [26] can be used to cope with this drawback. A widespread regularization approach consists in minimizing over $H$ the regularized empirical error functional $\mathcal{E}_z(f) + \gamma \Psi(f)$, where $\mathcal{E}_z : H \to \Re$, defined as
\[
\mathcal{E}_z(f) = \frac{1}{m} \sum_{i=1}^{m} (f(x_i) - y_i)^2 ,
\]
is called the empirical error functional, $\Psi : H \to \Re$ is a functional called the stabilizer, and $\gamma > 0$ is a regularization parameter. The parameter $\gamma$ controls the trade-off between the following two requirements: (i) fitting the data sample (via the value $\mathcal{E}_z(f)$ of the empirical error associated with $f$) and (ii) penalizing solutions $f$ that yield a large value of the stabilizer $\Psi(f)$.
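To make the objects above concrete, the following minimal numerical sketch (not from the paper) adopts the classical assumptions that $H$ is a reproducing-kernel Hilbert space with a Gaussian kernel and that the stabilizer is the squared RKHS norm, $\Psi(f) = \|f\|_K^2$; the names gaussian_kernel and tikhonov_fit are illustrative. Under these assumptions, the minimizer of $\mathcal{E}_z(f) + \gamma \Psi(f)$ has the closed form $f = \sum_i c_i K(x_i, \cdot)$ with $c = (K + m\gamma I)^{-1} \mathbf{y}$, i.e., it is obtained by applying the spectral window $\sigma \mapsto 1/(\sigma + m\gamma)$ to the eigenvalues of the kernel matrix $K$ (the finite-sample kind of spectral window referred to in the abstract).

```python
import numpy as np

def gaussian_kernel(X1, X2, width=1.0):
    # Gaussian (RBF) kernel matrix between the rows of X1 and X2.
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * width ** 2))

def tikhonov_fit(X, y, gamma, width=1.0):
    # Minimizes (1/m) * sum_i (f(x_i) - y_i)^2 + gamma * ||f||_K^2
    # over the RKHS of the Gaussian kernel.  The minimizer is
    # f = sum_i c_i K(x_i, .) with c = (K + m*gamma*I)^{-1} y,
    # computed here by applying the Tikhonov spectral window
    # g(sigma) = 1/(sigma + m*gamma) to the eigenvalues of K.
    m = len(y)
    K = gaussian_kernel(X, X, width)
    sigma, V = np.linalg.eigh(K)            # K = V diag(sigma) V^T
    g = 1.0 / (sigma + m * gamma)           # Tikhonov spectral window
    c = V @ (g * (V.T @ y))                 # c = V g(diag(sigma)) V^T y
    return lambda X_new: gaussian_kernel(X_new, X, width) @ c

# Toy usage: noisy samples of a smooth regression function on [0, 1].
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(50, 1))
y = np.sin(2.0 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(50)
f = tikhonov_fit(X, y, gamma=1e-3)
print(f(np.array([[0.25], [0.75]])))        # close to sin(pi/2), sin(3*pi/2)
```

In this sketch, increasing gamma flattens the spectral window, which damps the contribution of small kernel eigenvalues: exactly the trade-off between requirements (i) and (ii) that the parameter $\gamma$ is said to control.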