TRANS. IMAGE PROC. 2 Robust Regression of Scattered Data with Adaptive Spline–Wavelets Daniel Casta˜ no and Angela Kunoth Abstract— A coarse-to-fine data fitting algorithm for irregu- larly spaced data based on boundary–adapted adaptive tensor– product semi–orthogonal spline–wavelets has been proposed in [CK1]. This method has been extended in [CK2] to include regularization in terms of Sobolev and Besov norms. In this paper, we develop within this least–squares approach some statistical robust estimators to handle outliers in the data. Our wavelet scheme yields a numerically fast and reliable way to detect outliers. Index Terms— Scattered data, outlier detection, robust estima- tion, adaptive wavelets, coarse–to–fine algorithm. I. I NTRODUCTION In [CK1], we have proposed an adaptive method of least squares data fitting based on wavelets which progresses on a coarse–to–fine basis. The algorithm works on structured grids and is, therefore, particularly useful to implement us- ing tensor products in more than one spatial dimensions. A favourable feature of the scheme is that there is no explicit grid construction like when using, e.g., data–dependent trian- gulations. In this sense, the data representation is independent of the original cloud of data points. These advantages have made approaches on structured grids a well established, much followed and universally applicable procedure, covering the whole range from strictly mathematical analysis to applica- tions in various areas of sciences, see, e.g., [GG], [GHJ], [GH], [He], [HPMM], [HR], [LWS], [PS], [SHLS], [Sch], [Z]. Wavelets as basis functions for the representation of scattered data provide additional features in the problem formulation regarding computational efficiency as well as sparseness of the representation, such as good conditioning and a natural built– in potential for adaptivity (see, e.g., [Ch] for an introduction to basic wavelet theory). We briefly wish to recall for further reference our approach in [CK1], [CK2] and some properties of the wavelets we em- ploy here. Suppose we are given some set X = {x i } i=1,...,N , consisting of irregularly spaced and pairwise disjoint points x i ∈ Ω := [0, 1] n , n ∈{1, 2, 3}. For each i, we denote by z i ∈ IR the corresponding data making up the set Z . The problem of scattered data fitting with regularization (of Tikhonov type) can be formulated as follows: Find a function Manuscript received March 11, 2005. This work was supported by the Deutsche Forschungsgemeinschaft, Grant KU 1028/7–1, and by the SFB 611, Universit¨ at Bonn. Daniel Casta˜ no was also supported by a BFI grant of the Basque Government. European Molecular Biology Laboratory, Meyerhofstr. 1, 69117 Heidel- berg, Germany, castano@embl.de Institut f¨ ur Numerische Simulation, Universit¨ at Bonn, Wegelerstr. 6, 53115 Bonn, Germany, kunoth@ins.uni-bonn.de. f :Ω → IR that approximates the cloud of points (X,Z ) in a least squares sense, that is, f is to minimize the functional J (f ) := N i=1 (z i − f (x i )) 2 + ν ‖f ‖ 2 Y . (I.1) Here Y may be a Sobolev space Y = H α or a Besov space with smoothness index α, and the parameter ν ≥ 0 balances the data fit with a user–specified regularity of the representa- tion f . In particular, we want to construct an expansion of f of the form f (x)= λ∈Λ d λ ψ λ (x),x ∈ Ω. (I.2) The set of basis functions {ψ λ } λ∈Λ is a subset of the wavelets described in [SDS]. They have the following properties: each ψ λ is a tensor product of a certain boundary–adapted linear combination of linear B–splines, denoted as (pre)wavelets or shortly wavelets. From a computational point of view, this is very advantageous since we can practically work with piecewise polynomials. The index set Λ is a lacunary set of indices resulting from an adaptive coarse–to–fine procedure explained below. We denote the infinite set of all possible indices by II . Each index λ ∈ II is of the form λ =(j, k, e), where j =: |λ| denotes the level of resolution or refinement scale, k the spatial location, and e ∈{0, 1} n the type of wavelet in more than one spatial dimension which is induced by tensor products, see e.g. [D1], [DKU], [SDS]. Each wavelet ψ λ has compact support satisfying diam (supp ψ λ ) ∼ 2 −|λ| . The relation a ∼ b always is to mean that a can be estimated from above and below by a constant multiple of b independent of all parameters on which a or b may depend. In view of the finite domain and the compact support of the basis functions, there is by construction a coarsest refinement level j 0 (j 0 =1 in the case of piecewise linear wavelets considered here). Basis elements with multi–index e = (0, 0) only occur on level j 0 . They are called scaling functions and are tensor products of piecewise linear B–Splines. For e = (0, 0), ψ λ represents detail information of higher frequencies. Consequently, f (x) in (I.2) can be split into a scaling function term and a wavelet term, f (x)= λ∈Λ,j=j0, e=(0,0) d λ ψ λ (x)+ λ∈Λ,j≥j0, e=(0,0) d λ ψ λ (x). (I.3) The complete collection of scaling functions and wavelets {ψ λ : λ ∈ II } constitutes a Riesz basis for L 2 (Ω). Moreover, one has norm equivalences for functions in Sobolev spaces