Change-Point Monitoring for Online Stochastic Approximations

Kim Levy a, Felisa J. Vázquez-Abad b

a Department of Mathematics and Statistics, University of Melbourne, 3010, Australia
b Department of Computer Science, Hunter College, the City University of New York

Abstract

We consider stochastic approximations in a quickly changing non-stationary environment. We assume the parameters of the system are subject to sudden discontinuous changes, which we refer to as regime-switching. We are interested in problems characterized by frequent significant jumps with no a priori knowledge about the regimes. Our approach is based on constant step size stochastic approximation. While larger step sizes have the advantage of fast adaptation, smaller step sizes provide more precise estimates of the target value once the process is close to stationary. We propose to use a small constant step size combined with change-point monitoring, and to reset the process at a value closer to the new target when a change is detected. Stochastic approximation and change-point monitoring complement each other: together they achieve high precision while cutting down the convergence time. We give a theoretical characterization and discuss the tradeoff between precision and fast adaptation. We also introduce a new monitoring scheme, the regression-based hypothesis test, which performs comparably to the Page-Hinkley test and the CUSUM of residuals. The novelty of our approach is a) the combination of change-point monitoring with stochastic approximation in a regime-switching environment and b) the introduction of a new monitoring scheme. We provide an asymptotic analysis of this method and show weak convergence to a limiting switching ODE for the non-reset method, and to a hybrid DE for a reset method that we propose.
Key words: stochastic approximation; online tracking; non-stationary environment; weak convergence; change-point monitoring

Email addresses: k.levy@csiro.au (Kim Levy), felisav@hunter.cuny.edu (Felisa J. Vázquez-Abad).

1 Introduction

This paper deals with stochastic approximation (SA) algorithms of the form

$$\theta_n = \theta_{n-1} + \gamma Y_n, \qquad (1)$$

where $n \in \mathbb{N}^+$ is a discrete-time iteration index, $\theta_n \in \mathbb{R}^q$ is a column vector of estimates of a target parameter vector $\theta^* \in \mathbb{R}^q$, $\gamma$ is a fixed $q \times q$ diagonal matrix with diagonal entries $\gamma_i$, $i = 1, \ldots, q$, and $Y_n \in \mathbb{R}^q$ is a noisy feedback vector from the system, which provides a “direction of improvement” towards $\theta^*$ based on past observations of a process $\{\xi_m, m \le n\}$. In this paper we study the SA under a regime-switching environment. Let $\psi_t$ label the regime present at time $t$. Then $\theta^*_n$ solves an inverse problem of the form $G_{\psi_t}(\theta^*_n) = 0$, but the function $G$ is not known analytically. The feedback $Y_n$ is an online estimator of $G$, as will be formulated in detail in Section 2.

There are many problems where stochastic approximations require fast reaction to regime changes. Examples are identification of changes in channels for mobile communications to adjust for quality, following biological and chemical changes for dose adjustment, locating a target, and cruise control algorithms, among others.

Target tracking problems with slowly time-varying parameters have been studied using SA algorithms [3,10,15]. For particular classes of tracking problems, optimal constant step sizes have been identified using SA. A two-time-scale adaptive step size approach was proposed by [3], resulting in a recursive SA scheme. It consists of updating two parameters at once, the second parameter generally being the step size for the first one. Convergence proofs are in [10,15].
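To make recursion (1) concrete, the following is a minimal Python sketch for the scalar case $q = 1$, so that $\gamma$ is a single number. It rests on assumptions that are not part of the text above: the feedback is taken to be $Y_n = x_n - \theta_{n-1}$, where $x_n$ is a noisy, unbiased observation of the current target; the target jumps once, from 0 to 5, after 500 iterations; and the noise is Gaussian with unit standard deviation. The function names `sa_track`, `simulate_observations`, and `mean_abs_error` are hypothetical and chosen only for illustration.

```python
import random


def sa_track(observations, gamma, theta0=0.0):
    """Iterate theta_n = theta_{n-1} + gamma * Y_n for scalar theta.

    Assumption (illustrative): Y_n = x_n - theta_{n-1}, where x_n is a
    noisy, unbiased observation of the current target.
    """
    theta = theta0
    path = []
    for x in observations:
        y = x - theta              # noisy direction of improvement
        theta = theta + gamma * y  # constant step size update, as in (1)
        path.append(theta)
    return path


def simulate_observations(targets=(0.0, 5.0), n_per_regime=500,
                          noise_sd=1.0, seed=0):
    """Hypothetical data: one regime per entry of `targets`, in order."""
    rng = random.Random(seed)
    obs = []
    for target in targets:
        obs.extend(target + rng.gauss(0.0, noise_sd)
                   for _ in range(n_per_regime))
    return obs


def mean_abs_error(path, target, start, stop):
    """Average |theta_n - target| over the slice path[start:stop]."""
    window = path[start:stop]
    return sum(abs(v - target) for v in window) / len(window)


obs = simulate_observations()
fast = sa_track(obs, gamma=0.5)    # large step size
slow = sa_track(obs, gamma=0.02)   # small step size

# Error relative to the new target (5.0) just after the jump at n = 500 ...
early_fast = mean_abs_error(fast, 5.0, 500, 540)
early_slow = mean_abs_error(slow, 5.0, 500, 540)
# ... and late in the second regime, once both paths have settled.
late_fast = mean_abs_error(fast, 5.0, 800, 1000)
late_slow = mean_abs_error(slow, 5.0, 800, 1000)

print(f"after jump : gamma=0.5 -> {early_fast:.3f}, gamma=0.02 -> {early_slow:.3f}")
print(f"late regime: gamma=0.5 -> {late_fast:.3f}, gamma=0.02 -> {late_slow:.3f}")
```

Under these assumptions, the larger step size should show the smaller error in the window right after the jump, while the smaller step size should show the smaller error late in the regime. This is only an illustration of the tradeoff between fast adaptation and precision, not a reproduction of any experiment in the paper.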
A particularly important example of our model is when $G_{\psi_t}(\theta) = \theta^*_t - \theta$, assuming that noisy but unbiased observations of the true “target location” $\theta^*_n$ are available online.

Target tracking under abrupt changes has only recently been studied in [17,5]. The case where the regime-switching is modulated by a discrete-time Markov chain with infrequent jumps is in [17]. We consider the same model for regime switching as in [5], where regime changes are not rare events and no assumptions are made about an underlying model. Information is obtained through online observations of the system. Therefore the data must be analyzed in real time and the average length of stay in any particular regime

Preprint submitted to Automatica 9 June 2010