Optimal Selection of Embedding Parameters for Time Series Modelling

Michael Small*, Chi K. Tse*

Abstract — Time delay embedding is the first step in the reconstruction of deterministic nonlinear dynamics from a time series. Unfortunately, there is no generic way to select the best time delay embedding. We show that for time series modelling it is possible to apply information theoretic arguments which lead to an optimal selection of the embedding window. Our results show that the embedding dimension and embedding lag should be selected not as part of the embedding process but as part of the modelling procedure. Nonlinear time series modelling results show qualitative and quantitative improvement in both the long-term and short-term dynamics.

1 INTRODUCTION

Takens' embedding theorem [1] is very often invoked as the motivation for applying a time delay embedding to reconstruct multi-dimensional dynamics from a scalar variable. Let $x_t$ be the scalar observable observed at integer times $t = 0, 1, 2, \ldots, N$. The usual incarnation of the time delay embedding is to obtain vector variables $v_t$ such that

$$v_t = (x_t, x_{t-\tau}, x_{t-2\tau}, \ldots, x_{t-(d_e-1)\tau}) \qquad (1)$$

and, by appealing to the theorem of Takens, one claims that for suitable $\tau$, and sufficiently large $d_e$ and $N$, the evolution of $v_t$ is topologically equivalent to the underlying dynamical system. Unfortunately, $N$ will normally be constrained and there is no generic rule for the selection of $d_e$ and $\tau$. Within the dynamical systems community, methods such as minimum mutual information, false nearest neighbours and the plateau onset of dynamical invariants are commonly applied [2]. Engineers would be more familiar with the Nyquist limit, which implies an absolute criterion in the case of systems which exhibit finite bandwidth (this is not strictly applicable to deterministic aperiodic nonlinear systems).
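As an illustration of the conventional approach, the sketch below builds the delay vectors of Eq. (1) and picks $\tau$ with the "first minimum of the mutual information" heuristic, one of the methods mentioned above. It is an assumption-laden sketch rather than anything prescribed here: the function names `delay_embed`, `mutual_information` and `first_minimum_lag` are hypothetical, and the histogram estimator with 16 bins and the 50-lag search limit are illustrative choices.

```python
import numpy as np


def delay_embed(x, d_e, tau):
    """Delay vectors v_t = (x_t, x_{t-tau}, ..., x_{t-(d_e-1)tau}) of Eq. (1).

    Row k corresponds to time t = (d_e - 1) * tau + k, the first time at
    which every delayed coordinate is available.
    """
    x = np.asarray(x, dtype=float)
    span = (d_e - 1) * tau                # oldest lag used by each vector
    if span >= len(x):
        raise ValueError("time series too short for this (d_e, tau)")
    t = np.arange(span, len(x))           # admissible times
    # Column j holds x_{t - j*tau}
    return np.column_stack([x[t - j * tau] for j in range(d_e)])


def mutual_information(x, lag, bins=16):
    """Histogram (plug-in) estimate, in nats, of I(x_t; x_{t-lag})."""
    x = np.asarray(x, dtype=float)
    a, b = x[lag:], x[:len(x) - lag]
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)  # marginal of x_t, shape (bins, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)  # marginal of x_{t-lag}, shape (1, bins)
    nz = p_ab > 0                          # skip empty cells (0 log 0 = 0)
    return float(np.sum(p_ab[nz] * np.log(p_ab[nz] / (p_a * p_b)[nz])))


def first_minimum_lag(x, max_lag=50, bins=16):
    """Smallest lag at which the estimated mutual information has a local minimum."""
    mi = [mutual_information(x, lag, bins) for lag in range(1, max_lag + 1)]
    for k in range(1, len(mi) - 1):
        if mi[k] < mi[k - 1] and mi[k] <= mi[k + 1]:
            return k + 1                   # mi[k] corresponds to lag k + 1
    return max_lag                         # no interior minimum found


# Example on a synthetic quasi-periodic signal (illustrative only).
t = np.arange(2000) * 0.05
x = np.sin(t) + 0.5 * np.sin(2.7 * t)
tau = first_minimum_lag(x)
V = delay_embed(x, d_e=3, tau=tau)
print("tau =", tau, "delay vectors:", V.shape)
```

Note that such a heuristic fixes $\tau$ before any model is built, which is exactly the separation of embedding and modelling that this paper argues against.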
Very often, the aim of reconstructions such as (1) is to estimate dynamic invariants of the underlying system (such as the correlation dimension and the leading Lyapunov exponent) [2]. In this case one can appeal to theoretical results which suggest that $d_e > 2d_c + 1$ [1], that $d_e > d_c$ [3], or that only a suitable selection of $\tau$ is significant [4]. Contradictory numerical results have also shown that the crucial parameter is actually the embedding window $d_e\tau$ [5].

* Department of Electronic and Information Engineering, Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, China, e-mail: [ensmall,encktse]@polyu.edu.hk, tel.: +852 2766 4744, fax: +852 2362 8439.

In this paper we ask a slightly different question, and naturally arrive at a different answer. We are interested not in the correct estimation of dynamic invariants but only in the optimal reconstruction of the underlying dynamics for a specific finite, noisy time series. We find that in this situation the choice of embedding lag $\tau$ should be left to the modelling algorithm. In fact, our results suggest that embedding and modelling are two parts of the same process, and that it is generally not possible to find the optimal embedding parameters without first building a model (and vice versa!).

We derive an expression for the optimal embedding window as a function of the underlying dynamics, the system noise and the observation length $N$. Using this measure we provide an algorithm which can be used to estimate this embedding window, and show that this method can produce superior modelling results. In Section 2 we discuss the necessary theoretical framework. Section 3 describes the numerical modelling algorithm and in Section 4 we present some modelling results.

2 THE CRITERION

We first need to define what we mean by the "best" model.
Suppose that a time series $x_t$ of $N$ observations has been observed and that we wish to construct an embedding such that

$$z_t = (x_{t-\ell_1}, x_{t-\ell_2}, x_{t-\ell_3}, \ldots, x_{t-\ell_n}) \qquad (2)$$

where the embedding lags $\ell_i$ satisfy $0 \le \ell_1 < \ell_2 < \cdots < \ell_n = d_w$. Notice that (2) represents a slight generalisation of (1): setting $n = d_e$ and $\ell_i = (i-1)\tau$ recovers (1), with $d_w = (d_e - 1)\tau$. Equation (2) is completely defined by $d_w$ and a binary vector $a = (a_0, a_1, \ldots, a_{d_w}) \in \{0, 1\}^{d_w + 1}$ such that $a_j = 1 \iff j = \ell_i$ for some $i$.

Our objective is to obtain a model $f$ of the underlying dynamics from (2) such that

$$x_{t+1} = f(z_t) + e_t \qquad (3)$$
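To make Eqs. (2) and (3) concrete, the sketch below turns a binary lag mask $a$ into the vectors $z_t$, pairs each with the one-step-ahead target $x_{t+1}$, and fits $f$. It assumes the mask is indexed from lag 0 (so that $x_t$ itself can be selected) and takes $d_w$ to be the largest selected lag. The helpers `lags_from_mask`, `mask_from_delay`, `general_embed` and `fit_affine_model` are hypothetical names, and the affine least-squares fit is only a placeholder for $f$; it is not the nonlinear modelling algorithm of Section 3.

```python
import numpy as np


def lags_from_mask(a):
    """Selected lags l_1 < ... < l_n, i.e. the indices j with a_j = 1 (a indexed from 0)."""
    lags = np.flatnonzero(np.asarray(a, dtype=bool))
    if lags.size == 0:
        raise ValueError("the mask must select at least one lag")
    return lags


def mask_from_delay(d_e, tau):
    """Binary vector a that reproduces the uniform delay embedding of Eq. (1)."""
    a = np.zeros((d_e - 1) * tau + 1, dtype=int)
    a[::tau] = 1                                 # lags 0, tau, ..., (d_e - 1) * tau
    return a


def general_embed(x, a):
    """Vectors z_t of Eq. (2) (one row per t) and the targets x_{t+1} of Eq. (3)."""
    x = np.asarray(x, dtype=float)
    lags = lags_from_mask(a)
    d_w = int(lags[-1])                          # embedding window = largest selected lag
    t = np.arange(d_w, len(x) - 1)               # need both x_{t-d_w} and x_{t+1}
    Z = np.column_stack([x[t - l] for l in lags])
    y = x[t + 1]
    return Z, y


def fit_affine_model(Z, y):
    """Placeholder f(z) = c_0 + c . z fitted by least squares; returns (coef, residuals e_t)."""
    A = np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, y - A @ coef
```

For example, `general_embed(x, mask_from_delay(3, 2))` reproduces the uniform embedding (1) with $d_e = 3$ and $\tau = 2$, whereas a mask such as `[1, 0, 1]` selects only the lags 0 and 2. Searching over masks, rather than over $(d_e, \tau)$ pairs, is what lets the modelling step decide which lags inside the window $d_w$ are actually used.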