Gaussian process prediction for time series of structured data

Benjamin Paassen, Christina Göpfert and Barbara Hammer *
CITEC center of excellence
Bielefeld University - Germany

* Funding by the DFG under grant number HA 2719/6-2 and the CITEC center of excellence (EXC 277) is gratefully acknowledged.

(This is a preprint of the publication [12], as provided by the authors.)

Abstract

Time series prediction constitutes a classic topic in machine learning with wide-ranging applications, but is mostly restricted to the domain of vectorial sequence entries. In recent years, time series of structured data (such as sequences, trees or graph structures) have become increasingly important, for example in social network analysis or intelligent tutoring systems. In this contribution, we propose an extension of time series models to structured data based on Gaussian processes and structure kernels. We also provide speedup techniques for predictions in linear time, and we evaluate our approach on real data from the domain of intelligent tutoring systems.

1 Introduction

Time series prediction constitutes a classic topic in machine learning with wide-ranging and successful applications in physics, sociology and medicine [18]. In recent years, time series of structured data (sequences, trees or graphs) have become increasingly important, describing for example the development of social networks [17] or learner solutions in intelligent tutoring systems over time [8]. Classic time series prediction models such as ARIMA, NARX, Kalman filters, recurrent networks or reservoir models focus on vectorial data representations, and they are not equipped to handle time series of structured data [18].

In this contribution, we propose an extension of Gaussian process (GP) regression which is capable of predicting time series of structured data. GP regression has been successfully applied to time series of vectorial data before [16, 19], but not yet to structured data. To extend GP regression to structured data, we rely on two observations: First, GPs are based on kernel values for the given data as input. Hence we can build upon the vast literature of distance measures and kernels for structured data, such as alignment distances,