Product Multi-Kernels for Sensor Data Analysis

Petra Vidnerová and Roman Neruda
Institute of Computer Science, Czech Academy of Sciences, Prague

Introduction

Our approach is based on the regularization network (RN) model. RNs benefit from a solid theoretical background: they have been proved to be the solution of the problem of learning from examples when it is formulated as a regularized minimization problem [1, 2].

The key step of RN learning is the choice of a kernel function. Different kernel functions suit different data types, but real data are often heterogeneous. We propose composite kernel functions that better reflect the character of such data. This approach ranks among the so-called multi-kernel models.

RN and Kernels

Learning problem formulation
Given a data set $\{(\vec{x}_i, y_i) \in \mathbb{R}^n \times \mathbb{R}\}_{i=1}^N$, obtained by random sampling of an unknown function $f$, our goal is to find $f$ or its approximation.

Regularization Network
Minimize
$$H[f] = \frac{1}{N} \sum_{i=1}^N \left(f(\vec{x}_i) - y_i\right)^2 + \gamma\,\Phi[f],$$
where $\Phi$ is a stabilizer and $\gamma > 0$ is the regularization parameter. The solution is unique and has the form
$$f(\vec{x}) = \sum_{i=1}^N w_i K_{\vec{x}_i}(\vec{x}), \qquad (N\gamma I + K)\vec{w} = \vec{y},$$
where $I$ is the identity matrix, $K$ is the matrix $K_{i,j} = K(\vec{x}_i, \vec{x}_j)$, and $\vec{y} = (y_1, \dots, y_N)$. The meta-parameter $\gamma$ and the type of kernel $K$ are given in advance.

Kernel function
The choice of the kernel function $K$ is an important part of learning. It corresponds to the choice of a stabilizer and reflects our prior knowledge of the problem.

Multi-Kernels

Common elementary kernel functions:
linear: $K(\vec{x}, \vec{y}) = \vec{x}^T \vec{y}$
polynomial: $K(\vec{x}, \vec{y}) = (\alpha\,\vec{x}^T \vec{y} + r)^d$, $\alpha > 0$
Gaussian: $K(\vec{x}, \vec{y}) = \exp(-\alpha \|\vec{x} - \vec{y}\|^2)$, $\alpha > 0$
sigmoid: $K(\vec{x}, \vec{y}) = \tanh(\alpha\,\vec{x}^T \vec{y} + r)$
Here $\alpha$, $d$ and $r$ are parameters of the kernel.

The motivation for the multi-kernel approach stems from the multi-modal nature of data: each group of features may require a different notion of similarity (i.e., a different kernel). Instead of building one specialized kernel for such an application, we can define a kernel for each of these modes and combine them. In this work, two types of composite kernels are considered (see the sketches at the end of this section):

Product kernels: Let $K_1$ and $K_2$ be kernel functions defined on $\mathbb{R}^{n_1}$ and $\mathbb{R}^{n_2}$, $n_1 + n_2 = n$. Then the product kernel is defined as
$$K(\vec{x}, \vec{y}) = K((\vec{x}_1, \vec{x}_2), (\vec{y}_1, \vec{y}_2)) = K_1(\vec{x}_1, \vec{y}_1)\,K_2(\vec{x}_2, \vec{y}_2).$$

Sum kernels: $K(\vec{x}, \vec{y}) = K_1(\vec{x}, \vec{y}) + K_2(\vec{x}, \vec{y})$, where $K_1$ and $K_2$ are kernel functions.

We can combine kernel functions of different types, or two kernel functions of the same type but with different parameters, such as two Gaussians of different widths (but the same centre).

Evolution of Kernels

We deploy a genetic algorithm (GA) to search for optimal composite kernels. It works with a population of candidate kernels (individuals) and evolves them using the selection, crossover and mutation operators (see the sketch at the end of this section).

Individual Encoding
Elementary kernel: $I = \{K, p, \gamma\}$, where $K$ is the kernel type, $p$ its parameter, and $\gamma$ the regularization parameter.
Product kernel: $I = \{K_0, p_0, K_1, p_1, i_1, \dots, i_n\}$, where the bit vector $i_1, \dots, i_n$ assigns each attribute to one of the two subkernels.
Example: I = {Gaussian, 0.84, Inverse_Multiquadric, 1.58, [0, 0, 1, 0, 1, 1, 1, 1]}, γ = 0.2.

Crossover and Mutation
Crossover on elementary kernels generates new parameter values in the interval formed by the parents, i.e. $\gamma = (1 - r)\gamma_1 + r\gamma_2$, where $r \in [0, 1]$ is a random number and $\gamma_1$, $\gamma_2$ are the parents' values. Crossover on composite kernels swaps the subkernels (and, in the case of product kernels, runs one-point crossover on the attribute vectors).

Tournament Selection
Each kernel is used to create a network and its crossvalidation error is computed; the winner is the individual with the lower error.
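To make the compositions concrete, here is a minimal Python sketch of two of the elementary kernels and the product/sum combinations above. The function names, the NumPy representation and the boolean attribute mask are our illustration (the mask mirrors the bit vector of the GA encoding above), not the authors' implementation.

```python
# Minimal sketch (not the authors' code): elementary kernels and the
# product/sum compositions defined above, on NumPy vectors.
import numpy as np

def gaussian(alpha):
    # K(x, y) = exp(-alpha * ||x - y||^2), alpha > 0
    return lambda x, y: np.exp(-alpha * np.sum((x - y) ** 2))

def inverse_multiquadric(alpha):
    # K(x, y) = 1 / sqrt(||x - y||^2 + alpha^2)
    return lambda x, y: 1.0 / np.sqrt(np.sum((x - y) ** 2) + alpha ** 2)

def product_kernel(k1, k2, mask):
    # mask is a 0/1 vector splitting R^n into R^{n1} x R^{n2}:
    # attributes with mask == 0 go to k1, those with mask == 1 to k2.
    m = np.asarray(mask, dtype=bool)
    return lambda x, y: k1(x[~m], y[~m]) * k2(x[m], y[m])

def sum_kernel(k1, k2):
    # K(x, y) = K1(x, y) + K2(x, y)
    return lambda x, y: k1(x, y) + k2(x, y)

# The example individual from the encoding above:
# {Gaussian, 0.84, Inverse_Multiquadric, 1.58, [0, 0, 1, 0, 1, 1, 1, 1]}
k = product_kernel(gaussian(0.84), inverse_multiquadric(1.58),
                   [0, 0, 1, 0, 1, 1, 1, 1])
```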
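The closed-form RN learning then amounts to a single linear solve. The helper names rn_fit and rn_predict are hypothetical, and the sketch assumes the matrix $N\gamma I + K$ is well conditioned.

```python
# Sketch of RN learning with an arbitrary kernel, following the
# closed-form solution (N*gamma*I + K) w = y given above.
import numpy as np

def rn_fit(X, y, kernel, gamma):
    # X: sequence of N input vectors, y: N target values
    N = len(X)
    K = np.array([[kernel(xi, xj) for xj in X] for xi in X])
    return np.linalg.solve(N * gamma * np.eye(N) + K, y)

def rn_predict(X, w, kernel, x):
    # f(x) = sum_i w_i K(x_i, x)
    return sum(wi * kernel(xi, x) for wi, xi in zip(w, X))
```

With the composite kernel k from the previous sketch and γ = 0.2 as in the example individual, rn_fit(X, y, k, 0.2) yields the weight vector w used by rn_predict.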
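The GA operators can be sketched in the same spirit; the individual layout and the crossval_error callback are simplified stand-ins for the full encoding above.

```python
# Sketch of the GA operators described above (simplified individuals).
import random

def blend_crossover(p1, p2):
    # the child's value lies in the interval formed by the parents:
    # p = (1 - r) * p1 + r * p2, with r drawn uniformly from [0, 1]
    r = random.random()
    return (1 - r) * p1 + r * p2

def one_point_crossover(bits1, bits2):
    # applied to the attribute bit vectors of product kernels
    c = random.randrange(1, len(bits1))
    return bits1[:c] + bits2[c:], bits2[:c] + bits1[c:]

def tournament(ind1, ind2, crossval_error):
    # build an RN from each individual's kernel and compare
    # crossvalidation errors; the lower error wins
    return ind1 if crossval_error(ind1) <= crossval_error(ind2) else ind2
```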
Experiments

Data set: real-world data from the area of sensor networks for air pollution monitoring [3]: tens of thousands of measurements from gas multi-sensor devices, recording concentrations of several gas pollutants every hour; 5 input sensors and 3 target values (CO, NO2 and NOx concentrations).

Methodology: The GA was run for 300 generations with 20 individuals and elite size 2. For fitness evaluation, 10-fold crossvalidation was used with the error
$$E = 100 \cdot \frac{1}{N} \sum_{i=1}^N \|y_i - f(\vec{x}_i)\|^2,$$
and each computation was repeated 10 times.

[Figure: measured data vs. product-kernel and Gaussian-kernel predictions, samples 2820-2860.]

Task I - sparse measurements
The training data consist of 4 samples per day; the rest (the values in between) are used for testing.

[Bar charts: E_train, E_test and E_cross of the Gaussian, Product and Sum kernels for CO, NO2 and NOx.]

Task II - missing epochs
The whole time period was split into five intervals: one was used for training, the rest for testing. In terms of mean values, the improvement was achieved mainly on training errors. The minimal errors of the product kernels are more promising, so the GA should be improved. The resulting product kernels are mostly combinations of Gaussians and Inverse Multiquadrics.

References

[1] Girosi, F., Jones, M., Poggio, T.: Regularization theory and neural networks architectures. Neural Computation 7(2) (1995) 219-269
[2] Kůrková, V.: Some comparisons of networks with radial and kernel units. In: Artificial Neural Networks and Machine Learning - ICANN 2012. LNCS 7553, Berlin, Heidelberg, Springer-Verlag (2012) 17-24
[3] De Vito, S., Massera, E., Piga, M., Martinotto, L., Di Francia, G.: On field calibration of an electronic nose for benzene estimation in an urban pollution monitoring scenario. Sensors and Actuators B: Chemical 129(2) (2008) 750-757

Acknowledgment

This work was partially supported by the Czech Grant Agency grant 15-18108S and institutional support of the Institute of Computer Science RVO 67985807.
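For completeness, a sketch of the fitness error from the Methodology above, together with our reading of the Task I split (with hourly measurements, 4 training samples per day means every 6th sample); both function names and the split logic are assumptions layered on the description, not the authors' code.

```python
# Sketches of the fitness error and the Task I split (not the authors' code).
import numpy as np

def fitness_error(Y_true, Y_pred):
    # E = 100 * (1/N) * sum_i ||y_i - f(x_i)||^2, with targets and
    # predictions as arrays of shape (N, n_outputs); here n_outputs = 3
    # for the CO, NO2 and NOx concentrations.
    Y_true = np.asarray(Y_true, dtype=float)
    Y_pred = np.asarray(Y_pred, dtype=float)
    return 100.0 * np.mean(np.sum((Y_true - Y_pred) ** 2, axis=1))

def sparse_split(n_samples, per_day=4, samples_per_day=24):
    # Task I: with hourly data, 4 training samples per day means every
    # 6th measurement trains; the values in between are used for testing.
    step = samples_per_day // per_day
    train = np.arange(0, n_samples, step)
    test = np.setdiff1d(np.arange(n_samples), train)
    return train, test
```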