Fusion Engineering and Design 85 (2010) 423–424

journal homepage: www.elsevier.com/locate/fusengdes

Empirically derived basis functions for unsupervised classification of radial profile data

D.G. Pretty a,*, J. Vega b, M.A. Ochando b, F.L. Tabarés b

a Plasma Physics Laboratory, Research School of Physics and Engineering, Australian National University, Canberra ACT 0200, Australia
b Asociación EURATOM/CIEMAT para Fusión, Avda Complutense 22, 28040 Madrid, Spain

Article info

Article history: Available online 16 February 2010

Keywords: Profile classification; SVD; Support vector machine

Abstract

We present an analysis of empirically derived basis vectors for feature detection in radial profile data. Our aim is to classify broad and peaked profiles using unsupervised techniques. Radial data often contain a continuum of profile shapes from broad to peaked; as such, clustering methods may be unreliable. Previously, ad hoc heuristic measures had been used for classification of profiles from raw data (without tomographic reconstruction), which required significant manual inspection of the data. Here, we apply a singular value decomposition (SVD) to a training data matrix consisting of a concatenation of multichannel bolometry time series data from 103 TJ-II plasma discharges with good representation of the range of profiles. The second largest spatial basis vector (topo) has radial roots on either side of the plasma centre, and can intuitively be interpreted as a peakedness perturbation. The inverted topo matrix can be used to process new data for automated profile classification. Finally, we show an application of this method using support vector machines to locate other signals related to the radiation profile.

© 2010 Elsevier B.V. All rights reserved.

1. Empirical spatial basis vectors

Distinct “bell” and “dome” shaped profiles are observed in TJ-II bolometry (CBOL) profile data.
In order to allow rapid or real-time classification of the profiles, it is desirable to use raw data, and to tomographically reconstruct data only for specific shots of interest. Previously, an ad hoc comparison of outer bolometry channels, b = CBOL13/CBOL14, was used to detect changes in profile shape from raw data. A transition of b from a constant low value (b between 1 and 4) to higher values with increased fluctuation was found to correspond with a bell to dome transition. However, as the b parameterisation is not suitable for quantitative or unsupervised use, we instead use basis vectors, empirically derived using a singular value decomposition (SVD) of a data matrix selected to describe the profile features we wish to classify. The SVD is defined as $S = U A V^*$ [1] where, in the context of this paper, the rows of $S$ are the separate bolometry timeseries channels, the columns of $U$ and $V$ contain the spatial (topo) and temporal (chrono) orthonormal singular vectors respectively, $V^*$ denotes the conjugate transpose of $V$, and the diagonal matrix $A$ contains the non-negative singular values. The SVD is closely related to principal component analysis (PCA); if the channel means are subtracted from $S$, the topos $U$ correspond to the principal components of $SS^T$.

* Corresponding author at: Plasma Physics Laboratory, Research School of Physical Sciences, Australian National University, Canberra, Australia. Tel.: +61 402 305 212. E-mail address: david.pretty@anu.edu.au (D.G. Pretty).

Both SVD and PCA methods are widely used with fusion data, e.g. for interferometry profile inversions [2], tokamak q-profile control [3] and fluctuation mode analysis [4,5]. While these methods are generally used for feature extraction (dimensionality reduction) and noise removal, here our goal is profile classification with respect to a predetermined feature, namely the peakedness of the profile.
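The SVD definition and its stated relation to PCA can be checked numerically. The following is a minimal sketch using NumPy on a random stand-in matrix; the matrix size, the random data and all variable names are illustrative assumptions and are not taken from the TJ-II analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the data matrix S (assumption: random values,
# not bolometry data). Rows are channels, columns are time samples.
n_channels, n_samples = 16, 500
S = rng.normal(size=(n_channels, n_samples))

# Subtract each channel's mean, as required for the PCA correspondence.
S0 = S - S.mean(axis=1, keepdims=True)

# Thin SVD: S0 = U @ diag(a) @ Vh, where Vh is the conjugate transpose of V.
U, a, Vh = np.linalg.svd(S0, full_matrices=False)

# Largest element-wise error when rebuilding S0 from its factors.
reconstruction_error = np.max(np.abs(S0 - U @ np.diag(a) @ Vh))

# PCA route: eigen-decomposition of the symmetric matrix S0 S0^T,
# reordered so that the largest eigenvalue comes first.
eigvals, eigvecs = np.linalg.eigh(S0 @ S0.T)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# The eigenvalues equal the squared singular values, and each eigenvector
# matches the corresponding topo (column of U) up to an arbitrary sign.
overlaps = np.abs(np.sum(U * eigvecs, axis=0))
```

Here `overlaps` holds the absolute inner product between each topo and the matching eigenvector, so values close to one indicate that the two bases agree up to sign.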
As described below, when the desired feature is expressed as a singular vector we benefit from the quantification of classification uncertainty. To generate the basis vectors, we construct an $N_c \times N_s$ data matrix $S$ by concatenating the $N_c = 16$ bolometry channels of 103 TJ-II discharges, where $N_s$ = number of shots × samples per shot, with pre- and post-shot noise removed. The 103 discharges selected represent a wide range of typical TJ-II plasmas in which bell and dome profiles have been observed.

The dominant topos (columns of $U$ with the largest corresponding singular values) are shown in Fig. 1; the largest topo (topo 0) gives the general radial profile, and topo 1 is the perturbation which defines a peaked (flattened) profile when the corresponding chrono vector is positive (negative). Tomographic reconstruction of timeseries generated from these two largest topos shows that this interpretation from the raw data corresponds to the observed bell and dome shaped profiles. The occurrence of topo 1 as a measure of peakedness is not guaranteed by the SVD but is a consequence of the training set $S$ being selected from shots in which the bell–dome modification is the profile perturbation with greatest signal energy. The confidence in topo 1 as a reliable direct parameterisation of the basic profile shape can be quantified as:

$$p_{n1} = \frac{a_1^2 c_1}{\sum_{i=1,2,\ldots,15} a_i^2 c_i}$$

0920-3796/$ – see front matter. doi:10.1016/j.fusengdes.2010.01.020
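The construction of the training matrix and the use of the topos on new data can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the profile shapes, noise level, shot count and the helper names `synthetic_shot` and `project` are all hypothetical, and normalising the projection by the singular values is a choice made here so that the training data reproduce their own chronos. Because $U$ is square and orthonormal, its inverse is simply its transpose.

```python
import numpy as np

rng = np.random.default_rng(1)
n_channels = 16
x = np.linspace(-1.0, 1.0, n_channels)  # hypothetical normalised channel positions


def synthetic_shot(n_samples):
    """Return an (n_channels, n_samples) array of made-up profile data."""
    amplitude = 1.0 + 0.1 * rng.normal(size=n_samples)
    peaking = rng.uniform(-0.3, 0.3, size=n_samples)
    base = 1.0 - x**2                # broad background shape (assumption)
    bump = np.exp(-(x / 0.3) ** 2)   # central peaking term (assumption)
    profiles = np.outer(base, amplitude) + np.outer(bump, peaking)
    return profiles + 0.01 * rng.normal(size=profiles.shape)


# Concatenate shots along the time axis, so S has shape (N_c, N_s).
shots = [synthetic_shot(200) for _ in range(5)]
S = np.concatenate(shots, axis=1)

# Columns of U are topos, rows of Vh are chronos, a holds the singular values.
U, a, Vh = np.linalg.svd(S, full_matrices=False)


def project(new_data):
    """Chronos of new data in the training topo basis: diag(1/a) U^T S_new."""
    return (U.T @ new_data) / a[:, None]


# Projecting the training data itself reproduces the training chronos.
chronos_train = project(S)

# A previously unseen shot is processed the same way; row 1 is its topo 1 chrono.
new_shot = synthetic_shot(50)
chronos_new = project(new_shot)
topo1_chrono = chronos_new[1]
```

The overall sign of each singular vector returned by an SVD routine is arbitrary, so a sign convention for topo 1 would need to be fixed before reading positive chrono values as peaked profiles.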