Received: 15 September 2009, Revised: 2 December 2009, Accepted: 3 December 2009, Published online in Wiley InterScience: 2 February 2010

One class classifiers for process monitoring illustrated by the application to online HPLC of a continuous process

Sila Kittiwachana a, Diana L. S. Ferreira a, Gavin R. Lloyd a, Louise A. Fido b, Duncan R. Thompson b, Richard E. A. Escott c and Richard G. Brereton a*

In process monitoring, a representative out-of-control class of samples cannot be generated. Here, it is assumed that it is possible to obtain a representative subset of samples from a single 'in-control class'. One class classifiers, namely the Q and D statistics (respectively the residual distance to the disjoint PC model and the Mahalanobis distance to the centre of the QDA model in the projected PC space) as well as support vector domain description (SVDD), are applied to disjoint PC models of the normal operating conditions (NOC) region to categorise whether the process is in-control or out-of-control. To define the NOC region, the cumulative relative standard deviation (CRSD) and a test of multivariate normality are described and used as joint criteria. These calculations are based on window principal components analysis (WPCA). The D and Q statistics and SVDD models were calculated for the NOC region, and percentage predictive ability (%PA), percentage model stability (%MS) and percentage correctly classified (%CC) were obtained from 100 training/test set splits to determine the quality of the models. Q, D and SVDD control charts were obtained, and 90% confidence limits were set up based on multivariate normality (D and Q) or the SVDD D value (which does not require assumptions of normality). We introduce a method for finding an optimal radial basis function for the SVDD model, and two new indices for non-NOC samples, the percentage classification index (%CI) and the percentage predictive index (%PI), are also defined.
The methods in this paper are exemplified by a continuous process studied over 105.11 h using online HPLC.

Copyright © 2010 John Wiley & Sons, Ltd.

Keywords: control charts; multivariate statistical process control; one class classifiers; quadratic discriminant analysis; support vectors

1. INTRODUCTION

Multivariate statistical process control (MSPC) [1–7] involves using several variables to monitor the progress of a process; many classic examples involve the use of near infrared (NIR) spectroscopy [8,9]. These methods are complementary to those of univariate process control, where the change in a single parameter is studied as a process evolves, e.g. by using Shewhart charts [10]. Most approaches to MSPC involve establishing control charts that show the change in one or more multivariate parameters over time; multivariate control limits can be established, and if the value of a multivariate parameter is outside these limits this is evidence that there is some problem with the process. A variety of statistics have been proposed for these control charts, including the D statistic based on Hotelling's T² for principal component models [6,7,11] and the Q statistic or squared prediction error (SPE) [6,7,11–13]. Many of these approaches are based on SIMCA [2,14] and involve finding a normal operating conditions (NOC) region, which is considered to be a group of in-control samples, and looking at future samples to see whether they can be considered part of this group at one or more predefined confidence limits.

The problem of MSPC can be considered a classification problem: a model is set up using the NOC region, and other samples are tested to see whether they are classified into this region. However, unlike many classification problems, we only know the origins of samples in one class, the NOC region or in-group. One class classifiers are a general way of defining this sort of problem [11,15], where a model is formed using just one group of samples.
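The one class idea described here can be illustrated with a minimal sketch, assuming a PCA model built only from in-group samples. The function names (`fit_noc_model`, `d_and_q`, `in_control`), the synthetic data, and the use of an empirical 90th percentile of the training values as the limit are all assumptions made for illustration; in particular, the percentile limits are a stand-in and not the normality-based confidence limits used in this paper. D is computed here as the squared Mahalanobis distance in the PC score space and Q as the squared residual distance to the PC model.

```python
import numpy as np

def fit_noc_model(X_noc, n_components):
    """Fit a disjoint PC model using only in-control (NOC) samples."""
    mean = X_noc.mean(axis=0)
    Xc = X_noc - mean
    # SVD of the centred NOC data; rows of Vt are the PC loadings
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T              # (variables, components)
    scores = Xc @ loadings
    score_var = scores.var(axis=0, ddof=1)      # variance of each PC score
    return {"mean": mean, "loadings": loadings, "score_var": score_var}

def d_and_q(model, X):
    """Return D (squared Mahalanobis distance in score space) and
    Q (squared residual distance to the PC model) for each row of X."""
    Xc = np.atleast_2d(X) - model["mean"]
    scores = Xc @ model["loadings"]
    # PC scores are uncorrelated, so the Mahalanobis distance reduces
    # to a variance-scaled sum of squared scores
    d = np.sum(scores**2 / model["score_var"], axis=1)
    residual = Xc - scores @ model["loadings"].T
    q = np.sum(residual**2, axis=1)
    return d, q

def in_control(model, X, d_limit, q_limit):
    """A sample is accepted only if both statistics are within their limits."""
    d, q = d_and_q(model, X)
    return (d <= d_limit) & (q <= q_limit)

# Synthetic stand-in for NOC data (assumption: not the HPLC data of this paper)
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))
mixing = rng.normal(size=(2, 6))
X_noc = latent @ mixing + 0.05 * rng.normal(size=(50, 6))

model = fit_noc_model(X_noc, n_components=2)
d_noc, q_noc = d_and_q(model, X_noc)

# Assumed empirical limits: 90th percentile of the NOC training values
d_limit = np.percentile(d_noc, 90)
q_limit = np.percentile(q_noc, 90)

# A sample far from the NOC centre, used as a hypothetical test sample
x_new = X_noc.mean(axis=0) + 50.0
print(in_control(model, x_new, d_limit, q_limit))
```

Only NOC samples enter the model; no out-of-control samples are needed at any point, which is the defining feature of a one class classifier.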
Unknowns (or test samples) are then assigned to this group according to a predefined confidence limit. If outside the limit they are considered not to be part of this group. Whereas this is compatible with the philosophy of SIMCA, one class classifiers are in fact much more broadly based. SIMCA involves the application of quadratic discriminant analysis (QDA) to disjoint principal component models (that is, PCs formed using only in-group samples), and has the significant advantage that no other samples are required for the model (i.e. samples that are not members of the in-group are not needed provided the NOC region is correctly defined, so it does not matter whether there is a defined out-group or whether these samples are outliers or contain unexpected features, for example).

(www.interscience.wiley.com) DOI: 10.1002/cem.1281
Special Issue Article
J. Chemometrics 2010; 24: 96–110
* Correspondence to: R. G. Brereton, Centre for Chemometrics, School of Chemistry, University of Bristol, Cantocks Close, Bristol BS8 1TS, UK. E-mail: r.g.brereton@bris.ac.uk
a S. Kittiwachana, D. L. S. Ferreira, G. R. Lloyd, R. G. Brereton: Centre for Chemometrics, School of Chemistry, University of Bristol, Cantocks Close, Bristol BS8 1TS, UK
b L. A. Fido, D. R. Thompson: GlaxoSmithKline, Gunnels Wood Road, Stevenage, Hertfordshire SG1 2NY, UK
c R. E. A. Escott: GlaxoSmithKline, Old Powder Mills, Tonbridge, Kent TN11 9AN, UK

However, there are more flexible solutions and one is that most of the