Exceedance probability estimation for a quality test consisting of multiple measurements

Satu Tamminen ⇑, Ilmari Juutilainen, Juha Röning

Department of Computer Science Engineering, P.O. Box 4500, FIN-90014 University of Oulu, Finland

Keywords: Quality improvement; Product design; Charpy-V test; GAM; GAMLSS; Quantile regression

Abstract

The purpose of this study was to develop methods for exceedance probability estimation in the case of highly scattered measurement sets. This situation may occur when product quality is verified with several test samples, in which case traditional modelling methods based on point prediction are not sufficient. Density forecasting methods are needed when not only the mean but also the deviation and the distribution shape of the response depend on the explanatory variables. Furthermore, when probability predictors trained with different methods are compared, the ranking methods used for model selection should be chosen carefully.

In this article, the impact toughness of steel products was modelled. The rejection probability in the Charpy-V quality test was predicted with mean and deviation models, a distribution shape model and a quantile regression model. The proposed methods were employed in two steel manufacturing applications with different distributional properties.

© 2013 Elsevier Ltd. All rights reserved.

1. Introduction

Quality improvement is an essential part of manufacturing. Different data mining techniques have been utilized widely, because vast amounts of process data are automatically collected and available for these applications. Köksal, Batmaz, and Testik (2011) extensively review the data mining applications for quality improvement in manufacturing that were reported in the literature from 1997 to 2007.

Manufacturers monitor the quality of a product by testing its different properties. The quality is defined by the requirements that the product should fulfil.
If the test set consists of a single measurement, the procedure is quite straightforward, but in the case of several measurements, the rule for passing the requirement test can get complicated. Generally, if the measured property has a high variability within the product, more test samples are needed for quality verification.

A risk probability predictor is a probabilistic forecasting method that predicts the risk of failure, for example, in meeting the product specifications. The predicted risk probability can be combined with the manufacturing and rejection costs and implemented in an expert system that assists decision making in quality improvement and process planning. Probabilistic prediction, i.e. density forecasting, enables the prediction of the full probability distribution of the response variable, in contrast to the predominantly used point forecasts, which offer no description of the uncertainty associated with the prediction. Probabilistic prediction has been applied, for example, to econometric applications (Diks, Panchenko, & van Dijk, 2011; Tay & Wallis, 2000) and weather forecasting (Gneiting & Raftery, 2007; Laio & Tamea, 2007).

Industrial process data are often heteroscedastic and non-Gaussian. In other words, it is not rare that the noise process of the model is input-dependent or that the dependent variable is highly skewed. If the model simply estimates the conditional mean of the target data by minimizing the Sum of Squared Errors (SSE) function and ignores these conditions, the performance will be poor. Furthermore, when predicting extreme events, such as rejection in the quality test, the conventional SSE model will consistently under-predict these events (Cawley, Janacek, Haylock, & Dorling, 2007).

There are several possibilities to take this predictive uncertainty into account. A basic mean model can work as a probabilistic predictor, but the distributional assumptions may be improved with a deviation model.
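As a rough illustration of why a deviation model matters for rejection risk, the sketch below compares two products with the same predicted mean but different predicted deviations. It assumes a Gaussian response and a test with a single measurement, which is a simplification and not the multi-measurement rule studied in this article. The function names, the cost formula and all numbers are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def rejection_probability(mean: float, std: float, limit: float) -> float:
    """P(Y < limit) for one measurement Y ~ N(mean, std**2).

    Assumption: Gaussian response and a one-measurement test.
    """
    if std <= 0.0:
        raise ValueError("std must be positive")
    return normal_cdf((limit - mean) / std)


def expected_cost(p_reject: float, manufacturing_cost: float,
                  rejection_cost: float) -> float:
    """Hypothetical cost model: manufacturing cost plus risk-weighted rejection cost."""
    return manufacturing_cost + p_reject * rejection_cost


# Hypothetical numbers: same predicted mean, different predicted deviation.
limit = 40.0  # lower requirement limit
mean = 60.0   # output of the mean model
p_narrow = rejection_probability(mean, std=10.0, limit=limit)
p_wide = rejection_probability(mean, std=25.0, limit=limit)

print(f"rejection risk, std=10: {p_narrow:.4f}")
print(f"rejection risk, std=25: {p_wide:.4f}")
print(f"expected cost,  std=25: {expected_cost(p_wide, 100.0, 40.0):.2f}")
```

A point forecast would treat both products identically, since their predicted means agree. The risk differs only because the predicted deviation differs, which is the information a deviation model supplies.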
Quantile regression and distribution shape models make it possible to include the form of the distribution in the prediction. The need for density forecasting increases when the predicted property consists of several measurements. Generally, the property itself has a high variability within the product, and furthermore, it may have a clearly non-Gaussian distribution.

Expert Systems with Applications 40 (2013) 4577–4584. ISSN 0957-4174. http://dx.doi.org/10.1016/j.eswa.2013.01.056
⇑ Corresponding author. Tel.: +358 294482538; fax: +358 85532612. E-mail address: satu.tamminen@ee.oulu.fi (S. Tamminen).

Methods for joint modelling of mean and deviation in different industrial applications have been studied widely (Carroll & Ruppert, 1988; Engel, 1992; Smyth, Huele, & Verbyla, 2001), and the typical method is to perform heteroscedastic regression with Generalised