1-4244-0551-3/06/$20.00 ©2006 IEEE 278 ISSE 2006 St. Marienthal, Germany

Transformation of Data for Statistical Processing

Pavel Mach, Josef Thuring, David Šámal
Department of Electrotechnology, Faculty of Electrical Engineering, Czech Technical University in Prague, Prague, Czech Republic
mach@fel.cvut.cz

Abstract: The applicability of many statistical tools depends on the normality of the processed data. There are various methods for transforming non-normally distributed data sets toward normally distributed ones. The goal of this work was to investigate the usability of four types of transformations (Box-Cox, exponential, power, and logarithmic) for transforming data sets with four non-normal distributions (logarithmic-normal, exponential, gamma, and Weibull) toward normally distributed data. The usability and efficiency of the individual transformation functions for data sets with the different types of distributions have been determined.

1. Introduction

The use of SPC (Statistical Process Control) tools, as well as many other statistical tools, depends strongly on the normality of the processed data [1], [6]. If the data are not normally distributed, many statistical tools cannot be applied, because they would give false results. Our research is focused on the properties of electrically conductive adhesives. To analyze the correlation between changes in the electrical and mechanical properties of joints made of different types of electrically conductive adhesives, it was necessary to check the normality of the measured data, to remove outliers (if any were found), and, when the data were not normally distributed, to transform them toward normality. The efficiency of a transformation depends strongly on the selection of a proper transformation function. Applying the central limit theorem instead of a transformation is also possible: the means of groups of values are approximately normally distributed. However, the larger the groups, the more rapidly the total number of processed values decreases.
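The grouping trade-off can be illustrated with a minimal Python sketch. It assumes non-overlapping groups of consecutive values and an exponential test distribution. The helper names `subgroup_means` and `sample_skewness` are hypothetical and were introduced for this illustration only; they are not the authors' procedure. Averaging makes the data more symmetric (skewness closer to 0), but the number of available values shrinks by a factor equal to the group size.

```python
import random
import statistics


def subgroup_means(values, group_size):
    """Average consecutive, non-overlapping groups of `group_size` values.

    Trailing values that do not fill a complete group are discarded,
    so the result has len(values) // group_size elements.
    """
    n_groups = len(values) // group_size
    return [
        statistics.fmean(values[i * group_size:(i + 1) * group_size])
        for i in range(n_groups)
    ]


def sample_skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (about 0 for symmetric data)."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean([(v - mean) ** 2 for v in values])
    m3 = statistics.fmean([(v - mean) ** 3 for v in values])
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical example: 120 right-skewed (exponential) measurements
    data = [rng.expovariate(1.0) for _ in range(120)]
    for k in (1, 2, 5, 10):
        means = subgroup_means(data, k)
        print(f"group size {k:2d}: {len(means):3d} values, "
              f"skewness {sample_skewness(means):+.2f}")
```

With the 120 simulated values, group sizes of 1, 2, 5, and 10 leave 120, 60, 24, and 12 values, respectively.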
The grouping approach therefore seems disadvantageous, especially when only a limited amount of data has been measured. For this reason, the research has focused on examining the efficiency of different types of transformation functions for transforming data sets whose distributions differ from the normal distribution toward normally distributed data sets.

2. Basic types of distributions

The data have been transformed toward normally distributed data. The probability density of the normal distribution N(µ, σ) is described by the equation [2]:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \qquad (1)$$

where µ is the mean value and σ is the standard deviation. The normal distribution is typical of data measured at the output of well-stabilized fabrication processes and of data obtained by repeated measurements that are disturbed by random noise [3], [4].

The efficiency of the transformation functions has been tested on data sets with the following types of distributions:

Logarithmic-normal distribution LN(µ, σ):

$$f(\ln x) = \frac{1}{\sigma(\ln x)\sqrt{2\pi}}\,\exp\left(-\frac{\left(\ln x - E(\ln x)\right)^2}{2\sigma^2(\ln x)}\right) \qquad (2)$$

where E(ln x) and σ(ln x) denote the mean value and the standard deviation of ln x.
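Equation (2) is Eq. (1) applied to ln x, with µ replaced by E(ln x) and σ by σ(ln x). For this reason, a logarithmic transformation turns LN-distributed data into normally distributed data. The minimal Python sketch below assumes hypothetical parameters µ = 0.5 and σ = 0.8 for ln x and draws the data with the standard-library generator `random.lognormvariate`. The function names `normal_pdf` and `lognormal_density_of_log` were introduced here for illustration only.

```python
import math
import random
import statistics


def normal_pdf(x, mu, sigma):
    """Probability density of N(mu, sigma) as in Eq. (1)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def lognormal_density_of_log(log_x, mean_log, sigma_log):
    """Eq. (2): for LN data, the density of ln x is a normal density in ln x
    with mean E(ln x) = mean_log and standard deviation sigma(ln x) = sigma_log."""
    return normal_pdf(log_x, mean_log, sigma_log)


if __name__ == "__main__":
    rng = random.Random(2006)
    mu, sigma = 0.5, 0.8  # hypothetical parameters of ln x
    # Simulated LN(mu, sigma) data: strongly right-skewed in x
    x = [rng.lognormvariate(mu, sigma) for _ in range(5000)]
    # Logarithmic transformation: ln x follows N(mu, sigma)
    log_x = [math.log(v) for v in x]
    print(f"mean of ln x: {statistics.fmean(log_x):.3f} (assumed {mu})")
    print(f"std of ln x:  {statistics.stdev(log_x):.3f} (assumed {sigma})")
    print(f"peak of f(ln x): {lognormal_density_of_log(mu, mu, sigma):.4f}")
```

The sample mean and standard deviation of the transformed values should lie close to the assumed µ and σ.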