1-4244-0551-3/06/$20.00 ©2006 IEEE 278 ISSE 2006 St. Marienthal, Germany
Transformation of Data for Statistical Processing
Pavel Mach, Josef Thuring, David Šámal
Department of Electrotechnology, Faculty of Electrical Engineering, Czech Technical University in Prague,
Prague, Czech Republic
mach@fel.cvut.cz
Abstract: Many statistical tools assume that the processed data are normally distributed. Several
methods exist for transforming non-normally distributed data sets toward normally distributed ones.
The goal of this work was to investigate the usability of four types of transformation (Box-Cox,
exponential, power, and logarithmic) for transforming data sets with four non-normal distributions
(logarithmic-normal, exponential, gamma, and Weibull) toward normally distributed data. The
usability and efficiency of the individual transformation functions for data sets with the
different types of distributions have been determined.
1. Introduction
The use of SPC (Statistical Process Control)
tools, as well as of many other statistical tools, depends
strongly on the normality of the processed data [1], [6]. If
the data are not normally distributed, many
statistical tools cannot be used because they give
false results.
Our research focuses on the properties of
electrically conductive adhesives. To analyse the
correlation between changes in the electrical and
mechanical properties of joints made with different
types of electrically conductive adhesives, it was
necessary to check the normality of the measured
data, to delete outliers (if any were found), and,
when the data were not normally distributed, to
transform them toward normality.
The efficiency of a transformation depends strongly
on the selection of a proper type of transformation
function. Applying the central limit theorem
instead of a transformation is also possible; however, if
the grouping is of a higher order, the total number of
processed values rapidly decreases. The use of this
theorem therefore seems disadvantageous, especially
when only a limited volume of data has been measured.
The research has therefore been focused on
examining the efficiency of different types of
transformation functions for transforming data
sets with various non-normal distributions
toward normally distributed data sets.
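The loss of data caused by grouping can be illustrated with a minimal Python sketch. It assumes that "grouping" means averaging consecutive, non-overlapping subgroups of k values, which is one common way of applying the central limit theorem. The function name `subgroup_means`, the sample size of 200, and the exponentially distributed test data are illustrative assumptions, not taken from the measurements described here.

```python
import random
import statistics


def subgroup_means(values, k):
    """Average consecutive, non-overlapping subgroups of k values.

    Any incomplete trailing subgroup is dropped.
    """
    n_groups = len(values) // k
    return [statistics.fmean(values[i * k:(i + 1) * k]) for i in range(n_groups)]


# Hypothetical example data: 200 exponentially distributed values.
rng = random.Random(0)
data = [rng.expovariate(1.0) for _ in range(200)]

for k in (1, 2, 5, 10):
    means = subgroup_means(data, k)
    print(f"subgroup size {k:2d}: {len(means):3d} values remain")
```

With 200 measured values, a subgroup size of 10 leaves only 20 values for further processing, which is why grouping is unattractive for small data sets.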
2. Basic types of distributions
Data have been transformed toward normally
distributed ones. The probability density of the normal
distribution N(µ, σ) is described by the equation [2]:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$ (1)
where µ is the mean value and σ is the standard deviation.
The normal distribution is typical of data
measured at the output of well-stabilized fabrication
processes and of data obtained by repeated
measurements disturbed by random noise
[3], [4].
The efficiency of the transformation functions has been
tested on data sets with the following types of
distributions:
Logarithmic-normal distribution LN(µ, σ):
$$f(\ln x) = \frac{1}{\sigma(\ln x)\sqrt{2\pi}} \cdot \exp\left(-\frac{(\ln x - E(\ln x))^2}{2\,\sigma^2(\ln x)}\right)$$ (2)
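As a numerical check of Equations (1) and (2), the following Python sketch evaluates the normal density and applies it to ln x. Equation (2) is read here as the normal density of y = ln x with mean E(ln x) and standard deviation σ(ln x). The function names and the parameter values are assumptions chosen for illustration only.

```python
import math


def normal_pdf(x, mu, sigma):
    """Equation (1): density of N(mu, sigma) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def lognormal_pdf_of_log(x, mean_log, sigma_log):
    """Equation (2): density of ln x, where mean_log = E(ln x) and
    sigma_log = sigma(ln x). Requires x > 0."""
    return normal_pdf(math.log(x), mean_log, sigma_log)


# Illustrative values only (not from the measured data).
print(normal_pdf(0.0, 0.0, 1.0))            # peak of the standard normal density
print(lognormal_pdf_of_log(1.0, 0.0, 1.0))  # ln 1 = 0, so the same peak value
```

Because Equation (2) is Equation (1) evaluated at ln x, taking the logarithm of log-normally distributed data yields normally distributed values.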