International Journal of Statistics and Applications 2015, 5(5): 208-212
DOI: 10.5923/j.statistics.20150505.04
Effect of Sampling Methods on Misclassification of
Fisher's Linear Discriminant Analysis
Ghasem Rekabdar*, Bahare Soleymani
Department of Mathematics, Abadan Branch, Islamic Azad University, Abadan, Iran
Abstract In this study, the effect of a stratified sampling design on the accuracy of Fisher's linear
discriminant function, or Anderson's $\hat{W}$, is examined. For this purpose, weighted estimators are
substituted into $\hat{W}$ in place of the simple random sampling estimators. The results of a simulation
study indicate that the performance of $\hat{W}$ is affected by a change of sampling method. The proposed
discriminant function $\hat{W}_{st}$ performs better than the classical discriminant function, especially
when the stratum means differ significantly from the overall mean of each group.
Keywords Fisher's linear discriminant function, Multivariate normal distribution, Stratified sample design
1. Introduction
Discrimination between two groups using multivariate
data is an important problem that was
first studied by Fisher (1936). The linear discriminant
function (LDF) is a standard approach that yields optimal
results when the two groups have conditional multivariate
normal distributions with distinct mean vectors and a common
covariance matrix (Mardia et al., 1979). Computing the
misclassification probabilities, or error rates, of the
discriminant function is a problem of particular interest. When
the parameters of the competing groups are known, the
distribution of the LDF is exactly univariate normal
(Johnson & Wichern, 1992). In practice, the
parameters of the LDF are unknown and are estimated
from independent random "training
samples". The sampling distribution of the LDF has been studied
by several authors. Anderson (1973) obtained the asymptotic
expansion of the distribution of the sample Fisher's linear
discriminant function $\hat{W}$ in terms of order $O(n^{-2})$.
Atakan (2009) compared the performance of seven well-known
methods from the literature for estimating the probability of
misclassification using bootstrap percentile confidence
intervals. That work provides a useful literature review
for further study.
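As a minimal sketch of the quantities discussed above, the following Python code evaluates the LDF with known parameters, the corresponding misclassification probability for equal prior probabilities, $\Phi(-\Delta/2)$ with $\Delta^2 = (\mu_1 - \mu_2)' \Sigma^{-1} (\mu_1 - \mu_2)$, and the plug-in $\hat{W}$ computed from training samples. The function names, the equal-prior cutoff at zero, and the use of the pooled sample covariance are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def ldf(x, mu1, mu2, sigma):
    """W(x) = (mu1 - mu2)' Sigma^{-1} (x - (mu1 + mu2) / 2).

    With equal priors (an assumption here), x is assigned to
    group 1 when W(x) > 0 and to group 2 otherwise.
    """
    a = np.linalg.solve(sigma, mu1 - mu2)
    return float(a @ (x - 0.5 * (mu1 + mu2)))


def error_rate_known(mu1, mu2, sigma):
    """Misclassification probability Phi(-Delta / 2) for known
    parameters and equal priors, where Delta is the Mahalanobis
    distance between the two group means."""
    d = mu1 - mu2
    delta_sq = float(d @ np.linalg.solve(sigma, d))
    return normal_cdf(-0.5 * sqrt(delta_sq))


def sample_ldf(x, x1, x2):
    """Plug-in W-hat: replace the means and the common covariance
    by the sample means and the pooled sample covariance of the
    training samples x1 and x2 (rows are observations)."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    s1 = np.atleast_2d(np.cov(x1, rowvar=False))
    s2 = np.atleast_2d(np.cov(x2, rowvar=False))
    pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
    return ldf(x, m1, m2, pooled)
```

The known-parameter error rate depends on the groups only through $\Delta$, which is why the sampling behaviour of the estimated means and covariance matters once $\hat{W}$ replaces $W$.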
* Corresponding author:
ghasem_rekabdar@yahoo.com (Ghasem Rekabdar)
Published online at http://journal.sapub.org/statistics
Copyright © 2015 Scientific & Academic Publishing. All Rights Reserved
Several studies have examined the effects of sampling design on
statistical methods. In particular, in
regression analysis, the effect of sampling design on the least squares
estimator has been studied by several authors (DuMuchel & Duncan,
1981; Horton & Fitzmaurice, 2004). Likewise, in the analysis of
variance for group mean differences, the effect of a cluster
sampling design on the F ratio has frequently been studied in social and
psychological surveys (Hegges & Rhoads, 2011).
In multivariate statistical analysis, complex sampling designs
lead to complicated methods. Few studies have been
devoted to the effect of sampling methods on the LDF because of the
analytical complexity involved. Nonetheless, some researchers
have examined the effect of sampling design on the
misclassification probability of the LDF (Kao & McCabe,
1991; Leu & Tsui, 1997). For stratified random
sampling, Tsui & Leu (1998) showed that the asymptotic
expansion of the LDF has an error of order $O(1)$. Therefore,
using the LDF without correction can increase the
probability of misclassification. Recently, Shahrokh
Esfahani & Dougherty (2014) showed by simulation
that separate sampling with an inappropriate sampling ratio
can significantly reduce the classification accuracy of the LDF.
The main contribution of the present paper is to
approximate the LDF probability of misclassification using
weighted estimators. In some studies, auxiliary
information about the groups is available and can be used to
construct the LDF. For example, it may be possible to categorize
each group on the basis of a qualitative variable. In this case,
a stratified sampling design can be used to draw data from
each group. In this study, we substitute unbiased weighted
estimators into the LDF when the sample design is stratified.
The two linear discriminant functions are then compared
in a simulation study.
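The idea of substituting weighted estimators into the LDF can be sketched as follows. This is a hedged illustration only: the stratified mean $\bar{x}_{st} = \sum_h W_h \bar{x}_h$ with stratum weights $W_h = N_h / N$ is the standard estimator, but the covariance estimator (weighted within-stratum covariance plus weighted between-stratum spread) and the pooling of the two groups by sample size are assumptions made here for the example and may differ from the estimators used in the paper. All function names are hypothetical.

```python
import numpy as np


def stratified_mean(strata, weights):
    """Stratified mean: sum over strata of W_h * (stratum sample mean).

    strata  : list of arrays, one per stratum (rows are observations)
    weights : list of stratum weights W_h = N_h / N, summing to 1
    """
    return sum(w * s.mean(axis=0) for s, w in zip(strata, weights))


def stratified_cov(strata, weights):
    """Assumed plug-in covariance estimate: weighted within-stratum
    covariance plus weighted between-stratum spread of the means."""
    m = stratified_mean(strata, weights)
    cov = np.zeros((m.shape[0], m.shape[0]))
    for s, w in zip(strata, weights):
        d = s.mean(axis=0) - m
        within = np.atleast_2d(np.cov(s, rowvar=False))
        cov += w * (within + np.outer(d, d))
    return cov


def stratified_ldf(x, strata1, w1, strata2, w2):
    """LDF with stratified estimators substituted for the
    simple random sampling estimators. Classify x into
    group 1 when the returned value is positive (equal priors)."""
    m1 = stratified_mean(strata1, w1)
    m2 = stratified_mean(strata2, w2)
    n1 = sum(len(s) for s in strata1)
    n2 = sum(len(s) for s in strata2)
    # Pooling by group sample size is an assumption of this sketch.
    pooled = (n1 * stratified_cov(strata1, w1)
              + n2 * stratified_cov(strata2, w2)) / (n1 + n2)
    a = np.linalg.solve(pooled, m1 - m2)
    return float(a @ (x - 0.5 * (m1 + m2)))
```

When the sample allocation across strata is not proportional to the stratum weights, the unweighted sample mean of a group is pulled toward the over-sampled strata, whereas `stratified_mean` is not. This is the situation in which the two discriminant functions can be expected to differ most.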