Possibilistic Clustering based on Robust Modeling of Finite Generalized
Dirichlet Mixture
M. Maher Ben Ismail and Hichem Frigui
Multimedia Research Laboratory, CECS Dept., University of Louisville, USA
mmbeni01@louisville.edu, h.frigui@louisville.edu
Abstract
We propose a novel possibilistic clustering algorithm based on robust modeling of the finite Generalized Dirichlet (GD) mixture. The algorithm generates two types of membership degrees. The first is a posterior probability that indicates the degree to which the point fits the estimated distribution. The second represents the degree of “typicality” and is used to identify and discard noise points. The algorithm minimizes a single objective function to optimize the GD mixture parameters and the possibilistic membership values. This optimization is carried out iteratively by dynamically updating the Dirichlet mixture parameters and the membership values in each iteration. We compare the performance of the proposed algorithm with an EM-based approach and show that the possibilistic approach is more robust.
1. Introduction
During the last two decades, finite mixture models [1] have emerged as a flexible and powerful tool for probabilistic model-based clustering. Finite mixtures naturally model data samples that are assumed to have been produced by one of a set of alternative random sources. Inferring the parameters of these sources and identifying which source produced each sample leads to the problem of data clustering. Despite recent progress, this is still an open research problem. The problem is more acute when the data are noisy and high-dimensional. Gaussian mixtures, with diagonal covariance matrices assumed for the components, have been used frequently [2]. However, Gaussian functions cannot approximate asymmetric distributions.
Recently, Generalized Dirichlet (GD) mixtures have been adopted as a good alternative [3]. In [5], the authors proved that the GD distribution is more appropriate for modeling data that are compactly supported, such as data originating from videos, images, or text. Moreover, GD distributions can be transformed to yield features that are independent and follow Beta distributions. Thus, the conditional independence assumption among features, commonly used in data clustering [6] to model high-dimensional data, holds exactly for GD samples without loss of accuracy.
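The independence property mentioned above follows from the stick-breaking construction of the GD distribution: independent Beta variates X_i generate a GD sample, and the inverse map recovers them. The sketch below illustrates this construction (the function names are ours, not from the paper):

```python
import random

def sample_gd(params):
    """Sample one point from a Generalized Dirichlet distribution via its
    stick-breaking construction: draw independent X_i ~ Beta(alpha_i, beta_i),
    then Y_1 = X_1 and Y_i = X_i * prod_{k<i} (1 - X_k)."""
    xs = [random.betavariate(a, b) for a, b in params]
    y, stick = [], 1.0
    for x in xs:
        y.append(x * stick)     # portion of the remaining "stick"
        stick *= (1.0 - x)      # shrink the remainder
    return y

def gd_to_betas(y):
    """Invert the construction: map a GD sample back to the independent
    Beta-distributed features X_i = Y_i / (1 - sum_{k<i} Y_k)."""
    xs, remaining = [], 1.0
    for yi in y:
        xs.append(yi / remaining)
        remaining -= yi
    return xs
```

Because the recovered X_i are mutually independent, per-feature (Beta) modeling of the transformed data loses no accuracy, which is the point made in the paragraph above.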
The problem of estimating the parameters of a GD mixture has been the subject of diverse studies [8], and the maximum likelihood (ML) method [1, 11] is the most common approach. Another approach is to use expectation maximization (EM) [5, 11]. However, these methods do not perform well when the data are noisy. In fact, noise points and outliers can drastically affect the estimates of the model parameters and, hence, the final clustering partition.
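The sensitivity of EM to noise can be seen directly in its E-step: posterior responsibilities are normalized to sum to one, so even a point far from every component is forced to "belong" somewhere and pulls on the parameter estimates. The sketch below is a generic illustration with Gaussian stand-in densities, not the paper's GD-specific updates:

```python
import math

def e_step(point, densities, weights):
    """Generic E-step for a finite mixture: posterior responsibilities
    p(j | y) = p_j p(y|theta_j) / sum_k p_k p(y|theta_k).
    `densities` is a list of per-component pdf callables."""
    joint = [w * f(point) for w, f in zip(weights, densities)]
    total = sum(joint)
    return [v / total for v in joint]

def gauss_pdf(mu, sigma):
    """1-D Gaussian pdf, used here only as a stand-in component density."""
    return lambda y: math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
```

For two components centered at 0 and 5, an outlier at y = 10 is far from both, yet its responsibilities still sum to 1 (almost all of it assigned to the nearer component), which is exactly how noise skews the ML estimates.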
To overcome this limitation, we propose a possibilistic
approach for GD mixture parameter estimation and data
clustering. Our approach generates possibilistic mem-
bership functions which represent the “typicality” of
each data point. This is in addition to the posterior prob-
abilities which indicate how well each point fits within
the estimated distribution.
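Typicality degrees of this kind are classically computed with a possibilistic update in the style of Krishnapuram and Keller's possibilistic c-means. The sketch below uses that classic form as an assumed stand-in; the paper derives its own update from its objective function:

```python
def typicality(dist_sq, eta, m=2.0):
    """PCM-style typicality (an assumption, not the paper's exact update):
    t = 1 / (1 + (d^2 / eta)^(1/(m-1))),
    where dist_sq is the point's squared distance (or negative log-likelihood
    surrogate) to a cluster, eta a per-cluster scale, and m > 1 a fuzzifier.
    Typicalities are NOT normalized across clusters, so a noise point far
    from every cluster gets a low value everywhere and can be discarded."""
    return 1.0 / (1.0 + (dist_sq / eta) ** (1.0 / (m - 1.0)))
```

This is the key contrast with posterior responsibilities: a point at distance eta gets typicality 0.5, and a distant outlier gets a value near 0 for every cluster instead of being forced to sum to 1.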
2. Possibilistic Clustering based on Robust
Generalized Dirichlet Mixture Model
Let $\mathcal{Y} = (\vec{Y}_1, \vec{Y}_2, \ldots, \vec{Y}_N)$ be a set of $N$ points where $\vec{Y}_i \in \mathbb{R}^d$. We assume that $\mathcal{Y}$ is generated by a mixture of GD distributions with parameters $\theta^* = (\vec{\theta}_1^*, \vec{\theta}_2^*, \ldots, \vec{\theta}_M^*, p_1, \ldots, p_M)$, where $\vec{\theta}_j^*$ is the parameter vector of the $j$th GD component and the $p_j$ are the mixing weights. The finite GD mixture models the data using
$$p(\vec{Y} \mid \theta^*) = \sum_{j=1}^{M} p_j \, p(\vec{Y} \mid \vec{\theta}_j^*), \qquad (1)$$
where $p(\vec{Y} \mid \vec{\theta}_j^*)$ is the GD distribution. Each $\vec{\theta}_j^* = (\alpha_{j1}^*, \beta_{j1}^*, \alpha_{j2}^*, \beta_{j2}^*, \ldots, \alpha_{jd}^*, \beta_{jd}^*)$ is the set of parameters
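Eq. (1) can be evaluated directly once the GD component density is written out. The sketch below assumes the standard GD density, with normalizing Beta constants per feature and exponents $\gamma_i = \beta_i - \alpha_{i+1} - \beta_{i+1}$ for $i < d$ and $\gamma_d = \beta_d - 1$ (a textbook form, stated here as an assumption since the paper's chunk cuts off before spelling it out):

```python
import math

def gd_pdf(y, params):
    """Density of a Generalized Dirichlet distribution.
    `params` = [(alpha_1, beta_1), ..., (alpha_d, beta_d)].
    Uses gamma_i = beta_i - alpha_{i+1} - beta_{i+1} for i < d,
    and gamma_d = beta_d - 1; for d = 1 this reduces to Beta(alpha, beta)."""
    d = len(params)
    p, cum = 1.0, 0.0
    for i, (a, b) in enumerate(params):
        cum += y[i]                              # running sum y_1 + ... + y_i
        if i < d - 1:
            g = b - params[i + 1][0] - params[i + 1][1]
        else:
            g = b - 1.0
        norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
        p *= norm * y[i] ** (a - 1.0) * (1.0 - cum) ** g
    return p

def gd_mixture_pdf(y, components, weights):
    """Eq. (1): p(y | theta*) = sum_j p_j * p(y | theta*_j)."""
    return sum(w * gd_pdf(y, c) for w, c in zip(weights, components))
```

For a single dimension, `gd_pdf([0.3], [(2.0, 3.0)])` matches the Beta(2, 3) density at 0.3, which is a quick sanity check on the exponents.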
2010 International Conference on Pattern Recognition
1051-4651/10 $26.00 © 2010 IEEE
DOI 10.1109/ICPR.2010.145
577