Copyright © 2010, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Chapter 18
The Power of Sampling and Stacking for the PAKDD-2007 Cross-Selling Problem
Paulo J.L. Adeodato
NeuroTech Ltd. and Federal University of Pernambuco, Brazil
Germano C. Vasconcelos
NeuroTech Ltd. and Federal University of Pernambuco, Brazil
Adrian L. Arnaud
NeuroTech Ltd. and Federal University of Pernambuco, Brazil
Rodrigo C.L.V. Cunha
NeuroTech Ltd. and Federal University of Pernambuco, Brazil
Domingos S.M.P. Monteiro
NeuroTech Ltd. and Federal University of Pernambuco, Brazil
Rosalvo F. Oliveira Neto
NeuroTech Ltd. and Federal University of Pernambuco, Brazil
Abstract
This article presents an efficient solution to the PAKDD-2007 Competition cross-selling problem. The solution is based on a thorough approach that involves creating new input variables, efficient data preparation and transformation, an adequate data sampling strategy, and a combination of two of the most robust modeling techniques. Because the target class contains very few examples, model robustness was pursued in two ways: by taking the median score of the 11 models developed with an adapted version of 11-fold cross-validation, and by combining two robust techniques, the MLP neural network and the n-tuple classifier, via stacking. Despite the complexity of the problem, performance on the prediction data set (unlabeled samples), measured through the KS2 statistic and ROC curves, proved very effective, and the solution finished as the first runner-up of the competition.
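Two of the ideas summarized above, the KS2 performance measure and the median of the scores produced by the k cross-validation models, can be illustrated with a short sketch. The Python below uses only the standard library and assumes that KS2 means the two-sample Kolmogorov-Smirnov distance between the score distributions of the positive and negative classes. The stratified round-robin fold assignment and the `train_fn` callback are hypothetical choices made for illustration. This sketch does not reproduce the chapter's own adaptation of 11-fold cross-validation or its MLP/n-tuple stacking.

```python
import statistics
from bisect import bisect_right
from typing import Callable, List, Sequence, TypeVar

M = TypeVar("M")  # whatever object the (hypothetical) training routine returns


def ks2(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic between the score
    distributions of the positive (label 1) and negative (label 0) classes,
    i.e. the largest vertical gap between their empirical CDFs."""
    pos = sorted(s for s, y in zip(scores, labels) if y == 1)
    neg = sorted(s for s, y in zip(scores, labels) if y == 0)
    if not pos or not neg:
        raise ValueError("both classes must be present")
    best = 0.0
    # Both empirical CDFs are step functions that only jump at observed
    # scores, so evaluating the gap at each distinct score is sufficient.
    for t in sorted(set(scores)):
        cdf_pos = bisect_right(pos, t) / len(pos)
        cdf_neg = bisect_right(neg, t) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


def stratified_folds(labels: Sequence[int], k: int = 11) -> List[List[int]]:
    """Assign sample indices to k folds, dealing each class out round-robin
    so the rare target class is spread as evenly as possible (an assumed
    scheme, not necessarily the chapter's adaptation)."""
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        for j, i in enumerate(members):
            folds[j % k].append(i)
    return folds


def train_fold_models(
    labels: Sequence[int], k: int, train_fn: Callable[[List[int]], M]
) -> List[M]:
    """Train one model per fold, each on the k-1 folds not held out.
    `train_fn` is a hypothetical callback mapping training indices to a model."""
    folds = stratified_folds(labels, k)
    models = []
    for held_out in range(k):
        train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        models.append(train_fn(train_idx))
    return models


def median_ensemble(models: Sequence[Callable[[object], float]]) -> Callable[[object], float]:
    """Combine the k fold models by scoring each input with all of them
    and returning the median score."""
    def score(x: object) -> float:
        return statistics.median([m(x) for m in models])
    return score
```

With real data, each `train_fn` call would fit the chosen classifier on the given training indices. The resulting `median_ensemble(models)` would then score the prediction set, and `ks2` could be applied to labeled held-out scores. One reason to prefer the median over the mean is that a single fold model that overfits the few positive examples has less influence on the combined score.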