298 Int. J. Computer Applications in Technology, Vol. 60, No. 4, 2019
Copyright © 2019 Inderscience Enterprises Ltd.
A comparison of text classification methods using
different stemming techniques
Mariem Bounabi*
Computer Sciences, Imaging and
Numerical Analysis Laboratory (LIIAN),
USMBA University Fes,
Fez City, Morocco
Email: mariem.bounabi@usmba.ac.ma
*Corresponding author
Karim El Moutaouakil
Hoceima National School of Applied Sciences (ENSAH),
Mohammed First University,
Al-Hoceima, Morocco
Email: karimmoutaouakil@yahoo.fr
Khalid Satori
Computer Sciences, Imaging and
Numerical Analysis Laboratory (LIIAN),
USMBA University Fes,
Fez City, Morocco
Email: khalidsatori@gmail.com
Abstract: In information retrieval, two factors have an important impact on system
performance: feature extraction and the matching process. In this work, we compare
three well-known stemming techniques: the Lovins stemmer, the iterated Lovins stemmer and the
Snowball stemmer. For the classification phase, we experimentally compare six methods: BNET,
NBMU, CNB, RF, SLogicF, and SVM. Based on this comparison, we propose a new retrieval system
that uses the voting method as the matching tool to improve on the performance of the classical
systems. We use the TF-IDF algorithm to extract features. The systems considered
are tested on two databases: BBCNEWS and BBCSPORT. The systems based on the Lovins
stemmers and on the voting technique give the best results. For the first database, the best
accuracy is obtained by the Lovins + Vote system, with a recognition rate of 97%. For
the second database, the Snowball + Vote system achieves a recognition rate of 99%.
Keywords: NBMU; SVM; RF; NB; SLogiF; CNB; voting technique; classification; stemmer;
term-weighting.
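The pipeline summarised in the abstract (stemming, TF-IDF weighting, then a vote over several classifiers) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: toy_stem is a hypothetical suffix stripper standing in for the Lovins and Snowball stemmers, the TF-IDF variant is the plain tf * log(N/df) form, and the vote is a simple majority over predicted labels, which may differ from the combination rule used in the experiments.

```python
import math
from collections import Counter


def toy_stem(word):
    """Hypothetical suffix stripper; a stand-in, NOT the Lovins or Snowball algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        # Strip the first matching suffix, keeping at least three characters.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase, split on whitespace, and stem every token."""
    return [toy_stem(w) for w in text.lower().split()]


def tfidf(docs):
    """Return one {term: tf * idf} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def majority_vote(predictions):
    """Return the label chosen by the most classifiers (first seen wins ties)."""
    return Counter(predictions).most_common(1)[0][0]


if __name__ == "__main__":
    docs = [tokenize("players scored goals"), tokenize("markets closed lower")]
    print(tfidf(docs))
    # Labels as they might come from three separate classifiers (made-up values).
    print(majority_vote(["sport", "business", "sport"]))
```

A term that occurs in every document receives weight zero under this TF-IDF variant, which is why stemming matters: merging inflected forms changes the document frequencies and therefore the weights the classifiers see.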
Reference to this paper should be made as follows: Bounabi, M., El Moutaouakil, K. and Satori, K.
(2019) ‘A comparison of text classification methods using different stemming techniques’,
Int. J. Computer Applications in Technology, Vol. 60, No. 4, pp.298–306.
Biographical notes: Mariem Bounabi received her Master's degree from the Computer Science
Department at the Faculty of Sciences of Fes (FSDM), Morocco, in 2015. She is currently pursuing
her PhD in the same department. Her main research interests include machine learning and
information retrieval.
Karim El Moutaouakil received his PhD degree from the Faculty of Sciences and Technologies in
Fez, Morocco, in 2011. He is currently an Assistant Professor of Computer Science at the
National School of Applied Sciences in Al-Hoceima, Morocco. His research interests include
artificial intelligence, machine learning and pattern recognition.
Khalid Satori received his PhD degree from the National Institute of Applied Sciences (INSA)
in Lyon in 1993. He is currently a Full Professor of Computer Science at USMBA University in
Morocco. He is the director of the LIIAN Laboratory. His research interests include real-time
rendering, image-based rendering, virtual reality, biomedical signals, camera self-calibration,
3D reconstruction and pattern recognition.