Natarajan Meghanathan et al. (Eds) : ICCSEA, WiMoA, SCAI, SPPR, InWeS, NECO - 2019
pp. 183-191, 2019. © CS & IT-CSCP 2019 DOI: 10.5121/csit.2019.91815
TOWARD MULTI-LABEL CLASSIFICATION USING AN ONTOLOGY FOR WEB PAGE CLASSIFICATION
Yaya Traoré¹, Sadouanouan Malo², Bassolé Didier¹ and Séré Abdoulaye²
¹University Joseph KI-ZERBO, Ouagadougou, BURKINA FASO
²University Nazi Boni, Bobo-Dioulasso, BURKINA FASO
ABSTRACT
Automatic categorization of web pages has become increasingly important for helping search engines
provide users with relevant results quickly. In this paper, we propose a method based on
Multi-label Classification (ML) using an ontology, which predicts the categories of a newly
created and tagged web page. The method uses the ontology in both the learning phase and the
prediction phase. In the learning phase, the ontology is used to build the training set. In the
prediction phase, the ontology is used to place newly tagged pages in the most specific
categories. The experimental evaluation shows that the proposed approach gives substantial
results.
KEYWORDS
Multi-label classification (ML), ontology, categorization, prediction.
1. INTRODUCTION
Nowadays, many web platforms allow the users of a community to collaborate in creating and
sharing knowledge. The web pages on these platforms are semantically annotated. The number of
web pages is continuously growing and can cover almost any information need. However, the
huge number of web pages and the way these pages are organized make it increasingly difficult for
a user to retrieve precise and exact information. An efficient and accurate method for
classifying this large amount of data is therefore essential if the web pages are to be exploited
to their full potential. No specific method exists to automate this task. We treat this
problem as a Multi-label (ML) classification problem [1], [12], which consists in predicting the
categories of a new page from its tags. In our context, categories are regarded as text
labels.
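To make the problem statement concrete, the following is a minimal sketch of multi-label prediction from tags. It is a toy stand-in based on tag/category co-occurrence counts, not the method proposed in this paper. The function names (`train`, `predict`), the `threshold` parameter, and the sample tags and categories are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def train(pages):
    """Count tag/category co-occurrences.

    pages: iterable of (tags, categories) pairs, each a set of strings.
    Returns a mapping tag -> {category: count}.
    """
    cooc = defaultdict(lambda: defaultdict(int))
    for tags, categories in pages:
        for tag in tags:
            for cat in categories:
                cooc[tag][cat] += 1
    return cooc


def predict(cooc, tags, threshold=0.5):
    """Return every category supported by at least `threshold` of the page's tags.

    A page can receive zero, one, or several categories, which is what
    distinguishes multi-label from single-label classification.
    """
    if not tags:
        return set()
    votes = defaultdict(int)
    for tag in tags:
        # .get() avoids creating empty entries for unseen tags
        for cat in cooc.get(tag, {}):
            votes[cat] += 1
    return {cat for cat, n in votes.items() if n / len(tags) >= threshold}


# Hypothetical training pages: (tags, categories)
training = [
    ({"enzyme", "kinase"}, {"Protein", "Enzyme"}),
    ({"enzyme", "hydrolase"}, {"Protein", "Enzyme"}),
    ({"gene", "locus"}, {"Gene"}),
]
model = train(training)
predicted = predict(model, {"kinase", "enzyme"})
```

In this sketch a new page tagged `kinase` and `enzyme` receives both categories that co-occurred with those tags, while a page carrying only unseen tags receives none.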
To exploit the relationships between labels when building the training data, we combine the ML
method with an ontology. An ontology [2] is used to represent the domain knowledge. In this
paper, we propose a novel ontology-based ML method to predict the categories of a new web
page. Experiments are conducted to evaluate the performance of the proposed approach on
datasets from the UniProt¹ web site. The experimental results indicate that the approach
achieves good performance.
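The abstract states that, in the prediction phase, the ontology is used to place newly tagged pages in the most specific categories. One possible reading, sketched below under stated assumptions, is to drop every predicted category that is an ancestor of another predicted category in the ontology's subclass hierarchy. The `parent_of` mapping, the function names, and the sample categories are hypothetical; the paper's actual procedure may differ.

```python
def ancestors(category, parent_of):
    """Return all strict ancestors of `category`.

    parent_of: hypothetical mapping child -> list of direct parent categories.
    """
    seen = set()
    stack = list(parent_of.get(category, ()))
    while stack:
        node = stack.pop()
        if node not in seen:  # guards against revisiting shared ancestors
            seen.add(node)
            stack.extend(parent_of.get(node, ()))
    return seen


def most_specific(categories, parent_of):
    """Keep only the categories that are not an ancestor of another one in the set."""
    covered = set()
    for cat in categories:
        covered |= ancestors(cat, parent_of)
    return set(categories) - covered


# Hypothetical hierarchy: Kinase is a kind of Enzyme, which is a kind of Protein
parent_of = {"Kinase": ["Enzyme"], "Enzyme": ["Protein"], "Gene": []}
result = most_specific({"Protein", "Enzyme", "Kinase", "Gene"}, parent_of)
```

Here `Protein` and `Enzyme` are removed because `Kinase` already implies them, leaving `Kinase` and `Gene` as the most specific categories.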