International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056
Volume: 04 Issue: 03 | Mar -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 243
Algorithm for calculating relevance of documents in information retrieval systems
Roberto Passailaigue Baquerizo¹, Paúl Rodríguez Leyva², Juan Pedro Febles³, Hubert Viltres Sala⁴, Vivian Estrada Sentí⁵
¹Chancellor, Universidad Tecnológica (ECOTEC), Guayaquil, Ecuador
²Departamento de Soluciones Informáticas para Internet, Universidad de las Ciencias Informáticas, La Habana, Cuba
³Departamento Metodológico de Postgrado, Universidad de las Ciencias Informáticas, La Habana, Cuba
⁴Departamento de Preparación Profesional, Universidad de las Ciencias Informáticas, La Habana, Cuba
³Departamento Metodológico de Postgrado, Universidad de las Ciencias Informáticas, La Habana, Cuba
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - This research belongs to the field of information
retrieval. Its main objective is to lay the foundations of an
algorithm that assigns a relevance value to a document with
respect to a query entered by users of an information
retrieval system. Relevance is a fundamental concept in the
design and development of information retrieval systems:
even if these tools search the web thoroughly, structure
documents correctly and store them efficiently, users will
judge the quality of the system poorly if the results it
returns do not actually meet their search needs. The
algorithm is based primarily on the classical mathematical
expressions for calculating the similarity between sets,
known as the cosine, Jaccard and Dice coefficients. Its
distinguishing feature is that the similarity is adjusted
according to the relationship established between the search
profiles of users and the categories of the documents stored
in the information retrieval system. To obtain these
variables, text mining and web mining techniques are used to
process the information generated by logging user queries
and the metadata of the stored documents. The main
contribution of the research is an algorithm that calculates
the relevance of the documents returned in response to the
queries made by users.
Key Words: algorithm, similarity, queries, information retrieval systems, relevance
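The abstract names the cosine, Jaccard and Dice coefficients as the starting point of the algorithm. As a rough illustration only, the sketch below computes the three classical coefficients between a query and a document, assuming both have already been reduced to lowercase terms (no stemming, stop-word removal or term weighting). The function names are hypothetical, and the adjustment based on user search profiles and document categories that the paper proposes is not modeled here.

```python
import math
from collections import Counter


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B| (0.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a: set[str], b: set[str]) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) (0.0 if both sets are empty)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def cosine(a_terms: list[str], b_terms: list[str]) -> float:
    """Cosine of the angle between raw term-frequency vectors."""
    va, vb = Counter(a_terms), Counter(b_terms)
    # Dot product only needs the terms the two vectors share.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    # Hypothetical query and document title, split on whitespace.
    query = "information retrieval relevance".split()
    doc = "relevance of documents in information retrieval systems".split()
    print(round(jaccard(set(query), set(doc)), 3))  # 0.429  (3 / 7)
    print(round(dice(set(query), set(doc)), 3))     # 0.6    (6 / 10)
    print(round(cosine(query, doc), 3))             # 0.655  (3 / sqrt(21))
```

With every term occurring once, the cosine on term-frequency vectors equals the set form |A ∩ B| / sqrt(|A|·|B|), which is why the three values can be compared directly; once term frequencies or weights differ, only the cosine reflects them.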
1. INTRODUCTION
Information Retrieval (IR) is not a new
area; it has been under development since the late 1950s.
However, it now plays a more important role given the value
of information: having or not having the right information
in a timely manner can determine the success or failure of
an operation. Hence the importance of information retrieval
systems (SRI, from the Spanish acronym), which can handle
these situations effectively and efficiently, albeit with
certain limitations [1].
Since 1950, many definitions of this field have been
proposed. According to Baeza-Yates, one of the most
experienced researchers in the area, information retrieval
"deals with the representation, storage, organization of, and
access to information items". Salton defines it as "a field
related to the structure, analysis, organization, storage,
search and retrieval of information" [11]. Croft holds that
information retrieval is "the set of tasks by which the user
locates and accesses information resources that are relevant
to problem resolution." Documentary
languages, abstract techniques, description of the