Karim Gasmi et al., International Journal of Advanced Trends in Computer Science and Engineering, 9(2), March - April 2020, 2310 – 2319

ABSTRACT

Conceptual representation is one of the most commonly used approaches to semantic information retrieval. Most approaches apply NLP tools to map terms from queries and documents to concepts, and then compute relevance scores based on the concept representation. However, the mapping results are imperfect because erroneous concepts are generated out of the document context. To overcome this problem, we propose adding a concept selection step to the indexing process. Furthermore, in this paper we study the use of semantic similarity distances (SSDs) in the matching step. Finally, we propose a method based on an adaptive genetic algorithm to combine two SSDs.

Key words: Information retrieval, semantic similarity concept, medical information, UMLS.

1. INTRODUCTION

Traditional indexing methods use single words as the units that represent information in a textual corpus. This representation relies on the co-occurrence of words in a text and does not take into account the semantic relationships that may exist between them. The problem with these models is that the same meaning can be expressed by different words (synonymy), and one word can express different meanings in different contexts (polysemy). This is due to the richness of the mechanisms of reflection and linguistic expression, and this richness is a source of ambiguity in natural language. Some studies [3, 33] have highlighted the inadequacy of document representations based on simple words. The authors in [12] showed that only 20% of Internet users use an application that is 100% accurate with respect to their needs. Although solutions based on relevance feedback partially overcome the problem of synonymy and help improve recall, the problem of polysemy persists.
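The synonymy and polysemy problems described above can be illustrated with a minimal sketch of word-based matching. It assumes whitespace tokenization, raw term frequencies, and cosine similarity as the matching score (a common choice, not one prescribed by this paper); the example texts are invented for illustration.

```python
from collections import Counter
import math


def bag_of_words(text: str) -> Counter:
    # Lowercase and split on whitespace; real systems would also stem
    # and remove stop words.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


query = bag_of_words("heart attack treatment")
# Same topic, different words (synonymy): no shared terms at all.
doc_synonym = bag_of_words("therapy for myocardial infarction")
# Different topic, shared ambiguous word "attack" (polysemy).
doc_polysemy = bag_of_words("treatment of a panic attack")

print(cosine(query, doc_synonym))             # 0.0
print(round(cosine(query, doc_polysemy), 3))  # 0.516
```

The relevant document scores 0 because it shares no surface word with the query, while the off-topic document scores higher only because it repeats "treatment" and the ambiguous word "attack".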
In recent years, many works have highlighted the inadequacy of representations based on simple words [2, 33]. Yet it has been difficult to go beyond the performance achieved so far. Some works have suggested exploring the semantic textual representation of information, and several studies have since attempted to incorporate semantic information into the Information Retrieval process. Among such works, we can distinguish two main approaches: conceptual indexing and semantic indexing [34]. The latter is based on the meaning of words. Such representations rely either on word-based correlations or on dictionaries [24] for synonym extraction. More recently, the authors in [25] and [28] suggested expanding query terms with medical terms. The first study automatically converts user terms into medical terms using UMLS and then adds these medical terms to the initial query. However, this method has not been sufficiently evaluated. On the other hand, the authors in [13] employed the MetaMap online tool to recognize medical terms in queries by selecting the highest-scoring mapping of each phrase. Using statistics related to the query and the collection, the corresponding medical terms were extracted, filtered (using a stop word list), weighted, and finally added to the original query. Other studies suggested selecting expansion terms by filtering out irrelevant terms with different means, such as Quantum Mechanics [54], the document frequency chi-square [35], and the Rank Score Method [21]. In contrast, conceptual indexing relies on concepts extracted from semantic resources and taxonomies to index documents [52]. Within information retrieval research, some authors consider conceptual indexing a generalization of semantic indexing, since concepts convey the meanings of words or terms. The objective of conceptual approaches is to identify all the terms of a document and represent them as concepts using an external resource.
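The conceptual indexing idea described above can be sketched as follows, assuming a tiny hand-written term-to-concept dictionary that stands in for an external resource such as UMLS. The concept IDs, the greedy longest-match rule, and the Jaccard overlap used as a matching score are all hypothetical illustration choices, not the method of this paper or of the cited works.

```python
# Hypothetical toy "external resource": surface term -> concept ID.
# The IDs are invented placeholders, not real UMLS concept identifiers.
CONCEPT_DICT = {
    ("heart", "attack"): "C_MYOCARDIAL_INFARCTION",
    ("myocardial", "infarction"): "C_MYOCARDIAL_INFARCTION",
    ("treatment",): "C_THERAPY",
    ("therapy",): "C_THERAPY",
    ("panic", "attack"): "C_PANIC_ATTACK",
}
MAX_LEN = max(len(k) for k in CONCEPT_DICT)


def to_concepts(text: str) -> set[str]:
    """Map tokens to concept IDs by greedy longest match; unmatched tokens are dropped."""
    tokens = text.lower().split()
    concepts = set()
    i = 0
    while i < len(tokens):
        # Try the longest candidate term starting at position i first.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in CONCEPT_DICT:
                concepts.add(CONCEPT_DICT[key])
                i += n
                break
        else:
            i += 1  # no known term starts at this token
    return concepts


def jaccard(a: set[str], b: set[str]) -> float:
    # Overlap of two concept sets, used here as a simple matching score.
    return len(a & b) / len(a | b) if a | b else 0.0


query = to_concepts("heart attack treatment")
print(sorted(query))  # ['C_MYOCARDIAL_INFARCTION', 'C_THERAPY']
print(jaccard(query, to_concepts("therapy for myocardial infarction")))  # 1.0
print(round(jaccard(query, to_concepts("treatment of a panic attack")), 3))  # 0.333
```

Because "heart attack" and "myocardial infarction" map to the same concept, the synonymous document now matches fully, and "panic attack" is kept distinct from "heart attack" as a separate concept. Dictionary lookup alone can still yield concepts that do not fit the document context, which is the kind of error a concept selection step is meant to address.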
The extracted concepts depend on the external resource, which focuses on the keywords generated from the text [5, 27, 39].

Semantic Similarity Measures for Medical Information Retrieval
Karim Gasmi 1, Mouna Torjmen 2
1 College of Sciences and Arts of Tabarjal, Jouf University, Saudi Arabia, gasmikarim@yahoo.fr
2 ReDCAD Laboratory, National School of Engineering Sfax, Tunisia, mouna.torjmen@redcad.org
ISSN 2278-3091, Volume 9 No.2, March - April 2020
International Journal of Advanced Trends in Computer Science and Engineering
Available Online at http://www.warse.org/IJATCSE/static/pdf/file/ijatcse213922020.pdf
https://doi.org/10.30534/ijatcse/2020/213922020