Natural Language Asymmetry and Internet Infrastructures*
Anna Maria Di Sciullo
Université du Québec à Montréal

Abstract
We present the main features of an Information Retrieval and Extraction system based on natural language asymmetric relations. We show that asymmetric relations, together with the identification of functional elements, help improve the performance of search engines. We compare an Information Retrieval and Extraction system based on the recovery of a subset of asymmetric relations with currently operating search engines based on keyword search and Boolean analysis, and we show that the first system is superior. We show that natural language asymmetries are a crucial ingredient of Internet infrastructures, ensuring greater precision in internet communication.

1. Internet infrastructures
The semantic web aims to provide a universally accessible platform that allows data to be shared and processed by automatic tools as well as by users. It also aims to define and link the data on the web to improve information seeking, discovery, and reuse across different applications. New languages are being developed to make more of the information on the web machine-readable. The semantic web also aims to develop a new generation of technologies and toolkits, as well as new ways to assist the web user.

(1) Semantic web developments include:
a. Linking of databases (http://www.w3.org/XML)
b. Sharing content between applications using different XML DTDs or schemas (http://www.w3.org/XMl/Schema), (http://www.w3.org/RDF), (http://www.w3.org/TR/SOAP)
c. Combination of web services (http://www.w3.org/TR/rdf-schema/), (http://www.w3.org/2001/sw/WedOnt)

This sort of development of the web infrastructure has several merits, including the definition of a shared data model for the design of any query language and the linking of data from many different models.
Its limitations, however, lie in the assumption that the processing of natural language properties can be dispensed with, and that natural language semantics reduces to shallow lexical semantic relations in conjunction with (fragments of) knowledge of the world. This is illustrated in (2) with SemTag [].

(2) Consider a world in which all documents on the web contained semantic annotations based on TAP. So the sentence:
"The Chicago Bulls announced yesterday that Michael Jordan will..."
would appear as:
The <resource ref="http://tap.stanford.edu/BasketballTeam_Bulls">Chicago Bulls</resource> announced yesterday that <resource ref="http://tap.stanford.edu/AthleteJordan,_Michael">Michael Jordan</resource> will...
Thus, the annotation:
<resource ref="http://tap.stanford.edu/AthleteJordan,_Michael">Michael Jordan</resource>
says that the string "Michael Jordan" refers to the resource whose URI is "http://tap.stanford.edu/AthleteJordan,_Michael". It is expected that querying this URI will result in encoded information which provides greater detail about this resource.
…

* This work is supported in part by funding from the Social Sciences and Humanities Research Council of Canada to the Asymmetry Project, grant number 214-97-0016, as well as by Valorisation-Recherche Québec, grant number 2200-006, awarded to Professor Anna Maria Di Sciullo at the University of Quebec in Montreal.
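To make concrete what an annotation of the kind shown in (2) encodes, the following minimal Python sketch reads the annotated sentence and collects the (surface string, URI) pairs from the <resource ref="..."> spans. The parser class and the function name extract_annotations are hypothetical and used only for illustration; they are not part of SemTag or TAP. Note that the output is a flat list of string-to-URI links: nothing in it records which entity is the subject of "announced" and which belongs to the embedded clause, i.e. the asymmetric relations between the annotated expressions are not represented.

```python
from html.parser import HTMLParser


class ResourceAnnotationParser(HTMLParser):
    """Hypothetical helper: collects (surface string, URI) pairs
    from <resource ref="..."> spans in annotated text."""

    def __init__(self):
        super().__init__()
        self.pairs = []            # list of (surface_string, uri)
        self._current_ref = None   # URI of the span currently open, if any
        self._buffer = []          # text chunks seen inside that span

    def handle_starttag(self, tag, attrs):
        if tag == "resource":
            self._current_ref = dict(attrs).get("ref")
            self._buffer = []

    def handle_data(self, data):
        # Only keep text that occurs inside a <resource> span.
        if self._current_ref is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "resource" and self._current_ref is not None:
            self.pairs.append(("".join(self._buffer).strip(), self._current_ref))
            self._current_ref = None


def extract_annotations(text):
    """Return the string-to-URI links found in an annotated sentence."""
    parser = ResourceAnnotationParser()
    parser.feed(text)
    parser.close()
    return parser.pairs


sentence = (
    'The <resource ref="http://tap.stanford.edu/BasketballTeam_Bulls">'
    'Chicago Bulls</resource> announced yesterday that '
    '<resource ref="http://tap.stanford.edu/AthleteJordan,_Michael">'
    'Michael Jordan</resource> will...'
)

for surface, uri in extract_annotations(sentence):
    print(f"{surface!r} -> {uri}")
```

Running it prints one line per annotated expression, pairing "Chicago Bulls" and "Michael Jordan" with their URIs; the predicate "announced" and the structure linking the two expressions are simply absent from the result.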