Intelligenza Artificiale 6 (2012) 117–119
DOI 10.3233/IA-130039
IOS Press

Guest Editorial

Natural language processing in the web era

Roberto Basili a,* and Bernardo Magnini b
a Department of Enterprise Engineering, University of Roma, “Tor Vergata”, Roma, Italy
b Fondazione Bruno Kessler, Trento, Italy

Abstract. Natural Language is still the main carrier for the definition, synthesis and exchange of knowledge in the real world, and this is entirely reflected by Web contents. No interpretation process over Web data is really possible without a more or less explicit reference to natural language(s), the primordial soup from which semantics emerges. The advantages and opportunities for NLP research are evident. This paper introduces the Special Issue of the journal on NLP in the Web era by first discussing some opportunities for current NLP research and then summarizing the contributions gathered in the volume.

Keywords: Natural language processing, web applications, corpus-based methods, social web analysis

1. Introduction

Natural Language is still the main carrier for the definition, synthesis and exchange of knowledge in the real world, and this is entirely reflected by Web contents. Despite the growing integration of the information made available on the current Web, and the multiple channels and modalities through which it is delivered, including the Social Web bodies of resources, the central role of language in e-mails, blogs and tweets, as well as in multimedia pages, cannot be denied. Even when multimedia information is made available (for example, pictures, videos, audio files or digital artworks), natural language is still central as the core vehicle of explanations and crucial complementary information. The role of the annotation processes that enrich these data with linguistic metadata for the localization, retrieval and delivery of the underlying information is evident.
No hermeneutic process over such data is really possible without a more or less explicit reference to (possibly multiple) natural language(s), which are thus the “primordial soup” from which semantics emerges.

The advantages and opportunities for NLP research are evident. On the Web, rich sources of information about language are made largely and freely available. In line with the 1990s studies on corpus-driven linguistic knowledge induction and lexical acquisition, several research initiatives (for example, the Web as a Corpus initiative [7, 4]) use the Web as a source of useful observations about the lexicon and syntax, so that large-scale linguistic knowledge bases can be obtained with reasonable effort. Moreover, the emergence of novel tasks and applications, such as classification/filtering, Web search methods and Opinion Mining [8], calls for the adoption of increasingly complex deep language processing methods. It is also true that the sharing of large-scale resources for language processing, from on-line dictionaries to large-scale collaborative encyclopedic resources (such as Wikipedia, as discussed in [6]), supports complex forms of induction and linguistic inference. Finally, it is the Web that promoted large-scale benchmarking campaigns (in the spirit of standard Information Retrieval competitions such as TREC), as in the case of the SemEval challenges [3].

* Corresponding author. Roberto Basili, Department of Enterprise Engineering, University of Roma, “Tor Vergata”, Via del Politecnico 1, 00133 Roma, Italy. Tel./Fax: +39 06 72597391; E-mail: basili@info.uniroma2.it.

1724-8035/12/$27.50 © 2012 – IOS Press and the authors. All rights reserved