Intelligenza Artificiale 6 (2012) 117–119
DOI 10.3233/IA-130039
IOS Press
Guest Editorial
Natural language processing in the web era
Roberto Basili (a,*) and Bernardo Magnini (b)
(a) Department of Enterprise Engineering, University of Roma, “Tor Vergata”, Roma, Italy
(b) Fondazione Bruno Kessler, Trento, Italy
Abstract. Natural Language is still the main carrier for the definition, synthesis and exchange of knowledge in the real world,
and this is entirely reflected by Web contents. No interpretation process over Web data is really possible without a more or less
explicit reference to natural language(s), the primordial soup from which semantics emerges. The advantages and opportunities
for NLP research are evident.
This paper introduces the Special Issue of the journal on NLP in the Web era by first discussing some opportunities for current
NLP research and then summarizing the contributions gathered in the volume.
Keywords: Natural language processing, web applications, corpus-based methods, social web analysis
1. Introduction
Natural language is still the main carrier for the definition, synthesis and exchange of knowledge in the real world, and this is entirely reflected by Web contents. Despite the growing integration of the information made available on the current Web, across multiple channels and modalities and including the resources of the Social Web, the central role of language in e-mails, blogs and tweets, as well as in multimedia pages, cannot be denied. Even when multimedia information is made available (for example, pictures, videos, audio files or digital artworks), natural language remains the core vehicle of explanations and crucial complementary information. The role of the annotation processes that enrich these data with linguistic metadata for the localization, retrieval and delivery of the underlying information is evident. No hermeneutic process over such data is really possible without a more or less explicit reference to (possibly multiple) natural language(s), which are thus the “primordial soup” from which semantics emerges.
* Corresponding author. Roberto Basili, Department of Enterprise Engineering, University of Roma, “Tor Vergata”, Via del Politecnico 1, 00133 Roma, Italy. Tel./Fax: +39 06 72597391; E-mail: basili@info.uniroma2.it.
The advantages and opportunities for NLP research are evident. On the Web, rich sources of information about language are widely and freely available. In line with the 1990s studies on corpus-driven linguistic knowledge induction and lexical acquisition, several research initiatives (for example, the Web as a corpus initiative [7, 4]) use the Web as a source of useful observations about the lexicon and syntax, so that large scale linguistic knowledge bases can be obtained with reasonable effort. Moreover, the emergence of novel tasks and applications, such as classification/filtering, Web search methods and Opinion Mining [8], calls for the adoption of increasingly complex deep language processing methods. It is also true that the sharing of large scale resources for language processing, from on-line dictionaries to large scale collaborative encyclopedic resources (such as Wikipedia, as discussed in [6]), supports complex forms of induction and linguistic inference. Finally, it is the Web that promoted large scale benchmarking campaigns (in the spirit of standard Information Retrieval competitions such as TREC), as in the case of the SemEval challenges [3].
1724-8035/12/$27.50 © 2012 – IOS Press and the authors. All rights reserved