International Journal of Corpus Linguistics 21:1 (2016), 48–79. doi 10.1075/ijcl.21.1.03rue
issn 1384–6655 / e-issn 1569–9811 © John Benjamins Publishing Company
A lectometric analysis of aggregated
lexical variation in written Standard English
with Semantic Vector Space models
Tom Ruette (i), Katharina Ehret (ii) and Benedikt Szmrecsanyi (i)
(i) KU Leuven / (ii) University of Freiburg
Lectometry is a corpus-based methodology that explores how multiple
language-external dimensions shape language usage in an aggregate perspective. The paper
combines this methodology with Semantic Vector Space modeling to investigate
lexical variability in written Standard English, as sampled in the original Brown
family of corpora (Brown, LOB, Frown and F-LOB). Based on a joint analysis
of 303 lexical variables, which are semi-automatically extracted by means of a
SVS, we find that lexical variation in the Brown family is systematically related
to three lectal dimensions: discourse type (informative versus imaginative),
standard variety (British English versus American English), and time period
(1960s versus 1990s). It turns out that most lexical variables are sensitive to at
least one of these three language-external dimensions, yet not every dimension
has dedicated lexical variables: in particular, distinctive lexical variables for the
real time dimension fail to emerge.
Keywords: lectometry, lexis, aggregation, Semantic Vector Space models,
Standard English
1. Introduction
This paper presents a comprehensive analysis of lexical variation in written
Standard English. Drawing on state-of-the-art lectometric methods (Geeraerts et
al. 1999, Speelman et al. 2003), we explore the extent to which lexical choices in
the Brown family of Standard English corpora (Hinrichs et al. 2010) are systematically
structured by three lectal dimensions, i.e. standard variety, discourse function,
and real time period. Our goal is to offer a data-driven vision for the study
of lexical variation by introducing Semantic Vector Space models as a means for