2011 IEEE International Conference on Fuzzy Systems
June 27-30, 2011, Taipei, Taiwan
978-1-4244-7317-5/11/$26.00 ©2011 IEEE
A Computational Linguistic Approach for the
Identification of Translator Stylometry using
Arabic-English Text
Heba El-Fiqi
School of Engineering and
Information Technology
University of New South Wales
ADFA campus
Canberra, Australia
H.El-Fiqi@student.adfa.edu.au
Eleni Petraki
Faculty of Arts and Design
University of Canberra
Canberra, Australia
Eleni.Petraki@canberra.edu.au
Hussein A. Abbass
School of Engineering and
Information Technology
University of New South Wales
ADFA campus
Canberra, Australia
H.Abbass@adfa.edu.au
Abstract- Translator Stylometry is a small but growing area of
research in computational linguistics. Although research on the
wider field of authorship attribution using computational
linguistics techniques has proliferated, the translator stylometry
problem is more challenging and the literature on it remains
sparse. Some authors have even claimed that this
problem does not have a solution; a claim we will challenge in this
paper. We present an innovative set of translator stylometric
features that can be used as signatures to detect and identify
translators. The features are based on the concept of network
motifs: small graph local substructures which have been used
successfully in characterizing global network dynamics. The text is
transformed into a network, where words become nodes and their
adjacencies in a sentence are represented through links. Motifs of
size 3 are then extracted from this network and their distribution
is used as a signature for the corresponding translator.
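The construction described above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions that the text does not state: whitespace tokenization, lower-casing, directed links from each word to the word that follows it, and normalization of motif counts to relative frequencies. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations, permutations


def word_adjacency_edges(sentences):
    """Directed edges (w1 -> w2) for words adjacent within a sentence.

    Assumption: whitespace tokenization and lower-casing.
    """
    edges = set()
    for sentence in sentences:
        words = sentence.lower().split()
        for a, b in zip(words, words[1:]):
            if a != b:  # ignore self-loops from repeated words
                edges.add((a, b))
    return edges


def canonical_form(triple, edges):
    """Key that is identical for all isomorphic 3-node subgraphs."""
    best = None
    for p in permutations(triple):
        # Presence of each of the six possible directed links under this ordering.
        key = tuple((p[i], p[j]) in edges
                    for i in range(3) for j in range(3) if i != j)
        if best is None or key > best:
            best = key
    return best


def motif_distribution(sentences):
    """Relative frequency of each connected size-3 subgraph class."""
    edges = word_adjacency_edges(sentences)
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    # Every connected triple contains a node linked to both of the others.
    triples = set()
    for centre, nbrs in neighbours.items():
        for u, v in combinations(sorted(nbrs), 2):
            triples.add(frozenset((centre, u, v)))
    counts = Counter(canonical_form(tuple(sorted(t)), edges) for t in triples)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()} if total else {}
```

Calling `motif_distribution(["the cat sat", "the dog sat"])` yields a dictionary whose values sum to 1, and such a vector could serve as the per-text feature vector passed to a classifier. Under a different link definition or normalization the resulting distribution would differ.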
We then investigate the impact of sample size, method of
normalization and dataset imbalance on classification accuracy.
We also adopt the Fuzzy Lattice Reasoning Classifier (FLR)
among others, where FLR achieved the best performance with a
classification accuracy reaching the 70% mark.
Keywords- Translator Stylometry; Authorship
Attributions; Network Motifs; Decision Tree Analysis; Fuzzy
Classifier; Computational linguistics; Arabic-English Corpus.
I. INTRODUCTION
Identifying the author of a text is an important area of
research in “Computational Linguistics” [1, 2]. There are many
studies in “Authorship Attributions” [1-14]. These studies use
the stylometric features of the authors to identify the original
authors. These stylometric features include lexical, character,
syntactic, semantic, and application-specific features [1-14].
On the one hand, the problem of authorship attributions is a
difficult one, and research challenges remain in this area.
On the other hand, the sub-problem of translator attributions is
even harder and no solution for it exists so far.
Translation is a fascinating topic. While the original writer
of a text had a specific mental picture in her mind and an
intended message to be communicated through the text, a
translator faces a different type of challenge. Successful
translation requires the translator to form the
same mental picture as the original author of the text. Good
translation does not stop at the level of mapping words, but
extends to mapping meaning, mental pictures, imagination, and
feelings. This is called the “loyalty” dilemma, and the
literature discusses at length the importance of
maintaining the spirit of the original work.
If we compare author attributions to translator attributions,
we find that the former is expected to have more signatures or
discriminatory factors representing the choices made by the
authors. Authors have many more degrees of freedom, where
they can build their own identity as authors. Translators have
fewer. Being constrained by the original text is a non-trivial
limitation. This feature alone makes translator attributions a
more difficult problem than author attributions. Nevertheless,
we conjecture that translators attempt to have their own touch,
signatures that can be used to detect who translated what. This is
the hypothesis we hold in this paper.
The problem of how to identify translator stylometry is
under-studied in the literature; probably because it is a harder
problem. Some argue that translated work is considered the
original author’s literary work rather than the translator’s own
work; however, no one can ignore the fact that translators are
individuals [15]; they make personal choices which can affect
the translation process. This is what we call “Translator
Stylometry”.
In this paper, we introduce a new method that
uses network motifs to identify differences between translators’
styles. Our hypothesis is reformulated as “network motifs can be
used to differentiate between different translators based on their
own writing stylometrics”. To test this hypothesis, we represent
our datasets as networks. This is done by generating a word
adjacency network for each piece of work by a translator in the
dataset. To analyze and compare two networks, we can use their
global statistical features, which include shortest-path length,
global centrality, clustering coefficient, etc., or their structural
design principles, such as network motifs. Network motifs, which
were first introduced by Milo et al. [16], are patterns