Research Article
Volume 8 Issue 2 - April 2018
DOI: 10.19080/JFSCI.2018.08.555733
J Forensic Sci & Criminal Inves
Copyright © All rights are reserved by Pooja Ahuja
Authorship Profiling of Instant Messaging Sites
based on Stylistic and Stylometric Analysis
Gloria Christal1, Prajakta Manve1, Pooja Ahuja*2 and MS Dahiya3
1Student, MSc. Forensic science, Institute of Forensic Science, Gujarat Forensic Sciences University, India
2Assistant professor, Institute of Forensic Science, Gujarat Forensic Sciences University, India
3Director, Institute of Forensic Science, Gujarat Forensic Sciences University, India
Submission: March 16, 2018; Published: April 04, 2018
*Corresponding author: Pooja Ahuja, Assistant professor, Institute of Forensic Science, Gujarat Forensic Sciences University, India,
Email:
J Forensic Sci & Criminal Inves 8(2): JFSCI.MS.ID.555733 (2018)
Introduction
The growing popularity of Internet media such as email, blogs,
Internet forums and websites has made them the ideal
communication platform for many people. One such medium
is Instant Messaging (IM), which has gained prominence recently
with the rise of the Internet. Instant messaging is a type of online
chat that offers real-time text transmission over the Internet.
The Global Web Index report was conducted across 32 markets
and involved 170,000 internet users. The study shows that 52 per
cent of Indian instant messaging users are on WhatsApp, while
42 per cent use Facebook Messenger and 37 per cent use Skype;
WeChat has a 26 per cent share of the market, and Viber, with an
18 per cent market share, is in fifth place. IM is a set of
communication technologies used for instant text-based
communication between two or more participants over the
Internet or other types of networks [1].
Forensic linguistics, legal linguistics, or language and the law,
is the application of linguistic knowledge, methods and insights
to the forensic context of law, language, crime investigation,
trial, and judicial procedure. Applications of forensic linguistics
include voice identification, interpretation of expressed
meaning in laws and legal writings, analysis of discourse in
legal settings, interpretation of intended meaning in oral and
written statements, authorship identification and interpretation
and translation when more than one language must be used
in a legal context. Forensic stylistics is the application of the
science of linguistic stylistics to forensic contexts. Common
features of style include the use of dialogue, including regional
tones and pronunciation and individual dialects (or idiolects),
the use of grammar, which includes the observation of active
voice and passive voice, the use of particular language registers,
and the distribution of sentence lengths. Stylometrics is
a development of literary stylistics, which is based on the
assumption that all authors have individual writing habits. These
writing habits can be seen in features such as core vocabulary
use, phraseology and sentence complexity and all these features
are unconscious habits which are well ingrained. Stylometrics
is also concerned with locating textual features which can
be used to determine the authorship of a text. This
is achieved by comparing a disputed text with samples of texts
of known authorship from different authors.
Stylometrics is a quantitative analysis [2-5].
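
The comparison of a disputed text with texts of known authorship, as described above, can be illustrated with a minimal Python sketch. It is not the procedure used in this study: the function-word list, the three feature groups (function-word frequencies, mean sentence length, punctuation rate) and the Euclidean distance are assumptions chosen only for illustration, and the names style_vector, distance and closest_author are hypothetical.

```python
import math
import re
from collections import Counter

# Illustrative function-word list (an assumption); real studies use far larger lists.
FUNCTION_WORDS = ["the", "and", "to", "of", "in", "is", "it", "that", "for", "you"]


def style_vector(text):
    """Build a small stylometric profile of a text.

    The profile holds the relative frequency of each function word,
    the mean sentence length in words, and punctuation marks per character.
    """
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    counts = Counter(words)
    n_words = max(len(words), 1)  # avoid division by zero on empty text

    vector = [counts[w] / n_words for w in FUNCTION_WORDS]
    vector.append(len(words) / max(len(sentences), 1))  # mean sentence length
    punctuation = sum(1 for ch in text if ch in ",;:!?.")
    vector.append(punctuation / max(len(text), 1))  # punctuation rate
    return vector


def distance(a, b):
    """Euclidean distance between two profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_author(known_texts, disputed_text):
    """Return the candidate author whose known text is nearest to the disputed text.

    known_texts maps an author label to a sample of that author's writing.
    """
    target = style_vector(disputed_text)
    return min(
        known_texts,
        key=lambda author: distance(style_vector(known_texts[author]), target),
    )
```

In this sketch the features are not standardized, so mean sentence length, which takes much larger values than the frequency features, dominates the distance. A practical analysis would normally scale each feature and use longer samples per author before drawing any conclusion.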
Thus, stylometric approaches seek to find or describe
quantifiable markers of authorship, which in general
vary more between authors than within authors. Typical
stylometric markers include the relative frequencies of different
word classes or even non-word letter clusters. The style markers
can be categorized as character-based, word-based,
sentence-based, document-based, structural or syntactic. A few
examples of style markers include function word usage (common
adverbs, auxiliary verbs, conjunctions, prepositions and pronouns),
word collocations, sentence length and punctuation. Author
identification is the task of determining the author of a piece of
work. A forensic expert faces two types of variation.
Intra-author variation refers to the ways in which an
author’s text differs from another text written by the same author,
whereas inter-author variation refers to the ways texts vary
between different authors. However, in this study, the only causes
of variation that could have any bearing are time lapse, if some
time has passed between posts, and change in circumstances, if
the writer has undergone any recent changes in her life. Despite
Facebook being a social networking site, there could be
sociometric parameters, as it is common for an individual to have on
Abbreviations: IM: Instant Messaging; WST: Wordsmiths Tools