Journal of Intelligent & Fuzzy Systems xx (20xx) x–xx
DOI:10.3233/JIFS-179007
IOS Press
Prediction of reading difficulty in Russian academic texts
Valery Solovyev^a,∗, Marina Solnyshkina^b, Vladimir Ivanov^c and Ildar Batyrshin^d
^a Research and Education Center on Linguistics named after I.A. Boduen de Kurtene, Kazan Federal University, Kazan, Russian Federation
^b Department of German Philology, Higher School of Russian and Foreign Philology, Kazan Federal University, Russian Federation
^c Innopolis University, 1, Universitetskaya Str., Innopolis, Russian Federation
^d Centro de Investigación en Computación, Instituto Politécnico Nacional, CDMX, Mexico
Abstract. Education policy makers view measuring the readability of academic texts and profiling classroom textbooks as a primary
task of education management aimed at sustaining the quality of reading programs. As Russian readability metrics, i.e. “objective”
features of a text that determine its complexity for readers, are still a research niche, we undertook a comparative analysis of
the features of academic texts, exemplified by textbooks on Social Science and examination texts of Russian as a foreign language.
Experiments with 7 classifiers and 4 methods of linear regression on the Russian Readability corpus demonstrated that ranking
textbooks for native speakers is a much more difficult task than ranking examination texts written (or designed) for foreign
students. The authors see a possible reason for this in the differences between two processes: acquiring a native language on
the one hand and learning a foreign language on the other. The results of the current study are highly relevant in modern
Russia, which is joining the Bologna Process and needs to provide profiled texts for all types of learners and testees. Based
on a qualitative and quantitative analysis of a text, the research offers a guide for education managers to help build consensus
on selecting reading material when educators have differing views.
Keywords: Text readability, machine learning, Russian academic text, text complexity, examination tests
∗ Corresponding author. Valery Solovyev, Research and Education Center on Linguistics named after I.A. Boduen de Kurtene, Kazan Federal University, 18 Kremlyovskaya street, Kazan 420008, Russian Federation. Tel.: +7 843 233 75 12; Fax: +7 843 292 74 18; E-mail: maki.solovyev@mail.ru.
Introduction
Modern communication as ‘the imparting or exchanging of information by speaking, writing, or
using some other medium’ (Oxford English Dictionary, 1996) implies either generating or receiving a
text, which may be handwritten, printed, electronic or oral. Successful communication, in turn, largely
depends on whether the amount, content and structure of the quanta of information sent by the
generator of the text and received by the addressee are similar or, in an ideal situation, the same. Thus,
for the information in any text (written or oral) to be elicited, processed and stored in the recipient’s
mind, it is important that the text itself align with the cognitive and linguistic abilities of the recipient.
Matching a text to the target audience is a problem relevant in a number of spheres: the military, education,
PR, advertising, government, business, publishing, medicine and social relations, as these are the areas
where communication is the foundation of success. Research shows that companies suffer damages
and take financial hits if the texts to which they expose their customers are hard for the average reader to read
[1]. If a text is too easy, i.e. primitive for the audience, readers lose interest and stop reading. In modern
science the problem of text complexity is positioned
ISSN 1064-1246/19/$35.00 © 2019 – IOS Press and the authors. All rights reserved