Application of Paragraphs Vectors Model for Semantic
Text Analysis
Irina Gruzdo 1[0000-0002-4399-2367], Iryna Kyrychenko 1[0000-0002-7686-6439],
Glib Tereshchenko 1[0000-0001-8731-2135], Olga Cherednichenko 2[0000-0002-9391-5220]
1 Kharkiv National University of Radioelectronics, Kharkiv, Ukraine
{irina.gruzdo,iryna.kyrychenko, hlib.tereshchenko}@nure.ua
2 National Technical University "KhPI", Kharkiv, Ukraine
olha.cherednichenko@gmail.com
Abstract. The paper examines the paragraph vector model and its two methods,
distributed memory and distributed bag of words. The peculiarity of this model
lies in defining objective functions for individual sentences and representing
them as local vectors, from which a global vector is constructed that determines
the semantic component of the text as a whole. Various aspects of applying the
distributed memory and distributed bag of words methods are considered, as well
as the sets of algorithms underlying them, which make it possible to obtain
distributed vectors of text parts for the problem of finding similar articles,
where the search is carried out by keywords, annotations, and articles of
various sizes. It was experimentally established that Doc2Vec with its
Bag-of-Words method most completely determines borrowings and analogues
depending on the structural elements of the text, in accordance with the review
and the task. The Bag-of-Words method also allows the user to form an accurate
picture of the lexical meaning of a word and its semantic relations in language
and texts.
Keywords: Text Meaning Definition, Semantic Analysis, Latent-Semantic
Analysis, Experiment, Textual Information, Model, Semantic Analysis Library,
Text Analysis, Text Fragment.
1 Introduction
At the present stage of development of information technologies, both worldwide and
in Ukraine, tasks related to the processing of textual information arise in solving a
number of problems such as plagiarism detection, text recognition, highlighting the
structural blocks of text, analysis and issuance of recommendations, etc. [1, 2, 3].
Among all these tasks, one of the essential problems, which has been under study for
more than 60 years and remains the “cornerstone”, is the problem of semantic analysis
of the text [1, 4, 5]. Approaches to checking semantic correctness are shown in [9–15].
During the analysis of the primary sources of the first works devoted to semantic
analysis, a tendency was observed
Copyright © 2020 for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).