Applied Linguistics
UDC 81’33, 81’32
https://www.doi.org/10.33910/2687-0215-2020-2-1-16-27

Automatic recognition of messages from virtual communities of drug addicts

V. I. Firsanova¹
¹ Saint Petersburg State University, 7/9 Universitetskaya Emb., Saint Petersburg 199034, Russia

Abstract. The paper describes building a binary classifier with a Convolutional Neural Network (CNN) using two different types of word vector representations, Bag-of-Words and Word Embeddings. The purpose of the classifier is to recognise messages published in virtual communities of drug-addicted people. This system may find application in healthcare as a tool for the automatic identification of addicts’ communities. It may also provide insights into the features of addicts’ online discourse.

The classifier is based on a dataset collected from Russian-speaking online communities on VK (VKontakte). The dataset comprises the texts of publications and comments posted in two types of open communities. The first type includes communities that actively discuss problems of addiction to psychotropic and psychoactive substances. The second type focuses on private issues: users share their life stories and ask for help or advice, and their publications are not related to drug addiction. The experiments centred on the development, evaluation and comparative analysis of two models, based on Bag-of-Words and Word Embeddings respectively. The neural networks were trained on a Tesla T4 graphics processing unit on the Google Colab platform. The best-performing model achieved an F1-Score of 0.99 and an Accuracy of 0.95; however, testing the programme revealed a few weaknesses. The programme still requires retraining on an extended dataset that includes publications from both addicts’ and non-addicts’ communities describing various mental conditions, including depression, anxiety and nervous disorders. This opens up an opportunity to create software that can automatically distinguish publications by people struggling with depression caused by the use of psychoactive substances from publications by people suffering from depressive disorders of a different kind.

Keywords: text classification, Word Embeddings, Bag-of-Words, Convolutional Neural Networks, supervised learning, text categorisation, neural networks, one-hot encoding, classification algorithm.

Introduction

The paper describes the process of building a classifier that may find application as a tool for the content analysis of virtual communities on social media, based on the processing of published texts. The study resulted in two models of the classifier, built on different types of word vector representations: Bag-of-Words and Word Embeddings. The classifier was built using a Convolutional Neural Network (CNN). Both models were implemented with the Keras library (Keras: the Python deep learning API 2020) on Python 3 (Python 3.6.7 documentation 2020). All experiments were performed on Google Colab (Google Colab 2020) using a Tesla T4 GPU.

The classifier assigns one of two predefined labels. If the argument of the maxima (argmax) of the neural network's prediction, i.e. the index of the output with the highest score, is 0, the programme assigns the label “non-addicts” and outputs the message “This text was most likely published in a non-addicts’ virtual community.” If the argmax is 1, the programme assigns the label “addicts” and outputs the message “This text was most likely published in an addicts’ virtual community.”
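To make the Bag-of-Words representation concrete, here is a minimal sketch of binary (one-hot style) Bag-of-Words encoding in plain Python. It is an illustration, not the author's preprocessing: the regular-expression tokeniser, the lower-casing and the helper names `tokenize`, `build_vocabulary` and `bag_of_words` are hypothetical. Because Python 3's `\w` matches Unicode word characters, the same code also tokenises Cyrillic text such as VK posts.

```python
import re

# Hypothetical tokeniser: Unicode-aware, so it matches Latin and Cyrillic words.
TOKEN_PATTERN = re.compile(r"\w+")


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return TOKEN_PATTERN.findall(text.lower())


def build_vocabulary(texts):
    """Assign each distinct token a column index in order of first appearance."""
    vocab = {}
    for text in texts:
        for token in tokenize(text):
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def bag_of_words(texts, vocab):
    """Binary Bag-of-Words: cell [i][j] is 1 if word j occurs in text i, else 0."""
    matrix = []
    for text in texts:
        row = [0] * len(vocab)
        for token in tokenize(text):
            index = vocab.get(token)
            if index is not None:  # words unseen when building the vocabulary are ignored
                row[index] = 1
        matrix.append(row)
    return matrix


vocab = build_vocabulary(["need help with anxiety", "help me quit"])
print(bag_of_words(["help help quit", "unknown word"], vocab))
# [[0, 1, 0, 0, 0, 1], [0, 0, 0, 0, 0, 0]]
```

Each text becomes a fixed-length vector with one column per vocabulary word, so word order and word frequency are discarded. This is the main contrast with Word Embeddings, which map each word to a dense vector and keep the sequence. The legacy Keras `Tokenizer` class offers `texts_to_matrix(texts, mode="binary")` for a comparable matrix.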
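The Word Embeddings model replaces each token with a dense vector and lets convolutional filters slide over the resulting sequence. The paper does not detail the network architecture at this point, so the sketch below assumes a common text-CNN pattern: an embedding lookup, a one-dimensional convolution with ReLU, then global max pooling. This is roughly what stacking Keras's `Embedding`, `Conv1D` and `GlobalMaxPooling1D` layers computes. The sketch works through a single filter in plain Python, and the embedding values and kernel are made-up numbers for illustration.

```python
def embed(token_ids, embedding_matrix):
    """Look up one dense vector per token id (what an embedding layer does)."""
    return [embedding_matrix[i] for i in token_ids]


def conv1d_relu_global_max(sequence, kernel, bias=0.0):
    """Apply one filter of width len(kernel) to a sequence of word vectors,
    pass each response through ReLU, and keep the maximum (global max pooling)."""
    width = len(kernel)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the filter width")
    activations = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Dot product between the filter weights and the window of word vectors.
        z = bias + sum(
            w * x
            for w_row, x_row in zip(kernel, window)
            for w, x in zip(w_row, x_row)
        )
        activations.append(max(0.0, z))  # ReLU
    return max(activations)  # global max pooling


# Hypothetical 2-dimensional embeddings for a 4-word vocabulary.
embedding_matrix = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
kernel = [[1.0, 0.0], [0.0, 1.0]]  # filter of width 2: spans two consecutive words
sequence = embed([1, 2, 3], embedding_matrix)
print(conv1d_relu_global_max(sequence, kernel))  # 2.0
```

In a trained model, many such filters run in parallel and their pooled outputs feed a final dense layer. Max pooling keeps the strongest match for each filter wherever the pattern occurs in the text, which makes the features independent of text length.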
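The labelling rule described in the Introduction maps the argmax of the network's output to a label and a message. Below is a minimal sketch of that rule. It assumes the network returns one score per class, with index 0 for “non-addicts” and index 1 for “addicts”. The names `LABELS`, `argmax` and `interpret` are hypothetical.

```python
# Index of each output unit -> (label, message shown to the user).
LABELS = {
    0: ("non-addicts",
        "This text was most likely published in a non-addicts' virtual community."),
    1: ("addicts",
        "This text was most likely published in an addicts' virtual community."),
}


def argmax(scores):
    """Return the index of the largest score (the first one on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def interpret(scores):
    """Turn one prediction vector into a (label, message) pair."""
    return LABELS[argmax(scores)]


label, message = interpret([0.12, 0.88])
print(label)  # addicts
print(message)
```

With Keras, `model.predict` returns a batch of prediction vectors, so in practice one would pass a single row to `interpret` or compute all indices at once with `numpy.argmax(predictions, axis=-1)`.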
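The abstract reports the best model's quality as F1-Score and Accuracy. The sketch below computes both metrics for binary labels in plain Python. It treats “addicts” (label 1) as the positive class for F1. The paper does not state here which class, or which averaging scheme, was used, so this choice is illustrative, and the label lists are invented.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


y_true = [1, 1, 1, 0, 0]  # hypothetical gold labels (1 = addicts)
y_pred = [1, 1, 0, 0, 1]  # hypothetical model output
print(accuracy(y_true, y_pred))            # 0.6
print(round(f1_score(y_true, y_pred), 3))  # 0.667
```

Accuracy counts every correct prediction, whereas F1 combines precision and recall for the positive class only. This is why the two figures can differ for the same model.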