mathematics

Article

Author Identification Using Chaos Game Representation and Deep Learning

Catalin Stoean 1,2,* and Daniel Lichtblau 3

1 Human Language Technology Research Center, University of Bucharest, 010014 Bucharest, Romania
2 Grupo Ingeniería de Sistemas Integrados, E.T.S.I. Telecomunicación, Universidad de Málaga, 29071 Málaga, Spain
3 Wolfram Research, Champaign, IL 61820, USA; danl@wolfram.com
* Correspondence: catalin.stoean@fmi.unibuc.ro

Received: 14 September 2020; Accepted: 29 October 2020; Published: 2 November 2020

Abstract: An author unconsciously encodes a certain style in written text, and that style is often difficult to recognize. Still, many computational means have been developed for this purpose; they take into account various features, from lexical and character-based attributes to syntactic or semantic ones. We propose an approach that starts from the character level and uses chaos game representation to render documents as images, which are subsequently classified by a deep learning algorithm. Experiments are conducted on three data sets, and the results are comparable to those reported in the literature. The study also examines whether the method is suitable for small data sets and whether image augmentation can improve classification performance.

Keywords: authorship attribution; chaos game representation; deep learning

1. Introduction

The style of every author is encoded in the documents that the person has written, whether these are books, articles, emails, or short posts on a social network. The style becomes more consistent as the amount of written text grows. The authorship attribution (AA) task assumes a training set of documents whose authors are known and a test set of texts whose writers are not known in advance but that are, in general, written by one of the authors of the training samples.
The goal is to determine the author of each test sample. Most computational approaches count various components of the texts to characterize the style of the writer and, from it, identify the author. A good survey of such methods can be found in Reference [1] or in the overview of an AA competition [2].

The method put forward in the current research is very different from the standard approaches. While the general trend is to transform images into text to extract their meaning, as in Reference [3], herein we aim to encode the entire text in one image.

As a simple and intuitive overview, the present methodology replaces each text document with a chaos game representation (CGR) and feeds the resulting images to a deep learning (DL) method, which learns characteristics specific to each author from them. The DL model can then distinguish between similar representations obtained from new test documents, which are also transformed via CGR.

In a previously proposed approach [4], CGR proved efficient for encoding the style of writers with the same goal of determining the authors, but it was paired with a shallow classifier and did not exploit the strength of DL in image processing. DL, however, comes with a demanding requirement: a large amount of training data, which is not always available. Nevertheless, the goal of the current article is to study the suitability of the CGR-DL tandem for the AA task, and two benchmark problems are considered for experimentation.

Mathematics 2020, 8, 1933; doi:10.3390/math8111933 www.mdpi.com/journal/mathematics
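To make the idea of turning a text into a CGR image concrete, the following is a minimal sketch in Python. It is not the encoding used in Reference [4] or in this article, which this introduction does not specify. As an assumption for illustration only, each UTF-8 byte of the text is split into four base-4 digits, each digit selects one corner of the unit square, and the current point moves halfway toward that corner, as in the classical chaos game. The names CORNERS, text_to_digits and cgr_image are hypothetical.

```python
from typing import List

# One corner of the unit square per base-4 digit (an illustrative assignment).
CORNERS = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]


def text_to_digits(text: str) -> List[int]:
    """Split every UTF-8 byte into four base-4 digits (assumed encoding)."""
    digits = []
    for byte in text.encode("utf-8"):
        for shift in (6, 4, 2, 0):
            digits.append((byte >> shift) & 0b11)
    return digits


def cgr_image(text: str, size: int = 64) -> List[List[int]]:
    """Return a size x size grid counting how often each cell is visited."""
    grid = [[0] * size for _ in range(size)]
    x, y = 0.5, 0.5  # start at the center of the unit square
    for digit in text_to_digits(text):
        cx, cy = CORNERS[digit]
        # Chaos game step: move halfway toward the selected corner.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][col] += 1
    return grid


if __name__ == "__main__":
    image = cgr_image("The style of every author is encoded in the text.", size=16)
    print(sum(map(sum, image)))  # total number of plotted points
```

The resulting grid of visit counts can be normalized and stored as a grayscale image, one per document, which is the kind of input an image classifier would then receive. The grid size and the character-to-corner mapping are free design choices in this sketch.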