Article

Statistics and Machine Learning Experiments in Poetry

Ovidiu Calin

Department of Mathematics & Statistics, Eastern Michigan University, Ypsilanti, MI 48197, USA; ocalin@emich.edu

Received: 10 June 2020; Accepted: 28 June 2020; First Version Published: 2 July 2020 (doi:10.3390/sci2030048)

Abstract: This paper presents a quantitative approach to poetry, based on several statistical measures (entropy, informational energy, N-grams, etc.) applied to a few characteristic English writings. We found that the entropy of the English language changes over time, and that entropy depends on both the language used and the author. To compare two similar texts, we introduce a statistical method to assess the relative entropy between them. We also introduce a method for computing the average information conveyed by a group of letters about the next letter in a text. We derive a formula for computing the Shannon language entropy, and we introduce the concept of the N-gram informational energy of a poem. Finally, we construct a neural network that generates Byron-style poetry and analyze its informational proximity to genuine Byron poetry.

Keywords: entropy; Kullback–Leibler relative entropy; recurrent neural networks; learning

1. Introduction

This paper deals with applications of statistics and machine learning to poetry. This is an interdisciplinary field of research at the intersection of information theory, statistics, machine learning, and literature, whose growth is due to recent developments in data science and technology. The topic is important because it offers a quantitative approach to an area that until recently belonged to the arts rather than to the exact sciences. The paper first presents some familiar statistical tools based on letter frequency analysis, which have been used since the beginnings of natural language theory.
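As a minimal illustration of the measures named above (a sketch, not the paper's own code; the function names are ours), the Shannon entropy and the Onicescu informational energy of a text's letter distribution can be computed as follows:

```python
from collections import Counter
import math

def letter_frequencies(text):
    """Relative frequencies of the alphabetic characters in a text (case-insensitive)."""
    letters = [c for c in text.lower() if c.isalpha()]
    total = len(letters)
    return {c: n / total for c, n in Counter(letters).items()}

def shannon_entropy(freqs):
    """Shannon entropy H = -sum_i p_i log2 p_i, in bits per letter."""
    return -sum(p * math.log2(p) for p in freqs.values())

def informational_energy(freqs):
    """Onicescu informational energy E = sum_i p_i^2."""
    return sum(p * p for p in freqs.values())
```

For a uniform distribution over two letters these give the expected extremes: entropy 1 bit per letter and informational energy 0.5; a text repeating a single letter gives entropy 0 and energy 1.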
The underlying idea is that each language has its own letter frequency distribution, which can be characterized by certain statistical measures. For instance, in English the letters, ordered from most to least frequent, are etaoin shrdlu cmfwyp vbgkjq xz, while in French the order is elaoin sdrétu cmfhyp vbgwqj xz. Even within the same language, writers write slightly differently, adapting the language to their own style. The statistical tools used in this paper, computed for specific authors, are: entropy, informational energy, bigrams, trigrams, N-grams, and cross-entropy. These statistics may also be used to prove or disprove the authorship of certain texts or to validate the language of a text. These tools are useful because each statistic measures a specific informational feature of the text.

However, when it comes to the problem of generating a new text that shares the statistical characteristics of a given author, we need to employ recent developments in machine learning. This task can be accomplished with the most recent state-of-the-art advances in character-level language modeling and is presented in the second part of the paper. More specifically, we use recurrent neural networks (RNNs) to generate Byron-style poetry and then assess its informational deviation from a genuine Byron text. When this measure of similarity becomes small enough, the network training stops and we obtain a generator of Byron-style poetry. It is worth noting that a similar neural architecture can be employed to generate poetry in the style of other authors.