Pattern Recognition Letters 129 (2020) 137–143

Contents lists available at ScienceDirect

Pattern Recognition Letters

journal homepage: www.elsevier.com/locate/patrec

An end-to-end deep learning system for medieval writer identification ☆

N.D. Cilia ∗, C. De Stefano, F. Fontanella, C. Marrocco, M. Molinara, A. Scotto Di Freca

Department of Electrical and Information Engineering, University of Cassino and Southern Lazio, Via Di Biasio 43, 03043 Cassino (FR), Italy

Article info

Article history:
Received 13 August 2019
Revised 13 November 2019
Accepted 18 November 2019
Available online 19 November 2019

MSC: 41A05, 41A10, 65D05, 65D17

Keywords: Deep learning, Transfer learning, Writer identification, Row detection, Avila bible, Digital paleography

Abstract

This paper presents an end-to-end system to identify writers in medieval manuscripts. The proposed system consists of a three-step model for the detection and classification of lines in the manuscript and for page writer identification. The first two steps are based on deep neural networks trained with transfer learning techniques and specialized to solve the task at hand. The third stage is a weighted majority vote row-decision combiner that assigns a writer to each page. The main goal of this paper is to study the applicability of deep learning in this context when only a relatively small training dataset is available. We tested our system with several state-of-the-art deep architectures on a digitized manuscript known as the Avila Bible, using only 9.6% of the total pages for training. Our approach proves to be very effective in identifying page writers, reaching a peak accuracy of 96.48% and a peak F1 score of 96.56%.

© 2019 Elsevier B.V. All rights reserved.

1. Introduction

Paleography is the study of ancient and medieval handwriting. An important problem faced by paleographers is to identify the writers, a.k.a. scribes, who contributed to the drawing up of a manuscript.
Traditionally, paleographers perform qualitative evaluations to distinguish the writers; in recent years, these techniques have been joined by computer-based tools [1] that automatically measure quantities such as the height and width of letters, distances between characters, inclination angles, and the number and types of abbreviations. Recent approaches in digital paleography combine powerful machine learning algorithms with high-quality digital images of medieval manuscripts. However, traditional techniques require a preliminary feature engineering step that involves an expert in the field, thus increasing the application development cost.

In recent years, deep-learning-based approaches have received increasing attention from researchers thanks to their ability to handle complex and difficult image classification tasks [2]. Deep neural networks are capable of learning hierarchical feature representations directly from data, instead of using handcrafted features based on domain-specific knowledge [3]. Nonetheless, very few studies have applied deep learning techniques to the interpretation of medieval manuscripts, and previous approaches were mainly used to identify various elements of interest within document pages, without a specific focus on writer recognition.

☆ Handled by Associate Editor: G. Sanniti di Baja, Ph.D.
∗ Corresponding author. E-mail address: nicoledalia.cilia@unicas.it (N.D. Cilia).

In our previous paper [4], we presented preliminary results of a study in which deep neural networks were employed for the identification of the scribes in ancient documents. To this end, we proposed a deep transfer learning solution for row detection and page classification, obtaining very encouraging results that enabled us to extend the previous approach and develop an end-to-end system for writer recognition. The proposed approach is based on three steps intended (i) to detect the lines (a.k.a.
rows) in each page of the manuscript, (ii) to classify them, and (iii) to recognize the writer of the entire page. The first step consists of a deep-learning-based object detector pre-trained on a generic dataset (such as MS-COCO [5]) and specialized, via transfer learning, to solve the task at hand. The second step is a row classifier composed of a fully convolutional feature extractor and a meta-architecture classifier that can be trained either from scratch or with fine-tuning.

https://doi.org/10.1016/j.patrec.2019.11.025
0167-8655/© 2019 Elsevier B.V. All rights reserved.

The third stage is a weighted majority vote row-decision combiner