IJDAR (2017) 20:173–187
DOI 10.1007/s10032-017-0289-3

ORIGINAL PAPER

On writer identification for Arabic historical manuscripts

Abedelkadir Asi 1 · Alaa Abdalhaleem 1 · Daniel Fecker 2 · Volker Märgner 2 · Jihad El-Sana 1

Received: 2 May 2017 / Revised: 24 July 2017 / Accepted: 25 July 2017 / Published online: 1 August 2017
© Springer-Verlag GmbH Germany 2017

Abstract This paper introduces new methodologies for reliably identifying writers of Arabic historical manuscripts. We propose an approach that transforms key point-based features, such as SIFT, into a global form that captures high-level characteristics of writing styles. We suggest a modification for a common local feature, the contour direction feature, and show the contribution of combining local and global features for writer identification. Our work also presents a novel algorithm that determines the number of writers involved in writing a given manuscript. The experimental study confirms that this algorithm significantly improves writer identification when applied to historical manuscripts. Comprehensive experiments using different features and classification schemes demonstrate the vitality of the suggested methodologies for reliable writer identification. The presented techniques were evaluated on both historical and modern documents, where the suggested features yielded very promising results with respect to state-of-the-art features.

✉ Alaa Abdalhaleem alaaabd@cs.bgu.ac.il
Abedelkadir Asi abedas@cs.bgu.ac.il
Daniel Fecker Fecker@ifn.ing.tu-bs.de
Volker Märgner maergner@ifn.ing.tu-bs.de
Jihad El-Sana el-sana@cs.bgu.ac.il

1 Department of Computer Science, Ben-Gurion University of the Negev, Beersheba, Israel
2 Institute for Communications Technology, Technische Universität Braunschweig, Brunswick, Germany

Keywords Writer identification · Writer retrieval · Key point-based features · Contour-based features · Supervised learning · Hierarchical clustering · Classification

1 Introduction

Identifying the writer of a handwritten document is an emerging research problem that has been receiving significant interest in recent years. It poses interesting research challenges for document examiners, especially for historical handwritten documents. Paleographers invest a considerable amount of time in recognizing the writer of a questioned manuscript. This explains the acute demand for an automatic system for document writer recognition that can scale up to handle the huge number of digital manuscripts. Such systems provide a list of suspected writers to human experts, who still have the main role in determining the individuality of a handwriting.

Given a reference dataset of known writers, the writer identification task aims to assign one of these writers to a query document image. The writer retrieval task aims to retrieve, out of a set of documents, the document images written by the writer of the query document. It is important to mention that in these tasks a writer is represented by the writing style. In essence, we are identifying and retrieving writing styles and not necessarily writers. However, to stay in line with previous works, we use the common terminology from the literature.

Recently, a unique challenge for writer recognition in historical manuscripts has emerged.
Researchers noticed a writing technique, known as the staggering technique, where different scribes write the same document to induce a one-writer illusion [3]. The staggering technique might seriously distort the performance of automatic writer recognition sys-