Published in IET Information Security
Received on 16th June 2009
doi: 10.1049/iet-ifs.2010.0100
ISSN 1751-8709

Application of fuzzy logic and genetic algorithm in biometric text-independent writer identification

S.M. Saad
Department of Information Technology, Institute of Graduate Studies and Research, University of Alexandria, 163 El-Horreya Avenue, El Shatby 21526, P.O. Box 832, Alexandria, Egypt
E-mail: saad.darwish@gmail.com

Abstract: The identification of a person on the basis of scanned images of handwriting is a useful biometric technique with applications in forensic document analysis. This study describes the design and implementation of a system that identifies the writer from offline Arabic handwritten text. The key point is the use of multiple features that capture different aspects of handwriting individuality and operate at different levels of analysis, with the aim of improving identification performance. Fuzzy logic (FL) and a genetic algorithm (GA) are used in a complementary fashion to fuse (combine) the extracted features and to deal with the ambiguity of human judgment of handwriting similarity. The GA helps construct and tune the fuzzy membership functions that FL needs to categorise the strength of similarity between handwriting features, with the purpose of yielding high correct identification rates. The final results indicate that the proposed system achieves a test identification accuracy of up to 96% for Arabic text.

1 Introduction

With the increasing use of computers in every aspect of life, automatic person identification is becoming an important problem and is receiving growing interest from both academia and industry. A person can be identified by many methods, most commonly by a password, or by making use of either static biometric features of the person (e.g. fingerprint, face, iris pattern) or dynamic biometric features (e.g. handwriting, voice) [1].
Writer identification, the task of determining the writer of a handwriting sample, is a technique that satisfies four requirements of personal identification: it is accessible, cheap, reliable and acceptable [2, 3]. Consequently, despite the existence of other biometric techniques, writer identification remains an attractive application. Handwriting-based personal identification has a wide variety of potential applications, from security, forensics and financial activities to archaeology (identifying the writers of ancient documents).

Research into writer identification has focused on two streams: online and offline writer identification [1, 2]. The former assumes that a transducer device (tablet digitiser) is connected to the computer and converts the writing movement into a sequence of signals. Online handwriting allows the use of velocity, pressure and spatial information along with pen-up and pen-down events, which are not available in offline data. As a result, high identification accuracy is easier to achieve online than offline [3]. Unfortunately, online systems are inapplicable in many cases (e.g. archaeology), so developing effective offline techniques is a vital task. Offline systems, in comparison, deal with handwriting that has been scanned into a computer file as a two-dimensional (2D) image. These systems use computer image processing and pattern recognition techniques to solve the different types of problems encountered: pre-processing, feature extraction and selection, sample comparison and performance evaluation [4].

Offline writer identification systems can be divided into two approaches, text-dependent and text-independent [1, 3, 5], depending on how identification is implemented with regard to the registered writers' samples.
The text-dependent approach requires handwriting based on a specific text (e.g. a signature). Commonly, the geometric or structural features of the given characters or words are extracted as the writing features. Text-dependent systems have two major problems. First, they are not applicable when the specific text is not available, as in criminal justice systems, where text documents with different content need to be compared. Second, they are more prone to forgery (replay attack) because the same data are presented for testing. Text-independent systems, on the other hand, model style information independently of content and can identify the writer from any given text. This usually requires statistics of features computed from a large quantity of data, to avoid anomalies due to a specific text.

One of the main difficulties of the writer identification task is to define a set of features that reflects the large variability observed between samples of script from the same writer over time and from different writers.

IET Inf. Secur., 2011, Vol. 5, Iss. 1, pp. 1–9 © The Institution of Engineering and Technology 2011 www.ietdl.org

There is no ideal mathematical model that can