METHODOLOGIES AND APPLICATION

A hybrid group search optimization: firefly algorithm-based big data framework for ancient script recognition

T. S. Suganya 1 • S. Murugavalli 1

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Abstract

Optical character recognition has become one of the most widely researched areas in recent times. This paper presents an optimization framework for ancient script recognition based on script (character) segmentation. The proposed algorithm is an evolutionary algorithm capable of handling a continuous script of high-resolution data using big data concepts. A hybrid combination of the group search and firefly algorithms is proposed and compared against recent works. The resulting classification results are observed and recorded in this paper.

Keywords Character recognition system · Optical character recognition (OCR) · Segmentation and neural network (NN) · Group search algorithm · Firefly algorithm

1 Introduction

Tamil is one of the ancient languages and ranks alongside Greek, Latin, and Sanskrit. Its evolution appears to have begun in the third century BC, and the basic structure of the early language differs from that of the Tamil used today. The evolution of Tamil can be divided into three periods: the period between the third century BC and the sixth century AD, Medieval Tamil between the sixth century AD and the twelfth century AD, and Modern Tamil from the twelfth century to the present day.

In ancient Tamilnadu, writers and poets used palm leaves and stone inscriptions to record their writings. Palm leaf manuscripts of the ancient scriptures have been preserved to this day; a notable example is the Tamil grammar book Tholkappiyam, written during the fourth century BC. Three types of works are found in palm leaf manuscripts and stone inscriptions:
• Documents of lands and buildings registered and donated by kings to the people are inscribed.
• Literary works including grammar, astrology, poetry, science, and technology are inscribed.
• Historical events relating to places and dominions are also inscribed.

In pattern recognition, character recognition (Khan et al. 2010) is one of the most difficult tasks, and image processing techniques bring many difficulties of their own. Researchers have applied many techniques to the complex problem of character recognition. Recognizing Brahmi characters carved in stone, copper plates, clay pots, and similar media is a very difficult process. Although researchers are able to convert characters with up to 90% accuracy, the converted words still cannot be understood with their proper meaning in the sentence.

Tamil has one unique character, Aayuthaezhuthu (Pugazhenthi and Vallarasi 2015). Together with 12 vowels, 18 consonants, and 12 × 18 compound letters, this gives a total of 247 characters, plus some Sanskrit characters [glyphs missing]. While English has just 26 characters, the character set of Tamil is vast, so character recognition for Tamil is a difficult job. Compound letters are formed by joining a vowel with a consonant; for example, the first consonant can be combined with all the vowels [glyphs missing]. In Tamil, the pitch varies from character to character: the pitch of [glyph missing] is greater than that of [glyph missing]. Some characters are even split into three parts.

Communicated by V. Loia.

T. S. Suganya, suganyats2018@gmail.com
S. Murugavalli, murugavalli26@rediffmail.com

1 Computer Science and Engineering, Panimalar Engineering College, Chennai, India

Soft Computing
https://doi.org/10.1007/s00500-019-04596-x
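The character count quoted in the Introduction (12 vowels, 18 consonants, 12 × 18 compound letters, and Aayuthaezhuthu, 247 in total) and the formation of compound letters by joining a consonant with a vowel can be checked with a short sketch. The mapping to Unicode code points and all names used here (`VOWELS`, `CONSONANT_BASES`, `VOWEL_SIGNS`, `build_character_set`) are assumptions made for illustration; they are not part of the proposed recognition framework.

```python
# Illustrative sketch only: enumerates the Tamil character set using
# code points from the Unicode Tamil block. The Unicode mapping is an
# assumption added for illustration, not part of the paper's method.

# The 12 vowels.
VOWELS = [0x0B85, 0x0B86, 0x0B87, 0x0B88, 0x0B89, 0x0B8A,
          0x0B8E, 0x0B8F, 0x0B90, 0x0B92, 0x0B93, 0x0B94]

# Base letters of the 18 consonants.
CONSONANT_BASES = [0x0B95, 0x0B99, 0x0B9A, 0x0B9E, 0x0B9F, 0x0BA3,
                   0x0BA4, 0x0BA8, 0x0BAA, 0x0BAE, 0x0BAF, 0x0BB0,
                   0x0BB2, 0x0BB5, 0x0BB4, 0x0BB3, 0x0BB1, 0x0BA9]

# Dependent vowel signs, in the same order as VOWELS. The first vowel is
# inherent in the base letter, so it has no separate sign (None).
VOWEL_SIGNS = [None, 0x0BBE, 0x0BBF, 0x0BC0, 0x0BC1, 0x0BC2,
               0x0BC6, 0x0BC7, 0x0BC8, 0x0BCA, 0x0BCB, 0x0BCC]

PULLI = 0x0BCD   # dot that marks a pure consonant
AYTHAM = 0x0B83  # Aayuthaezhuthu


def compound(base, sign):
    """Join one consonant base with one vowel sign (or the inherent vowel)."""
    return chr(base) + (chr(sign) if sign is not None else "")


def build_character_set():
    """Return (vowels, consonants, compound letters, Aayuthaezhuthu)."""
    vowels = [chr(v) for v in VOWELS]
    consonants = [chr(c) + chr(PULLI) for c in CONSONANT_BASES]
    compounds = [compound(c, s) for c in CONSONANT_BASES for s in VOWEL_SIGNS]
    return vowels, consonants, compounds, chr(AYTHAM)


vowels, consonants, compounds, aytham = build_character_set()

# 12 vowels + 18 consonants + 12 x 18 compound letters + 1 special character.
total = len(vowels) + len(consonants) + len(compounds) + 1

# The first consonant joined with each of the 12 vowels.
first_row = compounds[:12]

print(total)  # 247
```

On a UTF-8 terminal, `print(" ".join(first_row))` displays the first consonant combined with every vowel, which is the example the text refers to.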