Analyses of Same Content Texts Written in Different Languages

Mentor Hamiti
Faculty of Contemporary Sciences and Technologies, South East European University
Ilindenska bb, 1200 Tetovo, R. of Macedonia
m.hamiti@seeu.edu.mk

Agni Dika
Faculty of Electrical and Computer Engineering, University of Pristina
Pristine, R. of Kosova
agnidika@yahoo.com

Abstract. Language is the basic and most complete means of communication between people. It is realized in two forms: spoken and written. Every society, even a primitive one, has a spoken language, but only civilized societies have a written language with a defined alphabet. The presence of letters in words determines their meaning, while the particular order of letters within words is a work of art in itself. It is therefore natural to ask: which letters are used most and least in different languages? Are their distributions similar even across languages that use different alphabets? More generally, how do different languages differ, and what do they have in common, when they express the same content? This paper sets out to answer these questions.

Keywords. Language, Text, Alphabet, Analysis, English, Albanian, Macedonian.

1. Introduction

Language software and its presence on the Internet have become a vital part of communication and of the modern concept of the so-called “scientific field”. Linguists have taken the challenge of the computer era seriously, because computational linguistics is the only way to protect, enrich and advance every language in the world. The aim of this research is to present the continuous development of languages, including the statistical research component.
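The statistical component referred to here rests on counting how often each letter occurs in a text. As an illustration only, the following Python sketch counts the letters in a string and reports their relative frequencies. It is not the authors' implementation: the function name `letter_frequencies` and the sample sentence are assumptions made for this example, and every Unicode alphabetic character is treated as one letter.

```python
from collections import Counter


def letter_frequencies(text):
    """Return (letter, count, percent) tuples, most frequent letter first.

    Hypothetical helper for illustration: case is ignored and every
    alphabetic character is counted as a single letter.
    """
    # Keep alphabetic characters only; digits, spaces and punctuation are dropped.
    letters = [ch for ch in text.lower() if ch.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    # For text without letters the list is empty, so no division takes place.
    return [(ch, n, 100.0 * n / total) for ch, n in counts.most_common()]


# Illustrative sample sentence (an assumption, not data from the study).
sample = "Language is the basic way of communication between people."
for letter, count, percent in letter_frequencies(sample)[:5]:
    print(f"{letter}: {count} ({percent:.1f}%)")
```

Because `str.isalpha` accepts both Latin and Cyrillic characters, the same function can be applied unchanged to texts in either script. A letter written with two characters would be counted as two separate characters by this simple approach.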
With the help of an original program (Text Analyzer, an application prepared by the authors) written in the C# programming language, and putting the computer in the service of different languages, texts with the same content written in English, Albanian and Macedonian (the official languages of South East European University) were analyzed. Linguists can use the results obtained for further linguistic research and analysis.

Proceedings of the ITI 2009 31st Int. Conf. on Information Technology Interfaces, June 22-25, 2009, Cavtat, Croatia

2. Classification of alphabet letters and the specifics of computer based processing

The classification of letters is specific to every language. In our case, the English language uses 26 Latin letters. The Albanian alphabet consists of 36 letters: in addition to the Latin letters, it uses double letters (digraphs) as well as two letters with diacritic marks. The Macedonian language uses the Cyrillic alphabet, and it also differs from the other two languages in the number of letters used (31). Therefore, the specifics of computer based processing need to be studied separately for each language.

2.1. English language, the specifics of computer based processing

All letters of the English alphabet are available on a standard computer keyboard. For computer based text analysis, the English language therefore has a definite advantage over other languages. All of its Latin letters are included in the standard ASCII (American Standard Code for Information Interchange) character set. This represents an additional benefit, for