1st Regional Conference on ICT and E-Paradigms
24th – 26th June 2004, Colombo, Sri Lanka

Development of Standards for Sinhala Computing

Gihan Dias and Aruni Goonetilleke
ICT Agency

Abstract

Information technology has been used in Sri Lanka for about 20 years, but to a great extent its use has been limited to those with a knowledge of English. As these are a minority of the population, the larger proportion of Sri Lanka’s citizens has not been able to make use of the information revolution. A number of initiatives to introduce Sinhala computing have been made since the 1980s. We outline the progress of this work, and describe the work done by CINTEC and ICTA to support Sinhala computing. The encoding of Sinhala in a standard manner to facilitate information interchange, the development of Sinhala fonts, and Sinhala keyboards are described. We present the rationale of the design, and not simply the final standard.

1. Introduction

Currently most computer operating systems, databases and applications in Sri Lanka work only in English. However, the majority of Lankans are more familiar with Sinhala or Tamil, and prefer to use IT in their own language. This has resulted in a gap between how people want to use computers and what today's computers can do.

Although the corporate sector in this country operates mainly in English, small businesses and the government work mostly in Sinhala or Tamil. Individuals use computers for personal work, such as e-mail, at home or in telecentres. Lankans working abroad need to communicate with their relatives in Sri Lanka. The spread and effective use of IT in these sectors require that computer systems support our languages.

Countries such as Japan, Korea and Thailand, among others, had similar problems when they started using computers. However, support for Japanese, Korean and Thai is now standard on computers used in those countries, and it is common for people there to use computers and applications in their own language.
Why then didn't local-language computing become common in Sri Lanka? One reason is that, unlike in the above countries, a number of Lankans, including many decision and opinion makers, are proficient in English. Also, people assume that using IT requires a knowledge of English, and have not demanded a change. Another reason is the small size of the Lankan market.

This situation is now changing. A significant number of non-English-speaking persons want to use Information Technology (IT). This is not limited to computers, but extends to devices such as phones and game consoles.

This paper documents the initiatives, by a number of persons and organisations, to make IT readily available to Sinhala speakers. A similar initiative on IT in Tamil, not covered in this paper, is being carried out both in Sri Lanka and abroad.

2. Local Language Support Requirements

A number of inter-related elements are required to support a given language in a computer system. These are:

Character Encoding: how letters and words are represented in a system. Each letter or other symbol is represented by a code. For documents to be portable across systems, they must be encoded in a standard format. For example, the ASCII code is one (but not the only) method of encoding English text.

Fonts: how text is represented on a screen or printer. The same character may be written in several ways, with scope for artistic expression.

Text input: from a keyboard, pen, voice recognition system, etc. The most common text input method is the keyboard. Both the keyboard layout, i.e., the assignment of keys to letters, and the key sequences, i.e., what sequence of keys yields a given character, should be defined.

Application support: in addition to text handling, each application may have local-language menus, error messages, help screens, etc.

Utilities: such as spelling checkers.

3. Review of Current Sinhala Technology

A number of Sinhala fonts and applications are currently available.
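As a small illustration of the character-encoding requirement listed in Section 2, the Python sketch below prints the codes behind a short piece of text. Unicode is used here only as one example of a standard encoding; that choice is an assumption of this sketch, since the text above names only ASCII. The helper name `code_points` is hypothetical.

```python
def code_points(text):
    """Return the numeric code assigned to each character in `text`."""
    return [ord(ch) for ch in text]


english = "kg"
sinhala = "\u0d9a\u0d9c"  # the Sinhala letters ක and ග (Unicode assumed)

# English letters fall in the ASCII range (0-127).
print(code_points(english))

# Sinhala letters have their own codes, shown here in U+XXXX notation.
print([f"U+{cp:04X}" for cp in code_points(sinhala)])

# A standard encoding makes text portable: the bytes written by one
# system decode to the same letters on another.
data = sinhala.encode("utf-8")
assert data.decode("utf-8") == sinhala
```

Because the two scripts have distinct codes, a document can mix English and Sinhala text without depending on which font is installed.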
These may be categorised as: fonts, which may be used with any application; and packages, which bundle an application (e.g. a word processor) with a set of fonts. Although these systems are disparate, many of them share a number of features. These are:

8-bit character set: Almost all current fonts are based on an 8-bit character encoding, which limits them to fewer than 256 symbols.

Character codes based on keyboard layout: Many current fonts map Sinhala symbols to the codes used by Roman letters in ASCII. The two most common mapping schemes are based on the Wijesekera keyboard layout and the “phonetic” layout (i.e., where Sinhala letters are placed on the same keys as their English sound-alikes, e.g. ක is on the 'k' key and ග is on the 'g' key). Consequently, the codes allocated to Sinhala letters are