Advances in NLP applied to Word Prediction

Carlo Aliprandi¹, Nicola Carmignani², Nedjma Deha², Paolo Mancarella², Michele Rubino²

¹ Synthema Srl, Pisa, Italy
² Department of Computer Science, University of Pisa, Italy

carlo.aliprandi@synthema.it, {nicola, deha, paolo, rubino}@di.unipi.it

Abstract

Presenting some recent advances in word prediction, a flourishing research area in Natural Language Processing, we describe FastType, an innovative word prediction system that overcomes typical limitations of standard techniques when they are applied to inflected languages. FastType is based on combined statistical and rule-based methods relying on robust open-domain language resources, which have been refined to improve Keystroke Saving. Word prediction is particularly useful for minimising keystrokes for users with special needs, and for reducing misspellings for users with limited language proficiency. Word prediction can also be used effectively in language learning, by suggesting correct words to non-native users. FastType has been tried out and evaluated on some test benchmarks, showing a notable improvement in Keystroke Saving, which now reaches 51%, comparable to that achieved by word prediction methods for non-inflected languages.

Index Terms: Word Prediction, Natural Language Processing (NLP), Augmentative and Alternative Communication, Computer Aided Language Learning, Speech and Natural Language Interfaces, Assistive Technology

1. Introduction

This paper describes an innovative approach to word prediction, presenting recent results achieved for inflected languages.

Word prediction is the task of guessing words that are likely to follow a given fragment of text. Word prediction software is a writing support: at each keystroke it suggests a list of meaningful predictions, among which the user may find the word they intend to type.
By selecting a word from the list, the user lets the software automatically complete the word being written, thus saving keystrokes.

Word prediction is an ambitious challenge, since it must address several of the complex problems that typically arise when dealing with natural language. The many inherent ambiguities (lexical, structural and semantic, but also pragmatic, cultural and, for speech, phonetic) are hard for a computer to resolve. Many approaches have been tried, and several core NLP tasks have been employed, for example Language Modeling, Part-of-Speech (POS) Tagging, Parsing and Lemmatisation.

Word prediction has been widely adopted in Augmentative and Alternative Communication (AAC) systems [1], becoming an essential aid for people with motor or cognitive disabilities, both to reduce typing effort and to assist users with learning or language impairments. Indeed, writing text for work, study or communication is, according to a survey we conducted (described in [2]), the most frequent and time-consuming activity for most computer users. A word predictor would therefore be useful to a very large number of computer users, with or without disabilities.

FastType is designed to predict words for inflected languages, that is, languages that have a large dictionary of word forms with several morphological features, produced from a root or lemma and a set of inflection rules. The degree of inflection of a language may vary from very high (e.g. Basque), to moderate (e.g. Spanish, Italian, French), to low (e.g. English). The large number of word forms makes word prediction for inflected languages a hard task. Moreover, because word prediction operates at typing time, any NLP task applied to it has to cope with the further problem of sentence incompleteness, unlike common NLP analysis, which processes complete sentences.
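As a rough illustration of why a larger inventory of word forms makes prediction harder, the following Python sketch expands a couple of stems with a few regular present-tense endings and counts how many forms remain compatible with a typed prefix. The endings, the toy lexicons and the helper names `inflect` and `candidates` are assumptions made for this example only; they are not part of FastType and do not model real morphology.

```python
# Toy illustration only: a handful of regular present-tense endings,
# not a real morphological lexicon and not FastType's resources.
ITALIAN_ARE_ENDINGS = ["o", "i", "a", "iamo", "ate", "ano"]  # regular -are verbs
ENGLISH_ENDINGS = ["", "s"]  # base form and third person singular


def inflect(stem, endings):
    """Return the surface forms obtained by attaching each ending to the stem."""
    return [stem + ending for ending in endings]


def candidates(prefix, lexicon):
    """Return, in alphabetical order, the word forms that start with the prefix."""
    return sorted(word for word in lexicon if word.startswith(prefix))


# Hypothetical mini-lexicons built from two stems per language.
italian_lexicon = inflect("parl", ITALIAN_ARE_ENDINGS) + inflect("cant", ITALIAN_ARE_ENDINGS)
english_lexicon = inflect("speak", ENGLISH_ENDINGS) + inflect("sing", ENGLISH_ENDINGS)

# After four typed characters, more Italian forms are still possible.
italian_matches = candidates("parl", italian_lexicon)
english_matches = candidates("spea", english_lexicon)

print(len(italian_matches), italian_matches)
print(len(english_matches), english_matches)
```

Even in this tiny example the Italian prefix leaves three times as many candidates as the English one, which suggests why a prefix alone is less informative for inflected languages and why additional information, such as morphological features, may be needed to rank the suggestions.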
To make word prediction as simple and immediate as possible, we have implemented DonKey, a new human-computer interface. DonKey improves on the original, naive interface of FastType, allowing the user to benefit from automatic word prediction in any desktop application. In addition to re-designing the user interface, we have enhanced the underlying prediction engine: we added new resources, such as word and Part-of-Speech n-gram Language Models, and implemented more efficient prediction algorithms.

Thanks to these upgrades, performance has greatly improved. Keystroke Saving has reached 51% and is now comparable to that achieved with state-of-the-art methods for non-inflected languages.

2. State of the Art on Word Prediction

Word prediction is a research area that tackles a challenging and ambitious task, mainly with methods from Artificial Intelligence, Natural Language Processing and Machine Learning.

The main goal of word prediction is guessing and completing the word a user intends to type. Word predictors are intended to support writing and are commonly used in combination with assistive devices such as keyboards, virtual keyboards, touchpads and pointing devices. Another potential application is in text-entry interfaces [3] for messaging on mobile phones and typing on handheld and ubiquitous devices (e.g. PDAs or smartphones).

Prediction methods have become well known through their wide adoption in mobile phones and PDAs, where multitap is the input method. Nuance T9 (formerly Tegic Communications T9; http://www.nuance.com/t9/) and Zi Corporation eZiText (http://www.zicorp.com/eProducts/ZiPredictiveTextSuite/) are commercial systems that adopt a very simple method of prediction based on dictionary disambiguation. At each keystroke the system selects one of the letters associated with the key, guessing it from a dic-