Incremental String Correction: Towards Correction of XML Documents

Ahmed Cheriat, Agata Savary, Béatrice Bouchou, Mírian Halfeld Ferrari

Université François Rabelais de Tours - LI/Campus de Blois, France
3 place Jean Jaurès - 41000 Blois, France
ahmed.cheriat@etu.univ-tours.fr
{agata.savary, beatrice.bouchou, mirian}@univ-tours.fr

Abstract. We define the problem of incremental string-to-string correction with respect to a regular grammar. A user is given a valid word that may be updated through one or more editing operations. If the resulting word is invalid, we propose correction candidates that take into account not only the incorrect word but also the initial valid word. The method is based on the error distance matrix computation proposed in [8]. It was developed with a view to incremental correction of XML documents (as opposed to correction from scratch). Experimental results show that our algorithm performs well despite its exponential theoretical complexity.

1 Introduction

We introduce an incremental string-to-string correction method with respect to a regular grammar. Given an initial correct (valid) word A, i.e., a word accepted by a regular grammar, a user can adapt this word to their needs by performing one or more elementary operations (updates) on it, provided that the resulting word B remains valid. If, however, B turns out to be invalid (e.g., because of a mistake made while performing the updates), the system should guess the user's intention and propose a set of plausible corrections. Thus, we do not search for all nearest neighbors of B in the dictionary, but only for those that might result from A through a sequence of operations similar (but not identical) to the updates proposed by the user. Our solution is to explore the finite-state automaton corresponding to the grammar in order to find valid words that are as close as possible to both A and B.
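To make the setting concrete, the following is a minimal Python sketch of error-tolerant recognition with an edit distance matrix, in the spirit of the computation from [8] on which the method is based. It traverses a deterministic finite-state automaton depth first, computes one new row of the edit distance matrix per transition, and abandons a path once the row minimum exceeds a threshold. It corrects a single word from scratch and is not the incremental algorithm of this paper, which also takes the initial word A into account. The DFA encoding, the function name `correct`, the `threshold` parameter, and the row-minimum pruning rule are illustrative assumptions; [8] may use a different cut-off criterion.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical DFA encoding (illustrative assumption): (state, symbol) -> next state.
Transitions = Dict[Tuple[int, str], int]


def correct(word: str, start: int, transitions: Transitions,
            accepting: Set[int], threshold: int) -> List[Tuple[str, int]]:
    """Return sorted (candidate, distance) pairs for every word accepted by the
    DFA whose Levenshtein distance to `word` is at most `threshold`."""
    # Index outgoing transitions by source state.
    out: Dict[int, List[Tuple[str, int]]] = {}
    for (src, sym), dst in transitions.items():
        out.setdefault(src, []).append((sym, dst))

    m = len(word)
    results: Dict[str, int] = {}

    def dfs(state: int, prefix: str, row: List[int]) -> None:
        # Invariant: row[i] is the edit distance between `prefix` and word[:i].
        if state in accepting and row[m] <= threshold:
            results[prefix] = row[m]
        for sym, dst in sorted(out.get(state, [])):
            # Compute the matrix row for the extended prefix `prefix + sym`.
            new_row = [row[0] + 1]
            for i in range(1, m + 1):
                cost = 0 if word[i - 1] == sym else 1
                new_row.append(min(row[i] + 1,          # sym is an extra symbol
                                   new_row[i - 1] + 1,  # word[i-1] is an extra symbol
                                   row[i - 1] + cost))  # match or substitution
            # Prune: every extension of this prefix has distance >= min(new_row).
            # Every entry is at least len(prefix) - m, so this also guarantees
            # termination on cyclic automata.
            if min(new_row) <= threshold:
                dfs(dst, prefix + sym, new_row)

    dfs(start, "", list(range(m + 1)))
    return sorted(results.items())


if __name__ == "__main__":
    # Automaton for the content model a b* c, i.e. the DTD model (a, b*, c).
    T = {(0, "a"): 1, (1, "b"): 1, (1, "c"): 2}
    # B = "bbc" was obtained from the valid A = "abbc" by deleting "a".
    print(correct("bbc", 0, T, {2}, 1))  # [('abbc', 1), ('abc', 1)]
```

In this example, both abc and abbc lie at distance 1 from the invalid word B = bbc, but only abbc is also at distance 0 from the initial word A = abbc. A correction from scratch cannot distinguish the two candidates, whereas looking for valid words close to both A and B can.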
By exploring the finite-state automaton, we benefit from the achievements of the string-to-string correction domain ([10], [4]), as well as from their extension to finite-state representations of grammars or lexicons ([8]), while contributing new ideas focused on incrementality.

The motivation for incremental string-to-string correction comes from the area of XML document validation and correction. The validity of each node in such a document is described by one or more regular expressions. When a user wishes to

* Supported by Région Centre, France
Partly supported by the IUT of Blois, France