Deterministic Automata on Unranked Trees

Julien Cristau¹, Christof Löding², Wolfgang Thomas²

¹ LIAFA, Université Paris VII, France
² RWTH Aachen, Germany

Abstract. We investigate bottom-up and top-down deterministic automata on unranked trees. We show that for an appropriate definition of bottom-up deterministic automata it is possible to minimize the number of states efficiently and to obtain a unique canonical representative of the accepted tree language. For top-down deterministic automata it is well known that they are less expressive than the non-deterministic ones. By generalizing a corresponding proof from the theory of ranked tree automata we show that it is decidable whether a given regular language of unranked trees can be recognized by a top-down deterministic automaton. The standard deterministic top-down model is slightly weaker than the model we use, where at each node the automaton can scan the sequence of the labels of its successors before deciding its next move.

1 Introduction

Finite automata over finite unranked trees are a natural model in classical language theory as well as in the more recent study of XML document type definitions. In the theory of context-free languages, unranked trees (i.e. trees with finite but unbounded branching) arise as derivation trees of grammars in which the right-hand sides are regular expressions rather than single words ([BB02]). The feature of finite but unbounded branching appears also in the tree representation of XML documents. The generalization of tree automata from the case of ranked label alphabets to the unranked case is simple: A transition e.g. of a bottom-up automaton is of the form (L, a, q), allowing the automaton to assume state q at an a-labeled node with say n successors if the sequence q_1 ... q_n of states reached at the roots of the n subtrees of these successors belongs to L.
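To make the transition format (L, a, q) concrete, the following minimal sketch evaluates a bottom-up automaton on an unranked tree. It is an illustration under stated assumptions, not the construction used in this paper: states are single characters, each language L is written as a Python regular expression over those characters, and the names `Tree`, `TRANSITIONS`, `FINAL`, `reachable_states` and `accepts` are hypothetical. The example automaton evaluates Boolean trees whose inner nodes are labeled `and` or `or` with arbitrarily many successors.

```python
import itertools
import re
from dataclasses import dataclass, field


@dataclass
class Tree:
    """A node of an unranked tree: a label and any number of successors."""
    label: str
    children: list = field(default_factory=list)


# Transitions (L, a, q). Assumption of this sketch: states are the single
# characters "t" and "f", and L is a regex over sequences of such states.
TRANSITIONS = [
    ("", "1", "t"),                # leaf labeled 1: empty state sequence
    ("", "0", "f"),                # leaf labeled 0
    ("t+", "and", "t"),            # all successors evaluate to t
    ("[tf]*f[tf]*", "and", "f"),   # some successor evaluates to f
    ("[tf]*t[tf]*", "or", "t"),    # some successor evaluates to t
    ("f+", "or", "f"),             # all successors evaluate to f
]
FINAL = {"t"}


def reachable_states(tree, transitions=TRANSITIONS):
    """Return the set of states the automaton can assume at the root of tree."""
    child_sets = [reachable_states(c, transitions) for c in tree.children]
    result = set()
    for lang, label, state in transitions:
        if label != tree.label:
            continue
        # Try every choice q_1 ... q_n of successor states (naive enumeration,
        # exponential in general; adequate only for small illustrations).
        for word in itertools.product(*child_sets):
            if re.fullmatch(lang, "".join(word)):
                result.add(state)
                break
    return result


def accepts(tree):
    """The tree is accepted if some final state is reachable at the root."""
    return bool(reachable_states(tree) & FINAL)


example = Tree("and", [Tree("1"), Tree("or", [Tree("0"), Tree("1")]), Tree("1")])
print(accepts(example))
```

Because a node may have any number of successors, the regular language L plays the role that a fixed-arity transition table plays in the ranked case. The sketch computes a set of states per node, so it also covers non-deterministic automata; for the example automaton above, every set it computes contains exactly one state.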
Most core results of tree automata theory (logical closure properties, decidability of non-emptiness, inclusion, and equivalence) are easily transferred to this framework of “unranked tree automata” and “regular sets of unranked trees”. For certain other results of classical tree automata theory, however, such a transfer is less obvious and does not seem to be covered by existing work. In the present paper we deal with two such questions: the problem of automaton minimization, and the definition and expressive power of top-down automata (i.e. automata working from the root to the leaves, more closely following the pattern of XML query processing than the bottom-up version). We confine ourselves to the question of tree language recognition; so we do not address models like the query automata of [NS02] or the transducers of [MSV03].

Dagstuhl Seminar Proceedings 05061, Foundations of Semistructured Data, http://drops.dagstuhl.de/opus/volltexte/2005/228