Improving Cross-Lingual Transfer through Subtree-Aware Word Reordering

Ofir Arviv¹  Dmitry Nikolaev²  Taelin Karidi¹  Omri Abend¹
¹Hebrew University of Jerusalem  ²IMS, University of Stuttgart
{ofir.arviv,taelin.karidi,omri.abend}@mail.huji.ac.il  dnikolaev@fastmail.com

Abstract

Despite the impressive growth of the abilities of multilingual language models, such as XLM-R and mT5, it has been shown that they still face difficulties when tackling typologically-distant languages, particularly in the low-resource setting. One obstacle to effective cross-lingual transfer is variability in word-order patterns. It can potentially be mitigated via source- or target-side word reordering, and numerous approaches to reordering have been proposed. However, they rely on language-specific rules, work on the level of POS tags, or only target the main clause, leaving subordinate clauses intact. To address these limitations, we present a new powerful reordering method, defined in terms of Universal Dependencies, that is able to learn fine-grained word-order patterns conditioned on the syntactic context from a small amount of annotated data and can be applied at all levels of the syntactic tree. We conduct experiments on a diverse set of tasks and show that our method consistently outperforms strong baselines over different language pairs and model architectures. This performance advantage holds true in both zero-shot and few-shot scenarios.¹

1 Introduction

Recent multilingual pre-trained language models (LMs), such as mBERT (Devlin et al., 2019), XLM-RoBERTa (Conneau et al., 2020), mBART (Liu et al., 2020b), and mT5 (Xue et al., 2021), have shown impressive cross-lingual ability, enabling effective transfer in a wide range of cross-lingual natural language processing tasks. However, even the most advanced LLMs are not effective when dealing with less-represented languages, as shown by recent studies (Ruder et al., 2023; Asai et al., 2023; Ahuja et al., 2023).
Furthermore, annotating sufficient training data in these languages is not a feasible task, and as a result speakers of underrepresented languages are unable to reap the benefits of modern NLP capabilities (Joshi et al., 2020).

¹ Code available at https://github.com/OfirArviv/ud-based-word-reordering

Numerous studies have shown that a key challenge for cross-lingual transfer is the divergence in word order between different languages, which often causes a significant drop in performance (Rasooli and Collins, 2017; Wang and Eisner, 2018; Ahmad et al., 2019; Liu et al., 2020a; Ji et al., 2021; Nikolaev and Pado, 2022; Samardžić et al., 2022).² This is unsurprising, given the complex and interdependent nature of word order (e.g., verb-final languages tend to have postpositions instead of prepositions and place relative clauses before the nominal phrases that they modify, while SVO and VSO languages prefer prepositions and postposed relative clauses; see Dryer 1992) and the way it is coupled with the presentation of novel information in sentences (Hawkins, 1992). This is especially true for the majority of underrepresented languages, which demonstrate word-order preferences distinct from those of English and other well-resourced languages.

Motivated by this, we present a reordering method applicable to any language pair, which can be efficiently trained even on a small amount of data, is applicable at all levels of the syntactic tree, and is powerful enough to boost the performance of modern multilingual LMs. The method, defined in terms of Universal Dependencies (UD), is based on pairwise constraints regulating the linear order of subtrees that share a common parent, which we term POCs, for "pairwise ordering constraints". We estimate these constraints based on the probability that the two subtree labels will appear in one order or the other when their parent has a given label.
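The estimation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a toy treebank of UD-style tokens given as (position, dependency label, head position) triples, and it uses the linear position of each sibling's head word as a proxy for the position of its subtree (which is valid for projective trees, where sibling subtrees do not interleave).

```python
from collections import defaultdict
from itertools import combinations


def estimate_pocs(trees):
    """Estimate pairwise ordering constraints (POCs) from a toy treebank.

    Each tree is a list of (position, dep_label, head_position) triples.
    For every unordered pair of dependency labels (a, b) attached to a
    parent with label p, returns the empirical probability that the
    a-labeled subtree precedes the b-labeled one, keyed by (p, a, b)
    with a <= b lexicographically.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [a_first, total]
    for tree in trees:
        labels = {pos: label for pos, label, _ in tree}
        children = defaultdict(list)
        for pos, label, head in tree:
            children[head].append((pos, label))
        for head, kids in children.items():
            parent_label = labels.get(head, "root")
            # kids sorted by position; each pair is in linear order
            for (_, lab1), (_, lab2) in combinations(sorted(kids), 2):
                a, b = sorted([lab1, lab2])
                key = (parent_label, a, b)
                counts[key][1] += 1
                if (lab1, lab2) == (a, b):
                    counts[key][0] += 1
    return {k: first / total for k, (first, total) in counts.items()}


# Toy SVO tree: "the dog ate food" -> det <- nsubj <- root -> obj
tree = [(1, "det", 2), (2, "nsubj", 3), (3, "root", 0), (4, "obj", 3)]
pocs = estimate_pocs([tree])
# In this one-tree sample, nsubj always precedes obj under a root verb:
# pocs[("root", "nsubj", "obj")] == 1.0
```

A reordering component could then consult these probabilities to decide, for each pair of sibling subtrees in a source-language tree, whether to swap them to match the target language's preferences.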
Thus, in terms of UD, we expect, e.g., languages that use pre-nominal adjectival modification

² From a slightly different perspective, this topic has also been actively studied in the machine translation literature (cf. Steinberger, 1994; Chang and Toutanova, 2007; Murthy et al., 2019).

arXiv:2310.13583v1 [cs.CL] 20 Oct 2023