Machine Translation Meets Frequent Case Generation in Query Translation Based CLIR

Kimmo Kettunen
Department of Information Studies, University of Tampere
Kanslerinrinne 1, FIN-33014 Tampereen yliopisto
kimmo.kettunen@uta.fi

Abstract

In this paper we present evaluation results of cross-language information retrieval for two small languages, Finnish and Swedish. Our approach is based on machine translation of topics and the use of the Frequent Case Generation method to manage query term variation in the translated topics. Retrieval results for more standard approaches to query term variation management, such as stemming and lemmatization of the translated topics, are also reported.

1 Introduction

Cross-language information retrieval (CLIR) has become one of the active research areas in information retrieval during the last 10 years. The development and success of the WWW has been one of the key factors increasing interest in retrieval tasks where the language of the queries differs from that of the retrieved documents. Vast amounts of textual data in various languages are available electronically, and this textual and linguistic abundance grows constantly. Thus there is, and will continue to be, a social need for retrieval systems in which users can state their search requests in their native language and retrieve documents in another language that they understand well enough to satisfy their information need. Although finished, real-world CLIR applications on the Web still mostly do not exist (Google's Translated Search notwithstanding), it could be estimated that some CLIR applications may reach maturity within 5-10 years. There are many approaches to CLIR, and one of the most popular has been query translation.
Queries can be translated in different ways: with electronic dictionaries or word lists, with machine translation programs, or with large parallel corpora as the knowledge source for translation. All of these query translation methods have been successful, and they can also be combined. Recently, much research has used parallel corpora as a translation resource, but the older methods also still flourish (Abusalah et al., 2005; Kishida, 2005; Oard and Riekema, 1998).

2 Frequent Case Generation and MT-based CLIR

In this paper we combine available machine translation programs for two small languages with FCG (Frequent Case Generation), a recent method for managing query term variation. Machine translation has been used as a query translation tool in CLIR for languages such as English, German, French and Spanish, but rarely for small languages like Swedish or Finnish. FCG, on the other hand, was introduced quite recently for monolingual management of query term variation (Kettunen, 2008; Kettunen et al., 2007). It has proven quite successful in managing query term variation for morphologically complex or moderately complex languages. It is therefore of interest to verify whether the method can also be used in CLIR for these same languages. Airio and Kettunen (2008) have applied FCG successfully in CLIR, but there it was used with a dictionary-based query translation tool, Utaclir (Hedlund, 2003; Hedlund et al., 2004).

We report evaluation results for queries machine-translated from English into Finnish and Swedish. The tests use CLEF 2003 materials, and the query translation and retrieval process is arranged as follows:

SLTC 2008