Migemo: Incremental Search Method for Languages with Many Character Faces

Satoru Takabayashi and Hiroyuki Komatsu and Toshiyuki Masui

Sony Computer Science Laboratories, Inc.
Nara Institute of Science and Technology, Graduate School of Information Science
Tokyo Institute of Technology, Graduate School of Information Science and Engineering

satoru@csl.sony.co.jp, komatsu@matsulab.is.titech.ac.jp, masui@csl.sony.co.jp

Abstract

We introduce a new incremental search method called Migemo for languages with many character faces. Migemo performs the incremental search by dynamically expanding the input pattern into a compact regular expression which represents all the possible words that match the input pattern. We show that Migemo is useful not only for searching texts in Japanese and other East Asian languages, but also for performing sophisticated searches on ASCII-only documents.

1 Introduction

Incremental search is one of the most powerful operations provided in text editors like Emacs, as the simplest form of dynamic query. Various text matching algorithms can be used for implementing incremental searches for languages with ASCII characters, but they are not directly applicable to Japanese and other East Asian languages, where keyboard characters do not directly correspond to text characters.

In conventional Japanese text editors, users have to select Japanese characters before performing a search. In this case, the advantage of incremental search is almost lost, because Japanese character entry usually takes multiple steps like the following:

1. Type the pronunciation of a word using an ASCII keyboard.
2. Convert the ASCII text into a Kana text which represents the pronunciation of a Japanese word.
3. Convert the Kana text into a set of Kanji words by a Kana-Kanji converter. One pronunciation usually corresponds to more than one Kanji word. For example, “機械 (machine)”, “機会 (opportunity)”, and “奇怪 (strange)” all have the same pronunciation “kikai”.
4. Select the desired Kanji word from the set of candidate words.

If users want to find “奇怪” with a conventional incremental search method, they have to type more than ten keys¹. Figure 1 shows the process of selecting “奇怪” as the search keyword. A search that relies on Kana-Kanji conversion in this way is thus not dynamic at all.

[Figure 1 diagram. Components: Keyboard (‘kikai’), ASCII-Kana Converter, Kana-Kanji Converter, Selector. Stages: 1. ASCII Input, 2. ASCII-Kana, 3. Kana-Kanji, 4. Selection. Keys typed: KANA k i k a i CONVERT SELECT SELECT RETURN KANA.]

Figure 1: The process of selecting “奇怪 (strange)” as the search keyword.

2 Migemo

We propose a new incremental search method called Migemo, which solves the problem described in the previous section. Figure 2 shows the process of performing an incremental search for the Japanese word “奇怪” with Migemo for Emacs². In this example, the user typed only four keys (^S k i k) to find “奇怪”, which is much easier than typing the twelve keys of the previous example. Unlike conversion-based search, the search with Migemo is truly incremental and dynamic, just like incremental search on ASCII documents.

¹ Using Emacs, the users have to type ^S KANA k i k a i CONVERT SELECT SELECT RETURN KANA: ^S starts the incremental search, KANA starts and ends Kana-Kanji conversion, CONVERT converts a Kana text to a set of Kanji words, SELECT selects a Kanji word from the candidate Kanji words, and RETURN finishes the selection process of Kanji words.

² http://migemo.namazu.org/