KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS VOL. 3, NO. 3, June 2009 285
Copyright 2009 KSII

Guiding Practical Text Classification Framework to Optimal State in Multiple Domains

Sung Pil Choi 1, Sung-Hyon Myaeng 2 and Hyun-Yang Cho 3

1 Department of Information Technology Research, KISTI, 335 Gwahangno, Yuseong-gu, Daejeon, 305-806, South Korea [e-mail: spchoi@kisti.re.kr]
2 School of Engineering, Information and Communications University, 119, Munjiro, Yuseong-gu, Daejeon, 305-732, South Korea [e-mail: myaeng@icu.ac.kr]
3 Library & Information Science Department, Kyonggi University, 94-6, Yiui-dong, Yeongtong-gu, Suwon, Kyonggi-do, 443-760, South Korea [e-mail: hycho@kyonggi.ac.kr]
*Corresponding author: Sung Pil Choi

Received April 17, 2009; revised June 5, 2009; accepted June 8, 2009; published June 22, 2009

Abstract

This paper introduces DICE, a Domain-Independent text Classification Engine. DICE is robust, efficient, and domain-independent in terms of software and architecture. Each module of the system is clearly separated and encapsulated for extensibility. This modular architecture allows for simple and continuous verification and facilitates changes over multiple cycles, even after the major development period is complete. Those who want to use DICE can easily implement their ideas on this test bed and optimize it for a particular domain simply by adjusting the configuration file. Unlike other publicly available toolkits or development environments targeted at general-purpose classification models, DICE specializes in text classification and provides a number of useful functions specific to that task. This paper focuses on ways to locate the optimal state of a practical text classification framework by using the various adaptation methods the system provides, such as feature selection, lemmatization, and classification models.
Keywords: Document classification, test bed, machine learning, text categorization, feature selection, text mining, lemmatization

The research was supported by the Korea Institute of Science and Technology Information (KISTI) under an Institutional Grant and partially supported by a Korea Research Council of Fundamental Science & Technology (KRCF) Grant, the Korean government. We would like to thank the anonymous reviewers for their critical and insightful comments.

DOI: 10.3837/tiis.2009.03.005