Comparing Similarity Calculation Methods in Conversational CBR

Mingyang Gu, Xin Tong, and Agnar Aamodt
Department of Computer and Information Science, Norwegian University of Science and Technology, Sem Saelands vei 7-9, N-7491, Trondheim, Norway
Email: {mingyang, tongxin, agnar}@idi.ntnu.no

Abstract— Conversational Case-Based Reasoning (CCBR) provides a mixed-initiative dialog for guiding users to construct their problem description incrementally through a question-answering sequence. Similarity calculation in CCBR, as in traditional CBR, plays an important role in the retrieval process since it decides the quality of the retrieved cases. In this paper, we analyze the different characteristics of the query (new case) in CCBR and traditional CBR, and argue that a similarity calculation method that takes only the features appearing in the query into account, so-called query-biased, is more suitable for CCBR. An experiment is designed and executed on 36 datasets. The results show that on 31 of the 36 datasets, the CCBR system using the query-biased similarity calculation method performs more effectively than those using case-biased and equally-biased similarity calculation methods.

I. INTRODUCTION

The basic idea underlying case-based reasoning (CBR) [1], [2] is to reuse the solution to the most similar previous problem to help solve the current problem. Before we can reuse any existing solution, we have to find the most similar previous case based on the current problem description. In traditional CBR processes, users are assumed to be able to provide a well-defined problem description, based on which a CBR system can find the most appropriate previous case (the base case). But this assumption is not always realistic. In some situations, users have only vague ideas about their target problems at the beginning of retrieval, and tend to describe them using surface features.
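To make the traditional retrieval step concrete, the following minimal sketch (our own illustration, not taken from the paper; the data and function names are hypothetical) retrieves the most similar stored case for a fully specified query, using a simple syntactic per-feature match:

```python
# Illustrative sketch: traditional CBR retrieval with a fully specified query.
# Cases are feature -> value dictionaries; similarity is the fraction of the
# query's features on which the stored case agrees (a simple syntactic measure).

def similarity(query, case):
    """Fraction of the query's features that the case matches exactly."""
    if not query:
        return 0.0
    matched = sum(1 for f, v in query.items() if case.get(f) == v)
    return matched / len(query)

def retrieve_base_case(query, case_base):
    """Return the stored case most similar to the query."""
    return max(case_base, key=lambda case: similarity(query, case))

case_base = [
    {"os": "linux", "symptom": "no-boot", "disk": "full"},
    {"os": "windows", "symptom": "slow", "ram": "low"},
]
query = {"os": "linux", "symptom": "no-boot"}
best = retrieve_base_case(query, case_base)  # the first stored case here
```

This assumes the user can state the whole query up front, which is exactly the assumption CCBR relaxes.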
Conversational Case-Based Reasoning (CCBR) [3] provides a mixed-initiative dialog for guiding users to construct their problem description incrementally through a question-answering sequence. In CCBR, a user provides one or several explicit features as her initial query (new case). The CCBR system uses the initial query to retrieve the first set of candidate cases, and identifies a group of informative features from them to generate discriminative questions. Both the retrieved cases and the identified discriminative questions are ranked and shown to the user. The user either finds the base case, terminating the retrieval process, or chooses a question that she considers relevant to her task and can answer explicitly, and provides the answer to it (CCBR systems usually also present the alternative answer options that correspond to the feature values available in the case base). An updated query is constructed by combining the previous query with the newly gained answer. Subsequent rounds of retrieving and question-answering cut down the returned case set iteratively until the user finds her desired base case, or no discriminative questions are available. That is, instead of letting a user guess how to describe her target problem, CCBR discovers a sequence of discriminative questions that help extract information from the user to construct the problem description incrementally. CCBR applications have been successfully fielded, e.g., in the troubleshooting domain [4], [5] and in product and service selection [6], [7].

In both traditional CBR and CCBR, one key research topic is how to calculate the similarities between a query and the stored cases to decide which case is most similar to the current problem. Normally, the similarity between a query and a stored case is measured by the accumulated similarities on all counted features. On the one hand, the similarity is influenced by the method used to calculate the similarity on each feature.
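The retrieve/ask/answer loop described above can be sketched as follows (a hypothetical simplification with our own naming, not the paper's implementation; question ranking is reduced to taking the first feature that still discriminates, and the user is simulated by a dictionary of the answers she would give):

```python
# Illustrative sketch of the CCBR dialog loop: retrieve candidates, ask a
# discriminative question, fold the answer into the query, and repeat.

def matches(query, case):
    """A case remains a candidate if it agrees with every known query feature."""
    return all(case.get(f) == v for f, v in query.items())

def discriminative_features(candidates, query):
    """Features not yet in the query whose values vary across the candidates."""
    feats = set()
    for case in candidates:
        feats.update(case)
    return [f for f in sorted(feats - set(query))
            if len({c.get(f) for c in candidates}) > 1]

def ccbr_dialog(initial_query, case_base, user_answers):
    query = dict(initial_query)
    while True:
        candidates = [c for c in case_base if matches(query, c)]
        questions = discriminative_features(candidates, query)
        if len(candidates) <= 1 or not questions:
            return candidates, query        # base case found or no questions left
        asked = questions[0]                # ask the top-ranked question
        query[asked] = user_answers[asked]  # the user's answer updates the query

case_base = [
    {"os": "linux", "symptom": "no-boot", "disk": "full"},
    {"os": "linux", "symptom": "no-boot", "disk": "ok", "fs": "corrupt"},
    {"os": "windows", "symptom": "slow"},
]
cands, final_query = ccbr_dialog({"symptom": "no-boot"}, case_base,
                                 user_answers={"disk": "full"})
```

Starting from the vague query {"symptom": "no-boot"}, one answered question ("disk") is enough to narrow the candidates to a single case.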
For example, in syntactic methods two cases can be considered similar on one nominal feature only when they have the same value on that feature [8], while in knowledge-intensive methods, two cases with different values on one nominal feature can still be considered similar through exploring general domain knowledge [9], [10]. On the other hand, the similarity is also influenced by the counted feature scope, i.e., the set of features appearing in the query, in the case, or in both of them. In this paper, from the perspective of counted feature scope, we provide a framework to classify similarity calculation methods into three categories: case-biased (features in the stored case), query-biased (features in the query) and equally-biased (features in both the query and the stored case).

CCBR research currently focuses to a large extent on discriminative question selection and ranking to minimize the cognitive load demanded of users to retrieve the base case [6], [11], for example, selecting the most informative questions to ask [6], [12], [13], [14], [15], or using feature inferencing to avoid asking users questions that can be answered implicitly from the currently known information [12], [15]. To our knowledge, there are so far no published results on how different similarity calculation methods influence the performance of a CCBR system.

In this paper, we analyze the differences in query characteristics between traditional CBR and CCBR, and hypothesize that the similarity calculation method taking only the query features into account is more suitable for CCBR. An experi-

0-7803-9093-8/05/$20.00 ©2005 IEEE.
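The three counted feature scopes can be sketched as follows (our own illustration; the function names are hypothetical, the per-feature measure is a simple exact match, and "equally-biased" is read here as the union of the two feature sets, which is one possible reading of "features in both"):

```python
# Illustrative sketch of the three counted feature scopes: which features
# contribute to the accumulated similarity between a query and a stored case.

def local_sim(query, case, f):
    """Per-feature similarity; here a syntactic exact match (0 or 1)."""
    return 1.0 if query.get(f) == case.get(f) else 0.0

def scoped_similarity(query, case, scope):
    if scope == "query-biased":
        feats = set(query)              # only features appearing in the query
    elif scope == "case-biased":
        feats = set(case)               # only features appearing in the stored case
    elif scope == "equally-biased":
        feats = set(query) | set(case)  # features from either side (our reading)
    else:
        raise ValueError(f"unknown scope: {scope}")
    if not feats:
        return 0.0
    return sum(local_sim(query, case, f) for f in feats) / len(feats)
```

With a partial query {"a": 1} and a stored case {"a": 1, "b": 2}, the query-biased score is 1.0 while the case-biased and equally-biased scores drop to 0.5, because the latter two penalize the query for features the user has not yet described.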