Comparing Similarity Calculation Methods in
Conversational CBR
Mingyang Gu, Xin Tong, and Agnar Aamodt
Department of Computer and Information Science, Norwegian University of Science
and Technology, Sem Saelands vei 7-9, N-7491, Trondheim, Norway
Email: {mingyang,tongxin,agnar}@idi.ntnu.no
Abstract— Conversational Case-Based Reasoning (CCBR) provides
a mixed-initiative dialog that guides users to construct their
problem description incrementally through a question-answering
sequence. Similarity calculation in CCBR, as in traditional CBR,
plays an important role in the retrieval process since it determines
the quality of the retrieved cases. In this paper, we analyze the
different characteristics of the query (new case) in CCBR compared
with traditional CBR, and argue that the similarity calculation
method that takes only the features appearing in the query into
account, the so-called query-biased method, is more suitable for
CCBR. An experiment is designed and executed on 36 datasets. The
results show that on 31 of the 36 datasets, the CCBR system
using the query-biased similarity calculation method performs
more effectively than those using the case-biased and
equally-biased similarity calculation methods.
I. INTRODUCTION
The basic idea underlying case-based reasoning (CBR) [1],
[2] is to reuse the solution of the most similar previous problem
to help solve the current problem. Before any existing solution
can be reused, the most similar previous case has to be found
based on the current problem description.
In traditional CBR processes, users are assumed to be able
to provide a well-defined problem description, and based on
such a description a CBR system can find the most appropriate
previous case (base case). But this assumption is not always
realistic. In some situations, users only have vague ideas about
their target problems at the beginning of retrieval, and tend to
describe them using surface features.
Conversational Case-Based Reasoning (CCBR) [3] pro-
vides a mixed-initiative dialog for guiding users to construct
their problem description incrementally through a question-
answering sequence. In CCBR, a user provides one or sev-
eral explicit features as her initial query (new case). The
CCBR system uses the initial query to retrieve the first set
of candidate cases, and identifies a group of informative
features from them to generate discriminative questions. Both
the retrieved cases and identified discriminative questions
are ranked and shown to the user. The user either finds
the base case, terminating the retrieval process, or chooses
a question that she considers relevant to her task and
can answer explicitly, and provides the answer to it (CCBR
systems usually also present the alternative answer options
that correspond to the feature values available in the case
base). An updated query is constructed by combining
the previous query with the newly gained answer. Subsequent
rounds of retrieval and question-answering iteratively narrow
down the returned case set until the user finds her desired
base case or no discriminative questions remain. That
is, instead of letting a user guess how to describe her target
problem, CCBR discovers a sequence of discriminative questions
that help extract information from the user to construct the
problem description incrementally. CCBR applications have
been successfully fielded, e.g., in the troubleshooting domain
[4], [5] and in products and services selection [6], [7].
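The dialog loop described above can be sketched in a few lines of code. The following is a minimal, self-contained illustration, not the implementation of any actual CCBR system: cases are dicts of nominal feature values, retrieval simply ranks cases by the number of matched query features, and the question-selection heuristic and the user-side callbacks (`answer_fn`, `accepts`) are hypothetical stand-ins.

```python
# Minimal sketch of the CCBR dialog loop described above. The case
# representation (dicts of nominal feature values), the match-count
# retrieval, and the question-selection heuristic are illustrative
# stand-ins, not the components of any fielded CCBR system.
from collections import Counter

def retrieve(query, case_base):
    """Rank cases by the number of query features they match exactly."""
    def matches(case):
        return sum(1 for f, v in query.items() if case.get(f) == v)
    return sorted(case_base, key=matches, reverse=True)

def next_question(query, candidates):
    """Ask about the unanswered feature occurring in most candidates."""
    counts = Counter(f for c in candidates for f in c if f not in query)
    return counts.most_common(1)[0][0] if counts else None

def ccbr_dialog(case_base, initial_query, answer_fn, accepts):
    """Retrieve, ask, and extend the query until the user accepts the
    top-ranked case or no discriminative question remains."""
    query = dict(initial_query)
    while True:
        candidates = retrieve(query, case_base)
        if accepts(candidates[0]):
            return candidates[0]               # user found the base case
        question = next_question(query, candidates)
        if question is None:
            return candidates[0]               # no questions left
        query[question] = answer_fn(question)  # combine answer into query
```

Here `answer_fn` plays the role of the user answering a chosen question and `accepts` the user's decision to terminate; in a real system both sides of the mixed-initiative dialog are interactive.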
In both traditional CBR and CCBR, one key research topic
is to calculate the similarities between a query and stored cases
to decide which case is most similar to the current problem.
Normally, the similarity between a query and a stored case
is measured by the accumulated similarities over all counted
features. On the one hand, the overall similarity is influenced
by the method used to calculate the similarity on each feature.
For example, in syntactic methods two cases are considered
similar on a nominal feature only when they have the
same value for that feature [8], while in knowledge-intensive
methods, two cases with different values on a nominal
feature can still be considered similar by exploiting
general domain knowledge [9], [10]. On the other hand, the
similarity is also influenced by the counted feature scope,
i.e., the set of features taken into account: those appearing
in the query, in the stored case, or in both. In this paper,
from the perspective of the counted feature scope, we provide
a framework that classifies similarity calculation methods into
three categories: case-biased (features in the stored case),
query-biased (features in the query), and equally-biased
(features in both the query and the stored case).
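The three categories can be made concrete with a small sketch that fixes a simple syntactic per-feature measure (1 for an exact match on a nominal value, 0 otherwise) and varies only the counted feature scope. This is our illustration, not the paper's implementation; in particular, reading "features in both the query and the stored case" as the union of the two feature sets is an assumption.

```python
# Illustrative sketch of the three counted-feature scopes; the syntactic
# per-feature measure and the union reading of "equally-biased" are
# assumptions, not the paper's actual similarity functions.

def feature_sim(a, b):
    """Syntactic per-feature similarity: exact match on nominal values."""
    return 1.0 if a == b else 0.0

def scoped_similarity(query, case, scope):
    """Average per-feature similarity over the chosen feature scope;
    features missing on either side contribute zero."""
    if not scope:
        return 0.0
    matched = sum(feature_sim(query[f], case[f])
                  for f in scope if f in query and f in case)
    return matched / len(scope)

def query_biased(query, case):      # count only features in the query
    return scoped_similarity(query, case, set(query))

def case_biased(query, case):       # count only features in the stored case
    return scoped_similarity(query, case, set(case))

def equally_biased(query, case):    # count features in either (assumed union)
    return scoped_similarity(query, case, set(query) | set(case))
```

For a query {color: red, size: large} and a stored case {color: red, size: small, shape: round}, the query-biased score is 1/2 while the case-biased and (union) equally-biased scores are 1/3, showing how the counted scope alone can change the ranking of retrieved cases.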
CCBR research currently focuses to a large extent on
selecting and ranking discriminative questions so as to minimize
the cognitive load imposed on users in retrieving the base case
[6], [11], for example, by selecting the most informative questions
to ask [6], [12], [13], [14], [15], or by using feature inferencing
to avoid asking users questions that can be answered
implicitly from the currently known information [12], [15].
To our knowledge, there are so far no published results on
how different similarity calculation methods influence the
performance of a CCBR system.
In this paper, we analyze the differences in query charac-
teristics between traditional CBR and CCBR, and hypothesize
that the similarity calculation method taking only the query
features into account is more suitable for CCBR. An experi-
0-7803-9093-8/05/$20.00 ©2005 IEEE.