Ranking Contact Center Conversations using Dynamic Programming based Pattern Matching

Meghna Pandharipande §, Upasana Tiwari §, Rupayan Chakraborty, Sunil Kumar Kopparapu
TCS Research and Innovation - Mumbai, INDIA
{meghna.pandharipande, tiwari.upasana1, rupayan.chakraborty, sunilkumar.kopparapu}@tcs.com

Abstract

For any customer-initiated inbound call, the service agent must not only respond but also resolve the problem. Although agents are trained not to be expressive and to provide a solution promptly and in the best possible way, the course of the conversation plays an important role in deciding the outcome of the call, that is, whether it ends on a positive or a negative note. Timely identification of dissatisfied customers, based on automatic analysis of the conversation to measure the positivity of a call, can help improve the customer satisfaction index and the call handling capabilities of the service desk. In this paper, to automatically analyze the call conversation, we propose a system that extracts non-linguistic features and creates patterns using multi-dimensional representations, followed by a dynamic programming algorithm to find similarity measures. Further, we fuse information about the trend in the variation of the affective content throughout the conversation to obtain a final score that quantifies the positiveness of a call. Finally, we rank contact center calls in decreasing order of the positivity measure and evaluate the system using a ranking agreement metric.

Index Terms: call center conversation, pattern matching, dynamic algorithm, affective content, speaking rate.

1. Introduction

Contact centers associated with industries function as customer service centers, help desks, or information lines. Their operations involve marketing and selling products and handling customer-facing and product-related problems, service-related issues, etc. [1].
To stay competitive, any industry needs to serve customers to their satisfaction and to operate its contact centers 24x7. With the boom in Artificial Intelligence (AI) and Machine Learning (ML) technologies, industries are adopting AI-ML enabled interfaces to reach and serve their customers more reliably, at any time and from anywhere. Even so, the majority of customers still prefer to speak to the customer care center to get their issues resolved. One of the key metrics used to measure the performance of a customer care center is the customer satisfaction index (CSI) [2]. Mostly, this metric is measured after the interaction between the agent and the customer, in the form of a questionnaire. Since thousands of calls happen on a daily basis, it is difficult for humans to manually evaluate all the recorded conversations and take the necessary actions. Moreover, manual evaluations are prone to human bias, time-consuming, and tedious. An alternative is to use speech-based technologies for automatic and fast evaluation. Automatic speech recognition (ASR) engines have been widely used in call center analytics; they transcribe the conversational speech into text, which is then passed to text analytics [3, 4, 5]. The main disadvantages of using an ASR engine are the following: (1) it is language dependent; (2) in realistic scenarios like call centers, where a customer can call from any place, the speech-to-text transcriptions are erroneous, and those errors can propagate to the subsequent text analytics. On the other hand, affective content analysis has been found to be very useful in contact center conversations [6, 7, 8, 9, 10, 11, 12, 13].

§ Both authors contributed equally.

In this paper, we propose an automatic method to rank call conversations by using non-linguistic cues, thus making the system language independent.
In particular, we make use of two cues throughout the conversation: (1) the speaking rate (SR) of agent and customer speech, and (2) the variation of affective content (on the arousal and valence scales). Since the call conversation between agent and customer is recorded as single-channel audio, we use speaker diarization at the front end to segment out the agent and customer speech (i.e., “who spoke when”). Together, speaker diarization and SR convey information about the number of turn switches between agent and customer and about how fast or slowly the words are uttered. Importantly, we observed that these have a direct correspondence to the different segments in the flow of a call conversation (e.g., the beginning, middle, or end of a call) [14]. Using this knowledge, we create a simulated representation of an ideal positive call with speaking rate values, and then extract multi-domain features to generate reference templates corresponding to the different segments of the call. Given a test call, we segment it into multiple portions (using knowledge derived from statistical analysis of a set of calls), extract multi-domain features, and match the resulting patterns against the reference templates using Dynamic Time Warping (DTW). The minimum distance is taken to indicate the best similarity between the reference and test templates. As the second non-linguistic cue, we use the variation of the affective content in a call conversation, based on our observation that negative affective content tends to decrease (and, conversely, positive affect tends to increase) over the course of a positive call. Affective models are created from annotated utterances in call conversations (TCS internal). Given spoken utterances as input, we obtain posterior probabilities against each of those models. The variations of those probabilities along the call conversation are then analyzed and fused with the scores derived from DTW to decide the final score for ranking the calls.
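The DTW-based template matching described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: it assumes one-dimensional speaking-rate features, a Euclidean local cost, and the standard unconstrained DTW recursion. The function names `dtw_distance` and `best_template`, the segment labels, and all numeric values are hypothetical.

```python
import math


def dtw_distance(ref, test):
    """Return the DTW distance between two sequences of feature vectors.

    Each element of `ref` and `test` is a tuple of floats of the same
    dimension. The local cost is the Euclidean distance (an assumption).
    """
    n, m = len(ref), len(test)
    # cost[i][j] = minimal accumulated cost of aligning ref[:i] with test[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(ref[i - 1], test[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # ref advances, test frame is repeated
                cost[i][j - 1],      # test advances, ref frame is repeated
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def best_template(templates, test):
    """Return the label of the closest reference template and all distances."""
    distances = {name: dtw_distance(ref, test) for name, ref in templates.items()}
    return min(distances, key=distances.get), distances


# Hypothetical speaking-rate contours for three call segments.
templates = {
    "beginning": [(2.0,), (2.5,), (3.0,)],
    "middle": [(4.0,), (4.0,), (4.0,)],
    "end": [(3.0,), (2.5,), (2.0,)],
}

# A hypothetical test segment: a time-stretched version of "beginning".
test_segment = [(2.0,), (2.0,), (2.5,), (3.0,), (3.0,)]

label, distances = best_template(templates, test_segment)
print(label, distances)
```

Because DTW allows frames to be repeated during alignment, the test segment matches the "beginning" template despite having a different length, which is why a warping-based distance suits call segments of varying duration.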
The rest of the paper is organized as follows. Section 2 presents the motivation and scenario of this work. Section 3 describes the system in detail. Section 4 describes the dataset along with the experiments and results. Section 5 presents the analysis and conclusion.

2. Motivation and scenario

The main motive of this work is the automatic analysis and ranking of contact center calls in order to address and improve customer satisfaction. 175 call conversations from the (1) Insurance, (2) Finance and (3) Telecoms sectors have been used for experimen-