A Data-driven Classification Framework for Conflict and Instability Analysis

Kihoon Choi and Krishna R. Pattipati
Dept. of Electrical and Computer Engineering
University of Connecticut
Storrs, CT 06269 USA
Email: {kihoon, krishna}@engr.uconn.edu

Victor Asal
Dept. of Political Science
University at Albany
Albany, NY 12203 USA
Email: VAsal@uamail.albany.edu

The author, K. Choi, has contributed to this work as part of an internship at Qualtech Systems Inc. (http://www.teamqsi.com).

Abstract—Is it possible to identify, and even forecast well in advance (6-12 months), the relative stability of a state, so that policy makers can intervene successfully? How does one acquire that understanding? One technique is to model and understand the social factors that summarize the background conditions, attributes, and performance factors of a country over time. The purpose of this paper is to: (1) present a generalized data-driven framework for conflict analysis and forecasting, (2) show that state-of-the-art pattern classification techniques provide significant improvements in forecasting accuracy, and (3) introduce classification problems arising in the social sciences to the engineering community for further enhancement of analysis techniques. We evaluate the efficacy of our data-driven framework on macro-structural factors as relevant contributors to country instability, delineating the independent and dependent variables. The results demonstrate significant improvement over previous approaches in the classification metrics of accuracy, precision, and recall.

Index Terms—data-driven, social science, conflict analysis, support vector machine (SVM), support vector machine regression (SVMR), data imputation, forecasting.

I. INTRODUCTION

For all of recorded history, leaders have gone to war and interfered in the politics of other political entities [1]. For about as long, leaders have made bad decisions based on faulty information and problematic assessments [2]. Battling misperceptions [3] and the inherent problems of group decision making [4] has plagued political decision makers confronting security issues. Despite the inherent challenges that humans present as a subject of study – particularly when the stakes are high – those who have studied conflict academically have been able to identify repeated patterns that should be useful to policy makers [5][6][7][8][9].

Over time, leaders have made use of subject matter experts, with varying breadth of experience, to help them think about policy [10]. The problem is that experts in the policy community “are not mere objective experts. They develop belief systems, operational codes, theories, and agendas; they are subject to the same cognitive, social, psychological and group dynamics that affect decision makers.” [10] This is especially problematic when an expert has a narrow band of knowledge that recognizes little variance, or when the “expert” has a very explicit agenda of his or her own (Ahmed Abdel Hadi Chalabi would be a strong example of this problem [11]). Even when experts are right, of course, there is no guarantee that they will be listened to [12], nor is there often an effort to put confidence intervals around their recommendations. Nonetheless, the problem is deeper than experts simply not being listened to. Often subject matter experts are asked not what they know (based on some kind of rigorous analysis), but what they think.
“Evidence from surveys suggests that forecasts of decisions in conflicts are typically based on experts’ unaided judgments.” [13] Asking experts to make predictions based on what they think is dangerous, for they are often wrong and no more right than novices [13]. The lack of any check on experts’ unaided forecasts is particularly problematic, because they often disregard data that disagrees with their views [14].

Nonetheless, substantive, testable, and relevant social science forecasting is possible [15]. Despite this, useful information is not being brought to bear on key decisions that have important political and economic implications [15]. Part of the problem is that even when effective methodology is employed, it has not always been able to penetrate the policy world [1]. Despite these obstacles, aggressive pursuit of useful, testable computational analysis of social science phenomena is worthwhile, given the costs exacted when experts offer opinions without being required to support them empirically [12]. This is especially true given the significant evidence that forecasting based on rigorous empirical methods can produce useful, policy-relevant results [15][16].

Despite the likely utility of this kind of effort, the social sciences have been handicapped in trying to produce this kind of research. Much of the work in political science has focused on in-sample prediction and has not looked at out-of-sample analysis, which is critical for useful forecasting [17]. In addition, many of these efforts do not exploit the full array of methods that can be brought to bear on the forecasting problem [18]. When this kind of analysis is done, the results can be quite impressive [17][19][20].

Part of the problem, though, is disciplinary. Very rarely do computational scientists and social scientists collaborate on this kind of work [21]. Our effort in this paper is to bridge this gap and illustrate that when the theories and