Model validation supplement to “Model trees with topic model pre-processing: An approach for data journalism illustrated with the WikiLeaks Afghanistan War Logs”

Thomas Rusch, Paul Hofmarcher, Reinhold Hatzinger and Kurt Hornik

January 8, 2013

In this supplement to the paper “Fatalities in the WikiLeaks Afghanistan War Logs: An approach for data journalism using model trees with topic model pre-processing” we explain in detail how we validated the model trees, and thus expand on Section 5 of the main paper. We start by using a bootstrap resampling approach to make in-bag and out-of-bag predictions of observations, assess the stability of the tree structure and the segmentation, and report concordance based on a segment-wise Jaccard index. We then establish the stability and reproducibility of the segment-wise parameter estimates. Lastly, we assess the fit of the local models and show for each segment that the observations meet the requirements for assuming that a negative binomial model holds.

1 Tree Validation

With the model tree approach suggested in the main paper we build an exploratory model of fatalities in the overall WikiLeaks war diary. The model tree approach uses split variables to identify segments of the overall data, and a local model is fitted to each segment. Hence, instead of validating the results globally (e.g., in terms of prediction or fit), the validation needs to be concerned with three aspects: how stable and reproducible the tree structure (and hence the segmentation) is, how stable and reproducible the segment-wise parameter estimates are, and how well the local models describe the observations in the segments (next section). In this section we address the validation of model trees by assessing the stability and reproducibility of the tree structure, the segmentation, and the parameter estimates.
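To make the notion of segment-wise concordance concrete, the following is a minimal sketch in Python of one common way to compute a segment-wise Jaccard index between two partitions of the same observations. It is an illustration under assumptions, not the procedure used in the paper: the function name `segmentwise_jaccard`, the toy labels, and the choice to score each original segment by its best-matching segment in the resampled partition are all hypothetical.

```python
from collections import defaultdict


def segmentwise_jaccard(original, resampled):
    """Best-match Jaccard index for each segment of the original partition.

    `original` and `resampled` hold one segment label per observation, in the
    same observation order. For every original segment we return the largest
    Jaccard index |A & B| / |A | B| over all segments B of the resampled
    partition. (Hypothetical helper, not taken from the paper.)
    """
    if len(original) != len(resampled):
        raise ValueError("both partitions must label the same observations")

    # Collect the set of observation indices that belongs to each segment.
    orig_sets = defaultdict(set)
    new_sets = defaultdict(set)
    for i, (a, b) in enumerate(zip(original, resampled)):
        orig_sets[a].add(i)
        new_sets[b].add(i)

    # Score each original segment by its best-matching resampled segment.
    return {
        label: max(
            len(members & other) / len(members | other)
            for other in new_sets.values()
        )
        for label, members in orig_sets.items()
    }


# Toy example with made-up labels: six observations, three original segments.
original = ["R1", "R1", "R1", "R2", "R2", "R3"]
resampled = ["A", "A", "B", "B", "B", "C"]
scores = segmentwise_jaccard(original, resampled)
```

In a bootstrap setting, `resampled` would be the segment assignment that a tree grown on one bootstrap sample gives to the observations, and the scores would be averaged over many bootstrap trees; a segment whose average stays close to 1 is recovered almost unchanged across resamples.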
We find that the original tree has ten stable segments (R1 through R6, R8 through R10, and R15) that are reproducible both in terms of the assigned reports and the estimated parameter values. Among them are five segments that we described in detail in Section 4 of the main paper, associated with the topics “Task Force Reports (Bushmaster)”, “Hostile Contacts ACF 1