Automatic Extraction of Concealed Relations from Email Logs (Extended Abstract) *

Nishith Pathak
Department of Computer Science
University of Minnesota, Twin Cities
Minneapolis, MN, USA
npathak@cs.umn.edu

Jaideep Srivastava
Department of Computer Science
University of Minnesota, Twin Cities
Minneapolis, MN, USA
srivasta@cs.umn.edu

ABSTRACT
People interact with each other for various reasons, and these interactions exhibit certain characteristics depending on the purpose of the relationship. One such important characteristic is concealment. Concealed relations are often of interest, especially in the domain of counter-terrorism, where relations fostering malicious activities tend to be secretive or concealed from the general public. In this paper we propose a technique for extracting concealed relations from social network data. The technique analyzes actors’ perceptions regarding other actors’ social interactions and requires that these perceptions can be constructed from the social network data. One popular communication medium for which this can be done efficiently is electronic mail. The proposed technique uses the popular and robust tf-idf measure from the information retrieval literature to quantify the concept of concealment. We present experimental results from the Enron email corpus.

Categories and Subject Descriptors
H.4.3 [Information Systems]: Communications Applications—email; H.3.3 [Information Systems]: Information Search and Retrieval—tf-idf

General Terms
Algorithms, Measures

Keywords
Social Network Analysis, email, concealed relations, tf-idf

1. INTRODUCTION
Intuitively, a concealed relation can be defined as a relation which is strong but known to only a very small subset of actors. Instances of interaction between two actors can be anything quantifiable, such as the number of times they have conversed or the number of emails exchanged between them.
By a strong relation we mean a pairwise interaction that is much more frequent than the average pairwise interaction in the social network. When we talk about a relationship being perceived by a third actor, we mean that this actor has observed some threshold number of instances of interaction between the two actors involved in the relation, thus allowing him/her to have sufficient belief in the existence of their social relationship.

* Copyright is held by the author/owner(s). NetSci2006, May 22–25, 2006, Bloomington, IN, USA.

There are various reasons that drive people to keep their social activities secret or concealed from the rest of the social network. This problem is of interest in the counter-terrorism domain, where individuals involved in malicious activities tend to entertain secret interactions. One of the primary problems in counter-terrorism is that of email surveillance. Some recent efforts in the computer science community have been directed towards this problem [2, 3]. In the social network domain, Baker and Faulkner [1] have discussed how actors involved in illegal activities tend to prioritize concealment over other factors. It would also be interesting to study the role of concealed relations in the informal network of an organization.

In this paper we propose an approach for automatically extracting concealed relations from email header analysis. Wellman [7] has identified how electronic communication is gaining importance as a medium for social networking. In the case of email communication, an actor observes only those emails which are addressed to him/her (i.e., the actor is in the To, Cc or Bcc fields). For example, consider an e-mail sent by actor A to B, with Cc to C and Bcc to D.
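The visibility rules behind this example can be made concrete with a short sketch. The code below is illustrative only and is not the construction used in [5]: the `Email` class and both function names are hypothetical, and it assumes that the sender sees every field, that To and Cc recipients see only the To and Cc fields, and that a Bcc recipient sees the To and Cc fields plus his/her own address. It also assumes that each (sender, recipient) pair counts as one observed instance of interaction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Email:
    """Hypothetical header record; the field names are illustrative."""
    sender: str
    to: tuple = ()
    cc: tuple = ()
    bcc: tuple = ()


def visible_recipients(email, observer):
    """Return the recipients the observer can see, or None if the
    observer never saw the e-mail at all."""
    open_recipients = set(email.to) | set(email.cc)
    if observer == email.sender:
        # The sender wrote the header, so every field is visible.
        return open_recipients | set(email.bcc)
    if observer in email.bcc:
        # Assumption: a Bcc recipient sees To, Cc and only himself/herself.
        return open_recipients | {observer}
    if observer in open_recipients:
        # To and Cc recipients cannot see the Bcc field.
        return open_recipients
    return None


def perceived_pairs(email, observer):
    """Return the (sender, recipient) interactions the observer perceives.
    Assumption: each visible recipient yields one perceived interaction."""
    seen = visible_recipients(email, observer)
    if seen is None:
        return set()
    return {(email.sender, recipient) for recipient in seen}


# The e-mail from the text: A writes to B, with Cc to C and Bcc to D.
example = Email(sender="A", to=("B",), cc=("C",), bcc=("D",))
for actor in ["A", "B", "C", "D", "E"]:
    print(actor, sorted(perceived_pairs(example, actor)))
```

Under these assumptions, B and C perceive the interactions of A with B and C but not with D, A and D perceive all three, and an outside actor E perceives none.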
The analysis of the header reveals the following: B and C know that A and B communicated, and that all three of them know about this communication. However, neither B nor C knows that D was also sent this e-mail. A and D know everything, and both of them also know that B and C do not know of D’s getting the e-mail. This analysis illustrates that a single e-mail can create different beliefs among different people, depending on whether and how they are included in it. Moreover, it also provides information about who perceives which interactions. In [5] the authors provide an approach for the construction and analysis of actors’ perceptions from email logs.

The proposed approach is based on the popular tf-idf measure [6] from information retrieval. With the appropriate semantic associations, the tf-idf measure can be transformed into an efficient and robust technique for scoring and ranking relations based on their “level of concealment.” In Section 2 we introduce the proposed approach, followed by experimental results in Section 3 and conclusions in Section 4.

2. PROPOSED APPROACH
Consider a social network consisting of N actors. If the set of actors is denoted by A, then for every actor a_i ∈ A