ROOM: Rule Organized Optimal Matching for Fine-Grained Traffic Identification

Hao Li, Chengchen Hu
MOE Key Lab for Intelligent Networks and Network Security
Department of Computer Science and Technology
Xi'an Jiaotong University

Abstract—Fine-grained traffic identification (FGTI) reveals the context/purpose of each packet that flows through the network nodes/links. Instead of only indicating the application/protocol that a packet is related to, FGTI further maps the packet to a meaningful user behavior or application context. In this paper, we propose Rule Organized Optimal Matching (ROOM) for fast and memory-efficient fine-grained traffic identification. ROOM splits the identification rules into several fields and carefully organizes the matching order of the fields. We formulate the optimal rule organization problem of ROOM mathematically and show it to be NP-hard, and then we propose an approximation algorithm that solves the problem with a time complexity of O(N^2), where N is the number of fields in a rule. To perform evaluations, we implement ROOM and related work as real prototype systems, and we use real traces collected in the wired Internet and the mobile Internet as the experiment input. The evaluations show very promising results: ROOM achieves a 1.6X to 104.7X throughput improvement in the real system with an acceptably small memory cost.

I. INTRODUCTION

Application traffic identification is a fundamental technology for network monitoring and measurement [1], which empowers people to better monitor, manage and control the networks. Traditional traffic identification, called Coarse-Grained Traffic Identification (CGTI) [2]–[5], reports only at the protocol or application granularity, e.g., that the traffic belongs to MSN or Skype. It fails to provide the Fine-Grained Traffic Identification (FGTI) information needed for today's more complex usage, e.g., tweeting on an iPhone with Chrome.
An FGTI system has to employ semantic-based rules instead of the regular expression-based (regex-based) rules used in CGTI. A regex-based rule is hit no matter which part of a packet matches it, while a semantic-based rule is considered matched only when all the values in the specified segments (a.k.a. fields) of the packet match the rule. The requirement of first splitting the packet into different fields brings higher complexity. In addition, FGTI employs more rules than CGTI in order to describe more specific application behaviors. Even considering only the simpler regex-based rules, it has been shown that combining a selected rule set of 794 regular expressions consumes 5.29 GB of memory [6].

FGTI also differs from an Intrusion Detection System (IDS). In an IDS, only the few packets related to intrusions are matched, but in an FGTI system every packet should have a match. Besides, an IDS does not perform any further identification for a "bad" flow but simply resets the connection, while an FGTI system checks every single packet of a flow even after the flow has matched certain behaviors. Therefore, the previous studies cannot be directly used for FGTI.

This paper is supported by 973 plan (2012CB315901), 863 plan (SS2013AA010601), NSFC (61272459, 61221063), and the Fundamental Research Funds for Central Universities.

In this paper, we investigate an efficient method to support semantic-based rules for FGTI so as to provide high identification throughput with controlled memory consumption. We observe that segments (fields) in different matching rules of FGTI can share the same signature. Based on this observation, we propose Rule Organized Optimal Matching (ROOM) to eliminate the matching on redundant fields. As a result, we achieve a better tradeoff between time complexity and memory complexity.
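The difference between regex-based and semantic-based rules, and the redundancy that arises when rules share field signatures, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the rule labels, the field names (host, uri, user_agent), the signature strings, and the use of plain substring tests as the per-field matcher.

```python
import re

# Hypothetical semantic-based rules: each maps a field name to the signature
# (here a plain substring) that must appear in that field of the packet.
RULES = {
    "tweet/Chrome": {"host": "twitter.com", "uri": "/statuses/update", "user_agent": "CriOS"},
    "tweet/Firefox": {"host": "twitter.com", "uri": "/statuses/update", "user_agent": "FxiOS"},
    "browse/Chrome": {"host": "twitter.com", "uri": "/home", "user_agent": "CriOS"},
}


def regex_rule_hit(pattern, payload):
    """A regex-based rule is hit if the pattern occurs anywhere in the payload."""
    return re.search(pattern, payload) is not None


def semantic_rule_hit(rule, fields):
    """A semantic-based rule is hit only if every specified field matches."""
    return all(sig in fields.get(name, "") for name, sig in rule.items())


def identify(fields, rules=RULES):
    """Return the labels of all rules matched by an already-parsed packet."""
    return [label for label, rule in rules.items() if semantic_rule_hit(rule, fields)]


def field_check_counts(rules=RULES):
    """Count field checks summed over all rules versus distinct (field, signature) pairs."""
    total = sum(len(rule) for rule in rules.values())
    distinct = len({(n, s) for rule in rules.values() for n, s in rule.items()})
    return total, distinct


# A packet whose payload mentions the signature only in the Referer header.
payload = "GET /page HTTP/1.1\r\nHost: example.org\r\nReferer: http://twitter.com/\r\n"
parsed = {"host": "example.org", "uri": "/page", "user_agent": "Mozilla/5.0"}
print(regex_rule_hit(r"twitter\.com", payload))  # regex hits on any part of the packet
print(identify(parsed))                          # no semantic rule matches

tweet = {
    "host": "api.twitter.com",
    "uri": "/1.1/statuses/update.json",
    "user_agent": "Mozilla/5.0 (iPhone) CriOS/30.0",
}
print(identify(tweet))
print(field_check_counts())
```

In this toy rule set, checking every rule independently costs nine field comparisons, while only five distinct (field, signature) pairs exist. That gap is the kind of redundancy the observation above refers to.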
ROOM splits the rules into fields, determines the matching order of the fields, and selects only a (small) part of the rules that could possibly be hit to do the matching. We make the following contributions in this paper:

• We propose ROOM to construct a Layered Matching Tree (LMT), which reduces the space complexity and improves the throughput of FGTI matching.
• We demonstrate that the construction problem of an optimal LMT is NP-hard, and we propose an approximation algorithm to solve the problem.
• We implement a real system of ROOM for performance testing, which achieves 5 Gbps matching throughput on average with only 20 MB of memory. Compared with previous work, ROOM improves the performance-cost ratio by 1.5 times to 23 times.

The remainder of the paper is organized as follows. Section II presents the basic idea of this paper. Section III describes the detailed design of ROOM. Section IV builds a prototype of ROOM and evaluates it on real traces. Finally, Section V concludes the paper.

II. BASIC IDEA

In the literature, there are two ways to match a large number of semantic-based rules in CGTI and IDS systems [6, 7]. As shown in Fig. 1(a), the first one constructs one matcher for each rule, and each matcher checks the fields of its rule sequentially [7]. The matchers are also checked one by one: the system moves to the next matcher if the current matcher does not match, and stops when one of the matchers (rules) is hit. Obviously, the matching