ROOM: Rule Organized Optimal Matching for Fine-Grained Traffic Identification

Hao Li, Chengchen Hu
MOE Key Lab for Intelligent Networks and Network Security
Department of Computer Science and Technology
Xi’an Jiaotong University

Abstract

Fine-grained traffic identification (FGTI) reveals the context or purpose of each packet that flows through the network nodes and links. Instead of only indicating the application or protocol that a packet is related to, FGTI further maps the packet to a meaningful user behavior or application context. In this paper, we propose Rule Organized Optimal Matching (ROOM) for fast and memory-efficient fine-grained traffic identification. ROOM splits the identification rules into several fields and carefully organizes the matching order of the fields. We formulate the optimal rule organization problem of ROOM mathematically and show that it is NP-hard. We then propose an approximation algorithm that solves the problem with a time complexity of O(N^2), where N is the number of fields in a rule. For evaluation, we implement ROOM and related work as real prototype systems and use real traces collected from the wired Internet and the mobile Internet as experiment input. The evaluations show very promising results: ROOM achieves a 1.6X to 104.7X throughput improvement in the real system with an acceptably small memory cost.

I. INTRODUCTION

Application traffic identification is a fundamental technology for network monitoring and measurement [1], which empowers people to better monitor, manage and control networks. Traditional traffic identification, called Coarse-Grained Traffic Identification (CGTI) [2]–[5], only reports at the protocol or application granularity, e.g., that the traffic belongs to MSN or Skype. It fails to provide the Fine-Grained Traffic Identification (FGTI) information needed for today's more complex usage, e.g., tweeting from an iPhone with Chrome.
An FGTI system has to employ semantic-based rules instead of the regular expression-based (regex-based) rules used in CGTI. A regex-based rule is hit no matter which part of a packet matches it, while a semantic-based rule is considered matched only when all the values in the specified segments (a.k.a. fields) of the packet match the rule. The requirement of first splitting the packet into different fields brings higher complexity. In addition, FGTI employs more rules than CGTI in order to describe more specific application behaviors. Even when only the simpler regex-based rules are considered, it has been demonstrated that the combination of a selected rule set with 794 regular expressions consumes 5.29GB of memory [6].

FGTI is also different from an Intrusion Detection System (IDS). Only a few packets related to intrusions would be matched in an IDS, but every packet should have a match in an FGTI system. Besides, an IDS does not perform any further identification for a “bad” flow but just resets the connection, while an FGTI system checks every single packet of a flow even after the flow has matched certain behaviors. Therefore, the previous studies cannot be directly used for FGTI.

This paper is supported by 973 plan (2012CB315901), 863 plan (SS2013AA010601), NSFC (61272459, 61221063), the Fundamental Research Funds for Central Universities.

In this paper, we investigate an efficient method to support semantic-based rules for FGTI so as to provide high identification throughput with controlled memory consumption. We observe that fields in different FGTI matching rules can share the same signature. Based on this observation, we propose Rule Organized Optimal Matching (ROOM) to eliminate the matching on redundant fields. As a result, we achieve a better tradeoff between time complexity and memory complexity.
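The difference between the two rule types, and the redundancy that shared field signatures create, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the field names, the rule names and values, exact string equality as the per-field test, and the helper names regex_hit, semantic_hit and field_check_counts. It is not the ROOM implementation.

```python
import re

# Hypothetical semantic rules: each maps a field name to the value it requires.
RULES = {
    "tweet": {"host": "twitter.com", "method": "POST", "agent": "Chrome-iPhone"},
    "browse": {"host": "twitter.com", "method": "GET", "agent": "Chrome-iPhone"},
}


def regex_hit(pattern, fields):
    """Regex-style rule: hit if the pattern occurs anywhere in the packet."""
    raw = " ".join(fields.values())
    return re.search(pattern, raw) is not None


def semantic_hit(rule, fields):
    """Semantic rule: hit only if every specified field carries the required value."""
    return all(fields.get(name) == value for name, value in rule.items())


def field_check_counts(rules):
    """Return (field checks when rules are matched one by one, distinct field signatures)."""
    total = sum(len(rule) for rule in rules.values())
    distinct = len({(n, v) for rule in rules.values() for n, v in rule.items()})
    return total, distinct


# A packet whose referer mentions twitter.com but whose host is a different site.
packet = {
    "host": "example.org",
    "method": "GET",
    "agent": "Chrome-iPhone",
    "referer": "twitter.com",
}

print(regex_hit(r"twitter\.com", packet))      # the pattern occurs somewhere in the packet
print(semantic_hit(RULES["browse"], packet))   # the host field does not match
print(field_check_counts(RULES))               # shared signatures reduce distinct checks
```

In this toy rule set the two rules specify six field values in total but only four distinct ones, because the host and agent signatures are shared. That gap is the kind of redundant field matching the observation above refers to.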
ROOM splits the rules into fields, determines the matching order of each field, and selects only the (small) part of the rules that could possibly be hit to do the matching. We make the following contributions in this paper:

• We propose ROOM to construct a Layered Matching Tree (LMT), which reduces the space complexity and improves the throughput of FGTI matching.
• We demonstrate that the construction of an optimal LMT is NP-hard, and we propose an approximation algorithm to solve the problem.
• We implement a real system of ROOM for performance testing, which achieves 5Gbps matching throughput on average with only 20MB of memory. Compared with previous work, ROOM improves the performance-cost ratio by 1.5 times to 23 times.

The remainder of the paper is organized as follows. Section II presents the basic idea of this paper. Section III describes the detailed design of ROOM. In Section IV, a prototype of ROOM is built and evaluated under real traces. Finally, Section V concludes the paper.

II. BASIC IDEA

In the literature, there are two ways to match a large number of semantic-based rules in CGTI and IDS systems [6, 7]. As shown in Fig. 1(a), the first one constructs one matcher for each rule, which checks each field of the rule sequentially [7]. The matchers are checked one by one as well. The system moves to the next matcher if there is no match in the current matcher, and stops when one of the matchers (rules) is hit. Obviously, the matching