Efficient Multistriding of Large Non-deterministic Finite State Automata for Deep Packet Inspection

Matteo Avalle, Fulvio Risso, Riccardo Sisto
Dipartimento di Automatica e Informatica, Politecnico di Torino, Torino, Italia
Email: {matteo.avalle, fulvio.risso, riccardo.sisto}@polito.it

Abstract—Multistride automata speed up input matching because each multistriding transformation halves the length of the input string, leading to a potential 2x speedup. However, little effort has so far been spent on optimizing the building process of multistride automata. As a result, current algorithms cannot be applied to real-life, large automata such as the ones used in commercial IDSs, because the time and memory needed to create the new automaton quickly become infeasible. This paper presents new algorithms for efficiently building multistride NFAs for packet inspection and explains how these techniques outperform previous algorithms in terms of required time and memory usage.

I. INTRODUCTION

Deep packet inspection is still at the foundation of many security tools, such as Intrusion Detection Systems (IDS), firewalls, spam filters and more. While string matching was the most common technique in the past, the complexity of today's attacks requires sophisticated tools based on regular expressions (regex).

One of the most memory-efficient ways to represent a regular expression (or a set of regular expressions) is a Nondeterministic Finite-state Automaton (NFA). An NFA can be seen as a directed graph in which nodes represent states and arcs represent transitions, labeled with the input symbols (bytes). A simple example of this representation is shown in Figure 1; more details about NFAs are presented in Section II-A.
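As a minimal sketch of this graph view (not the authors' implementation), the following Python snippet stores an NFA for the expression ab(cd)*e as a table of labeled arcs and matches an input by tracking the set of active states. The state numbering and the names `make_nfa` and `accepts` are assumptions introduced for illustration. Matching is anchored to the whole string, and characters of a Python string stand in for input bytes.

```python
from collections import defaultdict


def make_nfa(edges):
    """Build a transition table {(state, symbol): set of next states}."""
    table = defaultdict(set)
    for src, symbol, dst in edges:
        table[(src, symbol)].add(dst)
    return table


def accepts(table, start, finals, data):
    """Simulate the NFA by following every active state in parallel."""
    active = {start}
    for symbol in data:
        active = {dst for s in active for dst in table.get((s, symbol), ())}
        if not active:
            return False  # no state can consume this symbol
    return bool(active & finals)


# Hypothetical state numbering for ab(cd)*e; state 5 is the only final state.
EDGES = [
    (0, "a", 1), (1, "b", 2),               # prefix "ab"
    (2, "c", 3), (3, "d", 4), (4, "c", 3),  # loop for (cd)*
    (2, "e", 5), (4, "e", 5),               # closing "e"
]
NFA = make_nfa(EDGES)
FINALS = {5}

print(accepts(NFA, 0, FINALS, "abcdcde"))
print(accepts(NFA, 0, FINALS, "abce"))
```

Each input symbol costs one step over the active set, which is the per-byte cost that multistriding aims to reduce.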
Since run-time throughput and memory consumption are the key factors that characterize a deep packet inspection system, much effort has been dedicated to optimization techniques that increase processing throughput and/or reduce memory consumption, thus enabling the matching of complex regular expression patterns. One of those techniques is multistriding, which creates a new NFA in which each transition consumes multiple bytes instead of just one, as shown in the example in Figure 1.

This modification can potentially achieve a substantial performance boost, as it linearly reduces the number of steps (and memory accesses) required to process each input string: in practice, the length of the input string is reduced to 1/n of the original, where n is the number of bytes grouped together. On the other hand, it can increase the size of the NFA, because the symbol space becomes much larger (256^n symbols), potentially triggering rapid growth in the number of transitions.

Fig. 1. A simple regular expression (ab(cd)*e) shown in its textual form, in NFA form and in 2-strided NFA form.

This problem can be mitigated through an alphabet compression pass, which is based on the observation that many symbols are equivalent, as they are always used together in every transition of the NFA. If some symbols (e.g., the numbers from '00' to '99') are equivalent, they can be replaced by a single one (e.g., 'x'), reducing the cardinality of the symbol set and hence the complexity of the transitions in the NFA.
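The two steps described above can be sketched in Python as follows. This is a naive illustration under stated assumptions, not the improved algorithms this paper proposes: `two_stride` joins every pair of consecutive one-byte arcs, and `compress_alphabet` gives one class id to all symbols that label exactly the same arcs. The example expression a[0-9][0-9]b, the function names, and the use of -1 for unknown symbols are all hypothetical. The sketch handles only even-length inputs and checks acceptance only at the end of the input.

```python
from collections import defaultdict


def two_stride(edges):
    """Naive 2-striding: join every pair of consecutive 1-byte arcs."""
    outgoing = defaultdict(list)
    for src, sym, dst in edges:
        outgoing[src].append((sym, dst))
    strided = set()
    for src, first, mid in edges:
        for second, dst in outgoing.get(mid, ()):
            strided.add((src, first + second, dst))
    return strided


def compress_alphabet(edges):
    """Map each symbol to a class id; equal arc sets share one class."""
    arcs = defaultdict(set)
    for src, sym, dst in edges:
        arcs[sym].add((src, dst))
    class_of_signature, symbol_class = {}, {}
    for sym in sorted(arcs):
        signature = frozenset(arcs[sym])
        symbol_class[sym] = class_of_signature.setdefault(
            signature, len(class_of_signature))
    return symbol_class


def accepts(edges, start, finals, symbols):
    """Set-of-states simulation over a list of (possibly compressed) symbols."""
    table = defaultdict(set)
    for src, sym, dst in edges:
        table[(src, sym)].add(dst)
    active = {start}
    for sym in symbols:
        active = {d for s in active for d in table.get((s, sym), ())}
    return bool(active & finals)


DIGITS = "0123456789"
# Hypothetical 1-byte NFA for a[0-9][0-9]b; state 4 is final.
EDGES = ([(0, "a", 1), (3, "b", 4)]
         + [(1, d, 2) for d in DIGITS]
         + [(2, d, 3) for d in DIGITS])
STRIDED = two_stride(EDGES)
CLASSES = compress_alphabet(STRIDED)
COMPRESSED = {(s, CLASSES[sym], d) for s, sym, d in STRIDED}


def translate(text):
    """Split input into 2-byte symbols and map them to class ids (-1 = unknown)."""
    return [CLASSES.get(text[i:i + 2], -1) for i in range(0, len(text), 2)]


print(len(STRIDED), len(set(CLASSES.values())), len(COMPRESSED))
print(accepts(COMPRESSED, 0, {4}, translate("a12b")))
```

In this toy case the 120 two-byte symbols produced by striding collapse into 3 classes, so the compressed automaton needs only 3 arcs; the hundred symbols from '00' to '99' all map to the same class.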
Although this technique requires that the input strings be translated into the new alphabet (e.g., all the instances of the symbols '00' to '99' in the input packet must be replaced by the symbol 'x'), the impact on run-time throughput is usually negligible on modern processors, as the input strings are accessed sequentially. This technique is discussed in more detail in Section II.

Unfortunately, the algorithms available in the literature for creating multistride automata are hardly suitable for real-world, complex patterns. Even though those algorithms can run offline, and hence do not affect run-time performance when network traffic is being filtered, it is not acceptable for the computation to take several months on modern CPUs, or for a machine with 12 GB of RAM to be insufficient, which sometimes happens when using the tools presented in [3].

This paper addresses the problem of building large multistride NFAs efficiently by proposing new algorithms and combining them into an improved building process, paying particular attention to computational complexity and memory consumption. For instance, to the best of the authors' knowledge, this paper studies for the first time the impact of the well-known technique of NFA minimization on the building process of multistride automata, in combination with improved multistriding and alphabet compression algorithms.

This paper is structured as follows: Section II recalls the basics of NFA theory and of the underlying techniques;