Tuesday, December 28, 2010

How to Build DPI Products? (Part VII - Regular Expression Matching-3)

Yet another paper on regular expression optimization (see the two previous posts in this series), this time by Yi-Hua E. Yang, Hoang Le and Viktor K. Prasanna, all from the Ming Hsieh Department of Electrical Engineering, University of Southern California.

See "High Performance Dictionary-Based String Matching for Deep Packet Inspection".


Dictionary-Based String Matching (DBSM) is used in network Deep Packet Inspection (DPI) applications such as virus scanning [1] and network intrusion detection [2]. We propose the Pipelined Affix Search with Tail Acceleration (PASTA) architecture for solving DBSM with guaranteed worst-case performance.

Our PASTA architecture is composed of a Pipelined Affix Search Relay (PASR) followed by a Tail Acceleration Finite Automaton (TAFA). PASR consists of one or more pipelined Binary Search Tree (pBST) modules arranged in a linear array. TAFA is constructed with the Aho-Corasick goto and failure functions [3] in a compact multi-path and multi-stride tree structure. PASR and TAFA achieve memory efficiencies of 1.2 and 2 B/ch (bytes per character), respectively, and both are pipelined to reach a clock rate of 200 MHz on FPGAs.
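For readers who have not met the Aho-Corasick goto and failure functions mentioned above, here is a minimal software sketch in Python. It is the plain textbook automaton, one character per step, and not the compact multi-path, multi-stride hardware structure the paper describes; the names `build_automaton` and `search` are made up for this illustration.

```python
from collections import deque


def build_automaton(patterns):
    """Build textbook Aho-Corasick tables (illustrative helper, not from the paper)."""
    goto = [{}]        # goto[state][char] -> next state; state 0 is the root
    fail = [0]         # fail[state] -> state for the longest proper suffix in the trie
    output = [set()]   # output[state] -> patterns that end at this state

    # Phase 1: the goto function is a trie over all dictionary patterns.
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].add(pat)

    # Phase 2: the failure function, filled in breadth-first order.
    queue = deque(goto[0].values())  # depth-1 states fail back to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure target.
            output[nxt] |= output[fail[nxt]]
    return goto, fail, output


def search(text, automaton):
    """Return (start_index, pattern) for every dictionary hit in text."""
    goto, fail, output = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        # Follow failure links until some state accepts ch (or we reach the root).
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in output[state]:
            matches.append((i - len(pat) + 1, pat))
    return matches


automaton = build_automaton(["he", "she", "his", "hers"])
print(sorted(search("ushers", automaton)))
```

The input is scanned once and the automaton never backs up in the text, which is the property that makes this family of matchers attractive when worst-case behaviour matters.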

Because PASTA does not depend on the effectiveness of any hash function or the property of the input stream, its performance is guaranteed in the worst case. Our prototype implementation of PASTA on an FPGA with 10 Mb on-chip block RAM achieves 3.2 Gbps matching throughput against a dictionary of over 700K characters.
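The figures quoted so far can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below assumes that "Mb" means megabits, that one character is one byte, and that the design consumes a whole number of bytes per clock cycle; none of this is spelled out in the abstract, and the function names are hypothetical.

```python
def throughput_gbps(clock_mhz, bytes_per_cycle):
    """Matching throughput in Gbps for a pipeline that accepts
    bytes_per_cycle input bytes on every clock tick (assumed model)."""
    return clock_mhz * 1e6 * bytes_per_cycle * 8 / 1e9


def memory_megabits(num_chars, bytes_per_char):
    """Memory footprint in megabits at a given B/ch efficiency."""
    return num_chars * bytes_per_char * 8 / 1e6


# Throughput at the quoted 200 MHz for one and two bytes per cycle.
for width in (1, 2):
    print(width, "byte(s)/cycle:", round(throughput_gbps(200, width), 2), "Gbps")

# Footprint of a 700K-character dictionary at the two quoted efficiencies.
for label, bpc in (("1.2 B/ch", 1.2), ("2 B/ch", 2.0)):
    print(label, ":", round(memory_megabits(700_000, bpc), 2), "Mb")
```

Under these assumptions, 200 MHz at one byte per cycle gives 1.6 Gbps, so the quoted 3.2 Gbps would correspond to two bytes per cycle. A 700K-character dictionary needs about 6.7 Mb at 1.2 B/ch and 11.2 Mb at 2 B/ch, so whether it fits in 10 Mb depends on how the characters are split between PASR and TAFA.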

This level of performance surpasses the requirements of next-generation security gateways for deep packet inspection.
