Multi-pattern regular expressions #496

@krizhanovsky

Tempesta FW core must implement multi-pattern regular expressions to efficiently handle HTTP matching rules for filtering and configuration (see, for example, #471, #495, #530, #1544, and the matching of many ignored headers in #1550 for caching). Intel Hyperscan can be used as a reference or a foundation for the feature.

The implementation must account for ReDoS. It seems that back and forward references should be limited or fully prohibited, and resource consumption should be bounded in the sense of #488.
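As a rough illustration of the "fully prohibited" option, a rule loader could reject any pattern that contains a group reference before compiling it. The sketch below is only an assumption about how such a check might look: `check_pattern` and `MAX_PATTERN_LEN` are hypothetical names, the length cap is a stand-in for whatever limits #488 ends up defining, and the scanner is deliberately conservative (it also flags `\1` inside a character class).

```python
# Hypothetical limit; the issue does not specify a concrete number.
MAX_PATTERN_LEN = 256


def check_pattern(pattern: str) -> list[str]:
    """Return a list of reasons to reject `pattern`; empty means acceptable."""
    problems = []
    if len(pattern) > MAX_PATTERN_LEN:
        problems.append(f"pattern longer than {MAX_PATTERN_LEN} characters")

    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt in "123456789":
                # \N refers to group N, whether it is defined before
                # (back reference) or after (forward reference) this point.
                problems.append(f"numbered group reference at offset {i}")
            elif nxt == "k":
                problems.append(f"named group reference at offset {i}")
            i += 2  # skip the escaped character, so "\\\\1" is not flagged
            continue
        if pattern.startswith("(?P=", i):
            problems.append(f"named group reference at offset {i}")
        i += 1
    return problems
```

With this sketch, `check_pattern(r"^/(foo|bar)/")` returns an empty list, while `check_pattern(r"(a)\1")` reports one numbered group reference.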

This should be done close to or together with #732, since simple multi-pattern matching is a sub-task of multi-pattern regexps.

Since Tempesta FW deals with fields of parsed HTTP messages, in general we need (1) relatively simple regular expressions for (2) relatively short strings. E.g.

location ~ ^/(/category/foo/|dddd|ccccc|vvvv|aaaa)/
hdr "Referer" == "*.tempesta-tech.com/*"  -> base;

In most cases simple multi-pattern prefix/suffix matching is enough, and there is definitely no need for PCRE. However, there could be tens of location rules with simple regexps, so multi-pattern regexps still make sense.

The only functionality requiring relatively large input data (up to tens of kilobytes, hundreds of bytes on average) and complex regexps is WAF filtering rules against User-Agent, URI, Cookie, or other header values.

These two cases must be separated:

  1. a simple multi-pattern string search (e.g. Commentz-Walter or a SIMD algorithm) with begin/end bindings
  2. multipattern regexps, e.g. with runtime ported from Hyperscan (done in https://github.com/G-Core/linux-regex-module)
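
The first case can be sketched in a few dozen lines. This is not the proposed implementation: it uses Aho-Corasick instead of Commentz-Walter or SIMD because it is shorter to show, and it assumes that a leading or trailing `*` in a rule means "unbound on that side". `Literal`, `parse_wildcard`, and `MultiMatcher` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    text: str
    at_begin: bool  # match must start at offset 0
    at_end: bool    # match must end at the last character


def parse_wildcard(rule: str) -> Literal:
    """'foo*' is a prefix, '*foo' a suffix, '*foo*' a substring, 'foo' exact."""
    open_begin = rule.startswith("*")
    open_end = rule.endswith("*") and len(rule) > 1
    start = 1 if open_begin else 0
    stop = len(rule) - 1 if open_end else len(rule)
    text = rule[start:stop]
    if not text:
        raise ValueError("rule has no literal part")
    return Literal(text, at_begin=not open_begin, at_end=not open_end)


class MultiMatcher:
    """Aho-Corasick automaton over all literals; one pass over the input."""

    def __init__(self, literals):
        self.literals = list(literals)
        self.goto = [{}]   # per-node transitions
        self.fail = [0]    # per-node failure link
        self.out = [[]]    # per-node indices of literals ending here
        for idx, lit in enumerate(self.literals):
            node = 0
            for ch in lit.text:
                nxt = self.goto[node].get(ch)
                if nxt is None:
                    nxt = len(self.goto)
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                    self.goto[node][ch] = nxt
                node = nxt
            self.out[node].append(idx)
        # Breadth-first pass to compute failure links and merged outputs.
        queue = deque(self.goto[0].values())
        while queue:
            node = queue.popleft()
            for ch, child in self.goto[node].items():
                f = self.fail[node]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(ch, 0)
                self.out[child] = self.out[child] + self.out[self.fail[child]]
                queue.append(child)

    def search(self, data: str) -> set[int]:
        """Return indices of all rules that match `data`, honoring bindings."""
        hits = set()
        node = 0
        for pos, ch in enumerate(data):
            while node and ch not in self.goto[node]:
                node = self.fail[node]
            node = self.goto[node].get(ch, 0)
            for idx in self.out[node]:
                lit = self.literals[idx]
                if lit.at_begin and pos + 1 != len(lit.text):
                    continue  # occurrence does not start at offset 0
                if lit.at_end and pos + 1 != len(data):
                    continue  # occurrence does not end at the last character
                hits.add(idx)
        return hits


rules = ["/category/foo/*", "*.tempesta-tech.com/*", "*.png", "/index.html"]
matcher = MultiMatcher(parse_wildcard(r) for r in rules)
```

Here `matcher.search("/category/foo/bar")` yields rule 0 and `matcher.search("www.tempesta-tech.com/docs")` yields rule 1, while `"/img/logo.png?x=1"` matches nothing because the `*.png` literal is bound to the end of the string. All rules are checked in a single pass, so the cost does not grow with the number of location rules the way a rule-by-rule loop would.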
