Essentially, I'm reimplementing a PHP bot detection library in Nim. It has a list of crawlers and combines them, with groups, into one big regex like (bot1|bot2|bot3|...). There are around a thousand crawlers in the list. With this library, the code takes around half an hour to compile with the release flag, and checking a dataset of around 40000 user agents took a few minutes. With std/re, it compiles more or less instantly and checks the same dataset in a few seconds.
Here's some example code; the files are attached:
crawlers.txt
useragents.txt
import regex, strutils
const crawlerRegex = re2('(' & join(splitLines(staticRead("crawlers.txt")), "|") & ')')
let uas = splitLines(readFile("useragents.txt"))
for ua in uas:
  echo ua, ", ", contains(ua, crawlerRegex)
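For comparison, here is a minimal sketch of what the std/re variant mentioned above might look like. It is not the exact code that was timed: the short inline lists are hypothetical stand-ins for crawlers.txt and useragents.txt, and `buildPattern` and `isCrawler` are helper names made up for illustration. Since std/re wraps PCRE, the regex is compiled at run time with `let` rather than at compile time with `const`.

```nim
import std/[re, strutils]

# Hypothetical stand-ins for the attached crawlers.txt and useragents.txt.
const crawlers = ["Googlebot", "bingbot", "YandexBot"]
let userAgents = [
  "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
  "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0"
]

# Join the crawler names into one alternation group: (bot1|bot2|bot3).
proc buildPattern(names: openArray[string]): string =
  "(" & names.join("|") & ")"

# std/re compiles the pattern at run time, so this is a `let`, not a `const`.
let crawlerRegex = re(buildPattern(crawlers))

# True if any crawler name occurs anywhere in the user agent string.
proc isCrawler(ua: string): bool =
  contains(ua, crawlerRegex)

for ua in userAgents:
  echo ua, ", ", isCrawler(ua)
```

Swapping the inline lists for `splitLines(staticRead("crawlers.txt"))` and `splitLines(readFile("useragents.txt"))` should give a like-for-like comparison with the snippet above.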