@elliotwutingfeng elliotwutingfeng commented Oct 29, 2023

The current implementation of select() searches for the longest matching TLDs by starting at the right end of the hostname and continuing all the way to the left end.

This approach is necessary to handle edge cases like example.s3.cn-north-1.amazonaws.com.cn, where

  • s3.cn-north-1.amazonaws.com.cn and com.cn are valid,
  • but the intermediates cn-north-1.amazonaws.com.cn and amazonaws.com.cn are not.
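
To illustrate why the search cannot simply stop at the first non-matching suffix, here is a minimal sketch. The rule set is a tiny hypothetical stand-in for the real list, and the helper names (suffixes, longest_stop_at_first_miss, longest_exhaustive) are made up for this example; they are not the library's API.

```ruby
require "set"

# Hypothetical, heavily reduced rule set (the real list is far larger).
SAMPLE_RULES = Set["cn", "com.cn", "s3.cn-north-1.amazonaws.com.cn"].freeze

# All suffixes of the hostname, shortest first:
# "cn", "com.cn", "amazonaws.com.cn", ...
def suffixes(hostname)
  parts = hostname.split(".")
  (1..parts.size).map { |n| parts.last(n).join(".") }
end

# Stops at the first suffix that is not a rule (misses longer rules).
def longest_stop_at_first_miss(hostname)
  suffixes(hostname).take_while { |s| SAMPLE_RULES.include?(s) }.last
end

# Checks every suffix and keeps the longest one that is a rule.
def longest_exhaustive(hostname)
  suffixes(hostname).select { |s| SAMPLE_RULES.include?(s) }.last
end

host = "example.s3.cn-north-1.amazonaws.com.cn"
puts longest_stop_at_first_miss(host) # com.cn
puts longest_exhaustive(host)         # s3.cn-north-1.amazonaws.com.cn
```

Because amazonaws.com.cn is not a rule in this sketch, the stop-at-first-miss variant gives up before it reaches the longer rule.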

However, this penalizes hostnames with many subdomain labels, such as a.very.long.subdomain.example.co.uk, because every suffix is checked even when it has more parts than any rule could match.

We can terminate the search early by limiting the search size to [parts.size, @max_rule_size].min, where parts.size is the number of parts in the hostname, and @max_rule_size is the number of parts in the largest rule in @rules.
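
A rough sketch of the bounded search, assuming a plain Set of rule strings in place of @rules. The function bounded_longest_match and its return shape are hypothetical and only meant to show the effect of the limit, not the actual select() code.

```ruby
require "set"

# Returns [longest matching rule or nil, number of lookups performed].
# `rules` is a hypothetical stand-in for @rules; `max_rule_size` plays
# the role of @max_rule_size.
def bounded_longest_match(hostname, rules, max_rule_size)
  parts = hostname.split(".")
  limit = [parts.size, max_rule_size].min
  best = nil
  size = 1
  while size <= limit
    candidate = parts.last(size).join(".")
    best = candidate if rules.include?(candidate)
    size += 1
  end
  [best, limit]
end

rules = Set["uk", "co.uk"]
# Number of parts in the largest rule, computed once.
max_rule_size = rules.map { |rule| rule.count(".") + 1 }.max

host = "a.very.long.subdomain.example.co.uk"
p bounded_longest_match(host, rules, max_rule_size)
p bounded_longest_match(host, rules, host.split(".").size) # unbounded
```

With this sample rule set the bounded call performs 2 lookups and the unbounded one performs 7, and both return co.uk as the match.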

Also replaced the Kernel#loop call with a faster bounded while loop, since the current break condition can be expressed as the loop condition.
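
The conversion can be pictured with a generic sketch like the one below. The method names are invented for the example and the body is a stand-in counter, not the code in select().

```ruby
# Kernel#loop takes a block and relies on an explicit break to stop.
def count_with_loop(limit)
  i = 0
  loop do
    break if i >= limit
    i += 1
  end
  i
end

# The negated break condition becomes the while condition.
def count_with_while(limit)
  i = 0
  while i < limit
    i += 1
  end
  i
end

puts count_with_loop(5)
puts count_with_while(5)
```

Both methods return the same value. The while form is a language construct rather than a method call with a block, which is the likely source of the speedup, though the exact gain depends on the Ruby version.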

Before

$ ruby test/benchmarks/bm_find_all.rb 1000000
Rehearsal -------------------------------------------------------------
NAME_SHORT                  2.348576   0.000000   2.348576 (  2.350146)
NAME_SHORT (noprivate)      2.444302   0.000000   2.444302 (  2.445995)
NAME_MEDIUM                 2.890648   0.000000   2.890648 (  2.892380)
NAME_MEDIUM (noprivate)     3.014823   0.000000   3.014823 (  3.017137)
NAME_LONG                   3.705042   0.002693   3.707735 (  3.710142)
NAME_LONG (noprivate)       3.727960   0.000000   3.727960 (  3.730321)
NAME_WILD                   3.657520   0.000000   3.657520 (  3.659759)
NAME_WILD (noprivate)       3.815247   0.000000   3.815247 (  3.817492)
NAME_EXCP                   4.420996   0.000000   4.420996 (  4.423570)
NAME_EXCP (noprivate)       4.408350   0.000000   4.408350 (  4.411540)
IAAA                        2.604410   0.000000   2.604410 (  2.605894)
IAAA (noprivate)            2.688674   0.000000   2.688674 (  2.690398)
IZZZ                        2.605931   0.000000   2.605931 (  2.607543)
IZZZ (noprivate)            2.679484   0.000000   2.679484 (  2.681334)
PAAA                        4.506107   0.000000   4.506107 (  4.509242)
PAAA (noprivate)            4.174697   0.000000   4.174697 (  4.177737)
PZZZ                        4.618712   0.000000   4.618712 (  4.622306)
PZZZ (noprivate)            4.323496   0.000000   4.323496 (  4.327372)
JP                          4.151477   0.000000   4.151477 (  4.154904)
JP (noprivate)              4.230317   0.000000   4.230317 (  4.234143)
IT                          2.645423   0.000000   2.645423 (  2.647490)
IT (noprivate)              2.731147   0.000000   2.731147 (  2.733281)
COM                         2.672895   0.000000   2.672895 (  2.675236)
COM (noprivate)             2.796167   0.000000   2.796167 (  2.798951)
--------------------------------------------------- total: 81.865094sec

                                user     system      total        real
NAME_SHORT                  2.455661   0.000000   2.455661 (  2.458051)
NAME_SHORT (noprivate)      2.465275   0.000000   2.465275 (  2.468431)
NAME_MEDIUM                 2.946424   0.000000   2.946424 (  2.949358)
NAME_MEDIUM (noprivate)     3.023296   0.000000   3.023296 (  3.025300)
NAME_LONG                   3.770850   0.000000   3.770850 (  3.773397)
NAME_LONG (noprivate)       3.828416   0.000000   3.828416 (  3.830904)
NAME_WILD                   3.749617   0.000000   3.749617 (  3.752038)
NAME_WILD (noprivate)       3.827687   0.000000   3.827687 (  3.830190)
NAME_EXCP                   4.418445   0.000000   4.418445 (  4.421315)
NAME_EXCP (noprivate)       4.531002   0.000000   4.531002 (  4.535273)
IAAA                        2.699374   0.000000   2.699374 (  2.700931)
IAAA (noprivate)            2.768779   0.000000   2.768779 (  2.771347)
IZZZ                        2.699160   0.000000   2.699160 (  2.702339)
IZZZ (noprivate)            2.766278   0.000000   2.766278 (  2.769706)
PAAA                        4.706753   0.000000   4.706753 (  4.711835)
PAAA (noprivate)            4.363877   0.000000   4.363877 (  4.367030)
PZZZ                        4.716710   0.000000   4.716710 (  4.722447)
PZZZ (noprivate)            4.109007   0.000000   4.109007 (  4.111433)
JP                          3.937950   0.000000   3.937950 (  3.941688)
JP (noprivate)              4.065472   0.000000   4.065472 (  4.070663)
IT                          2.628695   0.000000   2.628695 (  2.630612)
IT (noprivate)              2.718972   0.000000   2.718972 (  2.721554)
COM                         2.647181   0.000000   2.647181 (  2.649369)
COM (noprivate)             2.714115   0.000000   2.714115 (  2.715725)

After

$ ruby test/benchmarks/bm_find_all.rb 1000000
Rehearsal -------------------------------------------------------------
NAME_SHORT                  2.237599   0.000000   2.237599 (  2.239443)
NAME_SHORT (noprivate)      2.336548   0.000000   2.336548 (  2.338574)
NAME_MEDIUM                 2.713107   0.000000   2.713107 (  2.714795)
NAME_MEDIUM (noprivate)     2.830825   0.000000   2.830825 (  2.832685)
NAME_LONG                   3.042471   0.000000   3.042471 (  3.044456)
NAME_LONG (noprivate)       3.019529   0.003196   3.022725 (  3.024463)
NAME_WILD                   2.978485   0.000000   2.978485 (  2.980252)
NAME_WILD (noprivate)       3.088728   0.000000   3.088728 (  3.090743)
NAME_EXCP                   3.682105   0.000000   3.682105 (  3.684332)
NAME_EXCP (noprivate)       3.815742   0.000000   3.815742 (  3.818032)
IAAA                        2.458039   0.000000   2.458039 (  2.459425)
IAAA (noprivate)            2.496389   0.000000   2.496389 (  2.497893)
IZZZ                        2.404844   0.000000   2.404844 (  2.406255)
IZZZ (noprivate)            2.463744   0.000000   2.463744 (  2.465130)
PAAA                        3.515573   0.000000   3.515573 (  3.517585)
PAAA (noprivate)            3.193961   0.000000   3.193961 (  3.195845)
PZZZ                        3.587199   0.000000   3.587199 (  3.589388)
PZZZ (noprivate)            3.254129   0.000000   3.254129 (  3.256092)
JP                          3.783495   0.000000   3.783495 (  3.785693)
JP (noprivate)              3.885775   0.003331   3.889106 (  3.891664)
IT                          2.513112   0.000000   2.513112 (  2.514673)
IT (noprivate)              2.599210   0.000000   2.599210 (  2.600769)
COM                         2.539283   0.000000   2.539283 (  2.540692)
COM (noprivate)             2.485424   0.000000   2.485424 (  2.486922)
--------------------------------------------------- total: 70.931843sec

                                user     system      total        real
NAME_SHORT                  2.218905   0.000000   2.218905 (  2.220197)
NAME_SHORT (noprivate)      2.282971   0.000000   2.282971 (  2.284161)
NAME_MEDIUM                 2.707217   0.000000   2.707217 (  2.708815)
NAME_MEDIUM (noprivate)     2.781946   0.000000   2.781946 (  2.783615)
NAME_LONG                   3.018843   0.000000   3.018843 (  3.020559)
NAME_LONG (noprivate)       3.079345   0.000000   3.079345 (  3.081143)
NAME_WILD                   3.041727   0.000000   3.041727 (  3.043618)
NAME_WILD (noprivate)       3.079496   0.000000   3.079496 (  3.081228)
NAME_EXCP                   3.655873   0.000000   3.655873 (  3.658370)
NAME_EXCP (noprivate)       3.754648   0.000000   3.754648 (  3.756916)
IAAA                        2.507284   0.000000   2.507284 (  2.509283)
IAAA (noprivate)            2.540126   0.000000   2.540126 (  2.541872)
IZZZ                        2.466202   0.000000   2.466202 (  2.467584)
IZZZ (noprivate)            2.544616   0.000000   2.544616 (  2.546141)
PAAA                        3.622206   0.000000   3.622206 (  3.624447)
PAAA (noprivate)            3.272909   0.000000   3.272909 (  3.274831)
PZZZ                        3.675658   0.000000   3.675658 (  3.677843)
PZZZ (noprivate)            3.318359   0.000000   3.318359 (  3.320537)
JP                          3.882480   0.000000   3.882480 (  3.885434)
JP (noprivate)              3.971438   0.000000   3.971438 (  3.974437)
IT                          2.548282   0.000000   2.548282 (  2.549875)
IT (noprivate)              2.609304   0.000000   2.609304 (  2.610879)
COM                         2.569648   0.000000   2.569648 (  2.571186)
COM (noprivate)             2.497100   0.000000   2.497100 (  2.498543)
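
To put the difference in relative terms, the sketch below computes the percentage reduction for a few rows, using the "user" column of the second (non-rehearsal) pass in the tables above. The percent_less_time helper is hypothetical, and since these are single runs the results should be read as rough indications only.

```ruby
# [before, after] user times in seconds, copied from the tables above.
SAMPLE_TIMINGS = {
  "NAME_SHORT" => [2.455661, 2.218905],
  "NAME_LONG"  => [3.770850, 3.018843],
  "PAAA"       => [4.706753, 3.622206],
}.freeze

# Percentage of time saved relative to the "before" measurement.
def percent_less_time(before, after)
  ((before - after) / before * 100).round(1)
end

SAMPLE_TIMINGS.each do |name, (before, after)|
  puts format("%-12s %5.1f%% less time", name, percent_less_time(before, after))
end
```

For these rows the reduction comes out at roughly 10% for NAME_SHORT, 20% for NAME_LONG, and 23% for PAAA.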

@elliotwutingfeng elliotwutingfeng marked this pull request as ready for review October 29, 2023 08:00
@weppos weppos (Owner) commented Nov 21, 2023

Thanks for your contribution @elliotwutingfeng. I need some time to review the changes.
