Commit 6739dd7
1 parent 200c31b

Speed up `codespell:ignore` check by skipping the regex in most cases

The codespell codebase unsurprisingly spends the vast majority of its
runtime in various regex-related code such as `search` and `finditer`.
The best way to optimize runtime spent in regexes is to not run a regex
in the first place, since the regex engine has a rather steep overhead
over regular string primitives (the price of its flexibility). If the
regex rarely matches and there is a very easy static substring that can
be used to rule out the match, then you can speed up the code by using
`substring in string` as a conditional to skip the regex. This assumes
the regex is used often enough for the performance to matter.

An obvious choice here is the `codespell:ignore` regex, because
it has a very distinctive substring in the form of `codespell:ignore`,
which rules out almost all lines that will not match.

With this little trick, runtime goes from ~5.4s to ~4.5s on the corpus
mentioned in #3419.
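
The pre-check described in the message can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the pattern `IGNORE_RE` and the helper `find_ignore_directive` are hypothetical names, and the regex is a simplified stand-in for whatever codespell actually uses. The trick is only valid because every possible match of the regex must contain the literal marker.

```python
import re

# Hypothetical, simplified pattern for illustration only;
# this is not the regex codespell itself uses.
IGNORE_MARKER = "codespell:ignore"
IGNORE_RE = re.compile(r"codespell:ignore\b(\s+(?P<words>[\w,]*))?")


def find_ignore_directive(line):
    """Return the match for an ignore directive in `line`, or None."""
    # Cheap substring test first. Any regex match must contain the
    # literal marker, so lines without it can be rejected without
    # ever entering the regex engine.
    if IGNORE_MARKER not in line:
        return None
    return IGNORE_RE.search(line)
```

Because most lines in a typical corpus do not contain the marker, the common path becomes a single `in` check on the string, and the regex only runs on the few lines that can plausibly match.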
1 file changed, +10 -2 lines changed

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 54 | 54 | |
| 55 | 55 | |
| 56 | 56 | |
| 57 | | - |
| | 57 | + |
| | 58 | + |
| | 59 | + |
| | 60 | + |
| | 61 | + |
| | 62 | + |
| | 63 | + |
| 58 | 64 | |
| 59 | 65 | |
| 60 | 66 | |
| | | |
| 904 | 910 | |
| 905 | 911 | |
| 906 | 912 | |
| 907 | | - |
| | 913 | + |
| | 914 | + |
| | 915 | + |
| 908 | 916 | |
| 909 | 917 | |
| 910 | 918 | |