Commit 6739dd7

Speed up codespell:ignore check by skipping the regex in most cases
Unsurprisingly, codespell spends the vast majority of its runtime in regex-related code such as `search` and `finditer`. The best way to reduce the time spent in regexes is to avoid running them in the first place, since the regex engine carries a steep overhead compared to plain string primitives (the price of its flexibility). If a regex rarely matches and a simple static substring can rule out a match, a `substring in string` check can guard the regex and skip it in most cases. This only pays off if the regex runs often enough for its cost to matter.

The `codespell:ignore` regex is an obvious candidate: the distinctive substring `codespell:ignore` must appear in any line the regex can match, so checking for it first rules out almost every non-matching line. With this change, runtime drops from ~5.4s to ~4.5s on the corpus mentioned in #3419.
1 parent 200c31b commit 6739dd7

1 file changed: +10 -2 lines changed

codespell_lib/_codespell.py

Lines changed: 10 additions & 2 deletions
@@ -54,7 +54,13 @@
 uri_regex_def = (
     r"(\b(?:https?|[ts]?ftp|file|git|smb)://[^\s]+(?=$|\s)|\b[\w.%+-]+@[\w.-]+\b)"
 )
-inline_ignore_regex = re.compile(r"[^\w\s]\s*codespell:ignore\b(\s+(?P<words>[\w,]*))?")
+codespell_ignore_tag = "codespell:ignore"
+inline_ignore_regex = re.compile(
+    rf"[^\w\s]\s*{codespell_ignore_tag}\b(\s+(?P<words>[\w,]*))?"
+)
+USAGE = """
+\t%prog [OPTIONS] [file1 file2 ... fileN]
+"""
 
 supported_languages_en = ("en", "en_GB", "en_US", "en_CA", "en_AU")
 supported_languages = supported_languages_en
@@ -904,7 +910,9 @@ def parse_lines(
         line_number = fragment_line_number + i
 
         extra_words_to_ignore = set()
-        match = inline_ignore_regex.search(line)
+        match = (
+            inline_ignore_regex.search(line) if codespell_ignore_tag in line else None
+        )
         if match:
             extra_words_to_ignore = set(
                 filter(None, (match.group("words") or "").split(","))
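
The substring guard from the second hunk can be tried in isolation. The sketch below reuses `codespell_ignore_tag` and `inline_ignore_regex` as they appear in the diff; the helper `words_to_ignore` is a hypothetical name introduced only for illustration and is not part of codespell. It is meant to show that the guard leaves results unchanged, since the regex can never match a line that lacks the literal tag.

```python
import re

# Same definitions as in the diff above.
codespell_ignore_tag = "codespell:ignore"
inline_ignore_regex = re.compile(
    rf"[^\w\s]\s*{codespell_ignore_tag}\b(\s+(?P<words>[\w,]*))?"
)


def words_to_ignore(line):
    """Hypothetical helper: return the words listed after an inline
    ignore tag, or None if the line carries no such tag."""
    # Cheap substring test first: without the literal tag in the line,
    # the regex cannot match, so the search is skipped entirely.
    if codespell_ignore_tag not in line:
        return None
    match = inline_ignore_regex.search(line)
    if match is None:
        return None
    # An absent or empty word list yields an empty set.
    return set(filter(None, (match.group("words") or "").split(",")))


if __name__ == "__main__":
    for sample in (
        "x = 1  # codespell:ignore foo,bar",
        "x = 1  # codespell:ignore",
        "an ordinary line",
    ):
        print(repr(sample), "->", words_to_ignore(sample))
```

The tag contains no regex metacharacters, so interpolating it into the pattern with an f-string does not change the pattern's meaning. A line that contains the tag but no preceding comment character still falls through to the regex and is rejected there, so the guard only removes work and never changes the outcome.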
