Description
The regex uses 5-digit Unicode escapes, but a \u escape takes exactly 4 hex digits, so the pattern does not match what was intended.
For example, the fragment \u20000-\u2A6DF is interpreted as three separate items:
- U+2000 (which is not a Han character)
- the range from 0 (U+0030) to U+2A6D (which encompasses tons of various characters, including the entire Latin alphabet, but no Han characters)
- F (U+0046)
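A quick way to see the misparse, assuming the pattern is a JavaScript regex without the `u` flag (the issue does not say so explicitly). The variable name `broken` is only for illustration and the character class contains just the fragment quoted above, not the full regex:

```javascript
// Without the `u` flag, \uXXXX consumes exactly four hex digits,
// so this class is really: U+2000, the range "0"..U+2A6D, and a literal "F".
const broken = /[\u20000-\u2A6DF]/;

console.log(broken.test("\u2000"));    // true: U+2000 itself
console.log(broken.test("a"));         // true: inside the range "0"..U+2A6D
console.log(broken.test("F"));         // true: the trailing literal F
console.log(broken.test("\u{20000}")); // false: the intended first Han character
console.log(broken.test("\u6F22"));    // false: a BMP Han character, outside every item
```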
I guess the regex should be rewritten using surrogates, like the emoji one.
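A rough sketch of what the rewrite could look like for this one fragment, again assuming JavaScript. U+20000 encodes as the surrogate pair D840 DC00 and U+2A6DF as D869 DEDF, so the range splits into two alternatives. The names `extBSurrogates` and `extBCodePoints` are made up here, and the second variant only applies if the `u` flag can be used where the regex lives:

```javascript
// Surrogate-pair form: works without the `u` flag.
// High surrogates D840..D868 pair with any low surrogate;
// the last high surrogate D869 only pairs with DC00..DEDF.
const extBSurrogates = /(?:[\uD840-\uD868][\uDC00-\uDFFF]|\uD869[\uDC00-\uDEDF])/;

// Code-point form: needs the `u` flag, which enables \u{...} escapes.
const extBCodePoints = /[\u{20000}-\u{2A6DF}]/u;

for (const re of [extBSurrogates, extBCodePoints]) {
  console.log(re.test("\u{20000}")); // true: first code point of the range
  console.log(re.test("\u{2A6DF}")); // true: last code point of the range
  console.log(re.test("\u{2A6E0}")); // false: one past the end
  console.log(re.test("a"));         // false: Latin letters no longer match
}
```

The surrogate form is closer to how the issue describes the emoji regex; the code-point form is shorter but changes how the whole pattern is parsed, so it may not be a drop-in change.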