Description
I’ve encountered an issue where some valid URLs containing non-ASCII characters in their domain names are not parsed correctly.
"http://faß.de"
Expected: http://faß.de
Actual: http://fa/
"http://نامهای.com"
Expected: http://نامهای.com
Actual: split into http://نامه and ای.com
"http://ශ්රී.com"
Expected: http://ශ්රී.com
Actual: split into http://ශ් and රී.com
It seems the current parsing logic might not fully support IDN (Internationalized Domain Names) or certain Unicode characters in URLs.
Environment:
PHP version: 7.3
Library version: 3.12 (latest)
Possible cause:
The regex or parsing method used to detect URLs might not be Unicode-aware for domain name parts, leading to incorrect splitting or truncation.
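To narrow down the possible cause, here is a minimal PHP sketch that compares two hypothetical host patterns. Neither is the library's actual regex, which this report does not show. The URLs are rebuilt with `\u{...}` escapes, assuming that the Persian example contains U+200C ZERO WIDTH NON-JOINER and the Sinhala example contains U+200D ZERO WIDTH JOINER, because the reported splits fall exactly at those positions.

```php
<?php
// Two HYPOTHETICAL host patterns for comparison; neither is taken from the library.
$asciiOnly = '~https?://[a-z0-9.-]+~i';                // no u modifier: byte-based, ASCII letters only
$lettersAndMarks = '~https?://[\p{L}\p{M}\p{N}.-]+~u'; // u modifier: Unicode letters, marks, digits

// The report's URLs rebuilt from explicit code points.
// ASSUMPTION: the Persian host contains U+200C and the Sinhala host contains U+200D.
$urls = array(
    "http://fa\u{00DF}.de",                                                // faß.de
    "http://\u{0646}\u{0627}\u{0645}\u{0647}\u{200C}\u{0627}\u{06CC}.com", // Persian host with ZWNJ
    "http://\u{0DC1}\u{0DCA}\u{200D}\u{0DBB}\u{0DD3}.com",                 // Sinhala host with ZWJ
);

/** First match of $pattern in $subject, or null when there is none. */
function firstMatch($pattern, $subject)
{
    return preg_match($pattern, $subject, $m) === 1 ? $m[0] : null;
}

foreach ($urls as $url) {
    // Show what each pattern captures for the same input.
    var_dump(firstMatch($asciiOnly, $url), firstMatch($lettersAndMarks, $url));
}
```

The ASCII-only class stops at the first byte of ß and yields `http://fa`, the same truncation as in the first example. The Unicode-aware class keeps ß but still stops at the joiners, yielding `http://نامه` and `http://ශ්`, which are the same split points as in the second and third examples.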
Suggestion:
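Consider updating the regex to fully support Unicode letters and IDN domains: enable the u (UTF-8) modifier and use \p{L} for letters, plus \p{M} for combining marks (such as the Sinhala virama in ශ්). The split points in the Persian and Sinhala examples also suggest that the zero-width joiners U+200C and U+200D need to be allowed.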
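A minimal sketch of this suggestion follows. The pattern constant and helper (`HYPOTHETICAL_URL_PATTERN`, `hypotheticalFindUrls`) are illustrative names, not the library's API. The character class combines \p{L} and the u modifier with \p{M}, \p{N}, and U+200C/U+200D, assuming that the Persian and Sinhala hosts contain those joiners.

```php
<?php
// HYPOTHETICAL URL-detection pattern, a sketch only (not the library's code).
// In UTF-8 mode (u modifier) the host character class accepts:
//   \p{L}  letters (e.g. ß, Arabic-script and Sinhala letters)
//   \p{M}  combining marks (e.g. the Sinhala virama U+0DCA)
//   \p{N}  digits, plus '.' and '-'
//   \x{200C} ZERO WIDTH NON-JOINER and \x{200D} ZERO WIDTH JOINER
const HYPOTHETICAL_URL_PATTERN = '~https?://[\p{L}\p{M}\p{N}\x{200C}\x{200D}.-]+~u';

/** Return every URL-like substring found in $text (hypothetical helper). */
function hypotheticalFindUrls($text)
{
    preg_match_all(HYPOTHETICAL_URL_PATTERN, $text, $matches);
    return $matches[0];
}

// The report's three URLs embedded in running text, built from explicit code points.
$text = "See http://fa\u{00DF}.de, "
      . "http://\u{0646}\u{0627}\u{0645}\u{0647}\u{200C}\u{0627}\u{06CC}.com and "
      . "http://\u{0DC1}\u{0DCA}\u{200D}\u{0DBB}\u{0DD3}.com for details";

print_r(hypotheticalFindUrls($text));
```

The class is deliberately permissive. IDNA2008 (RFC 5892) allows U+200C and U+200D only in specific contexts, so a detector like this finds candidate hosts and leaves validation, for example via `idn_to_ascii()` from the intl extension, to a later step. It also does not trim trailing punctuation such as a sentence-final period.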