URL parsing fails for valid links containing non-ASCII characters in domain

I’ve encountered an issue where some valid URLs containing non-ASCII characters in their domain names are not parsed correctly.


`"http://faß.de"`

Expected: `http://faß.de`
Actual:   `http://fa/`


`"http://نامه‌ای.com"`

Expected: `http://نامه‌ای.com`
Actual:   `http://نامه`  and  `ای.com`


`http://ශ්‍රී.com`

Expected: `http://ශ්‍රී.com`
Actual:   `http://ශ්`  and  `රී.com`


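The current parsing logic does not seem to fully support IDNs (Internationalized Domain Names) or certain Unicode characters in URLs. In the examples above, the URL is cut at `ß` (U+00DF) in the German domain, at the zero-width non-joiner (U+200C) in the Persian domain, and at the zero-width joiner (U+200D) in the Sinhala domain.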
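
To make the problem characters easier to spot, here is a small diagnostic sketch. `describeNonAscii()` is a hypothetical helper written for this report, not part of the library; it lists each non-ASCII character of a host with its UTF-8 bytes and a coarse Unicode class, using only PCRE functions.

```php
<?php
// Hypothetical diagnostic helper (not part of the library).
// Returns one entry per non-ASCII character: "<utf-8 bytes in hex> <class>".
function describeNonAscii(string $host): array
{
    $out = [];
    // The `u` modifier makes PCRE split the string into code points, not bytes.
    foreach (preg_split('//u', $host, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
        if (strlen($ch) === 1) {
            continue; // a single byte is plain ASCII
        }
        if (preg_match('/^\p{L}$/u', $ch)) {
            $class = 'letter';
        } elseif (preg_match('/^\p{M}$/u', $ch)) {
            $class = 'mark';
        } elseif (preg_match('/^\p{Cf}$/u', $ch)) {
            $class = 'format';
        } else {
            $class = 'other';
        }
        $out[] = bin2hex($ch) . ' ' . $class;
    }
    return $out;
}

// Sinhala example from this report, written with escapes so U+200D is visible.
print_r(describeNonAscii("\u{0DC1}\u{0DCA}\u{200D}\u{0DBB}\u{0DD3}.com"));
```

Run against the Sinhala host, this reports letters, combining marks, and one format character (the zero-width joiner), so a pattern that only allows letters would not be enough for that domain.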


### Environment:

PHP version: 7.3

Library version: 3.12 (latest)


### Possible cause:
The regex or parsing method used to detect URLs might not be Unicode-aware for domain name parts, leading to incorrect splitting or truncation.
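
I have not looked at the library's actual regex, so the following is only a guess at the mechanism. `NAIVE_URL` and `firstUrl()` are made-up names; the pattern accepts letters, combining marks, and digits in the host but no joiner characters, and it reproduces the splits reported above for the second and third examples.

```php
<?php
// Illustrative pattern only, NOT the library's real regex.
// The host class allows letters, marks and digits, but not U+200C / U+200D.
const NAIVE_URL = '~https?://[\p{L}\p{M}\p{N}.-]+~iu';

// Hypothetical helper: returns the first URL-like match, or null if there is none.
function firstUrl(string $text): ?string
{
    return preg_match(NAIVE_URL, $text, $m) === 1 ? $m[0] : null;
}

// The Persian and Sinhala domains from this report, written with escapes
// so the invisible joiner characters are explicit.
$persian = "http://\u{0646}\u{0627}\u{0645}\u{0647}\u{200C}\u{0627}\u{06CC}.com";
$sinhala = "http://\u{0DC1}\u{0DCA}\u{200D}\u{0DBB}\u{0DD3}.com";

// Both matches stop right before the joiner character.
var_dump(firstUrl($persian) === "http://\u{0646}\u{0627}\u{0645}\u{0647}");
var_dump(firstUrl($sinhala) === "http://\u{0DC1}\u{0DCA}");
```

Both comparisons come out true, which matches the `Actual` output for those two links. The `faß.de` case is not explained by this sketch, since `ß` is a letter, so the real pattern is probably more restrictive somewhere else.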


### Suggestion:
Consider updating the regex to fully support Unicode letters and IDN domains, possibly using `\p{L}` for letters and ensuring the `u` (UTF-8) regex modifier is used.
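
A rough sketch of what I mean, under my own assumptions: `HOST_CHARS` and `extractUrls()` are hypothetical names, not the library's API, and the pattern only covers the scheme and host. Besides `\p{L}` it also allows combining marks (`\p{M}`), digits (`\p{N}`), and the two joiner characters, since the Persian and Sinhala examples need them.

```php
<?php
// Sketch only: HOST_CHARS and extractUrls() are hypothetical, not library code.
// \p{L} letters, \p{M} combining marks, \p{N} digits, plus ZWNJ and ZWJ.
const HOST_CHARS = '\p{L}\p{M}\p{N}\x{200C}\x{200D}';

function extractUrls(string $text): array
{
    // A label starts and ends with a host character; hyphens only in between.
    $label = '[' . HOST_CHARS . '](?:[' . HOST_CHARS . '-]*[' . HOST_CHARS . '])?';
    // `u` makes PCRE treat pattern and subject as UTF-8; `i` covers "HTTP://".
    $pattern = '~https?://' . $label . '(?:\.' . $label . ')*~iu';
    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

$text = "Links: http://fa\u{00DF}.de and "
      . "http://\u{0646}\u{0627}\u{0645}\u{0647}\u{200C}\u{0627}\u{06CC}.com and "
      . "http://\u{0DC1}\u{0DCA}\u{200D}\u{0DBB}\u{0DD3}.com.";

foreach (extractUrls($text) as $url) {
    echo $url, PHP_EOL;
}
```

With this pattern all three links from the report are returned whole. Allowing the joiners anywhere in a label is more permissive than the IDNA rules, which only accept them in certain contexts, so a stricter validation step after extraction may still be worthwhile.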
