Incorrect/incomplete extraction of URLs with special characters #4475

### Description

I came across some files that contain links with non-ASCII characters, such as:

```
https://sk.wikipedia.org/wiki/Administratívne_členenie_Slovenska
https://ru.wikipedia.org/wiki/Федеральные_округа_Российской_Федерации

```

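Running this file through the URL detection yields `https://sk.wikipedia.org/wiki/Administrativne_clenenie_Slovenska` and `https://ru.wikipedia.org/wiki`, neither of which matches the input. In the first URL the diacritics are replaced by plain ASCII letters (`í` becomes `i`, `č` becomes `c`); in the second, everything from the first Cyrillic character onward is dropped.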
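For comparison, here is a minimal sketch of the result I would expect. It uses a plain `re` pattern that treats any run of non-whitespace characters after the scheme as part of the URL. This is only an illustration and not how scancode detects URLs; `URL_PATTERN` and `extract_urls` are hypothetical names.

```python
import re

# Hypothetical pattern: a scheme followed by any run of non-whitespace characters.
# Python 3 str patterns are Unicode-aware, so non-ASCII path characters are kept.
URL_PATTERN = re.compile(r"https?://\S+")


def extract_urls(text):
    """Return every URL-like substring of text, unmodified."""
    return URL_PATTERN.findall(text)


sample = (
    "https://sk.wikipedia.org/wiki/Administratívne_členenie_Slovenska\n"
    "https://ru.wikipedia.org/wiki/Федеральные_округа_Российской_Федерации\n"
)

for url in extract_urls(sample):
    print(url)
```

With this pattern both URLs come back exactly as written in the input, which is the behaviour I would expect from the detection as well.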

### How To Reproduce

* Save the two URLs above as `test.txt`.
* Open the interactive Python console and run:

  ```
  >>> from scancode import api
  >>> api.get_urls('test.txt')
  {'urls': [{'url': 'https://sk.wikipedia.org/wiki/Administrativne_clenenie_Slovenska', 'start_line': 1, 'end_line': 1}, {'url': 'https://ru.wikipedia.org/wiki', 'start_line': 2, 'end_line': 2}]}
  ```
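The mismatch can also be checked mechanically. The sketch below compares the input URLs with the output copied from the session above; `find_mismatches` is a hypothetical helper, and in practice `result` would be the return value of `api.get_urls('test.txt')` rather than a literal.

```python
# Input URLs, one per line of test.txt.
expected = [
    "https://sk.wikipedia.org/wiki/Administratívne_členenie_Slovenska",
    "https://ru.wikipedia.org/wiki/Федеральные_округа_Российской_Федерации",
]

# Output copied from the console session in this report.
result = {
    "urls": [
        {
            "url": "https://sk.wikipedia.org/wiki/Administrativne_clenenie_Slovenska",
            "start_line": 1,
            "end_line": 1,
        },
        {"url": "https://ru.wikipedia.org/wiki", "start_line": 2, "end_line": 2},
    ]
}


def find_mismatches(expected, result):
    """Return (line, expected_url, detected_url) for every line whose detection differs."""
    detected_by_line = {entry["start_line"]: entry["url"] for entry in result["urls"]}
    mismatches = []
    for line_no, url in enumerate(expected, start=1):
        detected = detected_by_line.get(line_no)  # None if nothing was detected
        if detected != url:
            mismatches.append((line_no, url, detected))
    return mismatches


for line_no, url, detected in find_mismatches(expected, result):
    print(f"line {line_no}: expected {url!r}, got {detected!r}")
```

Both lines are reported as mismatches with the output shown above.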

### System configuration

> For bug reports, it really helps us to know:

* What OS are you running on? (Windows/MacOS/Linux) - Linux
* What version of scancode-toolkit was used to generate the scan file? - 32.3.3
* What installation method was used to install/run scancode? (pip/source download/other) - pip
