Description
I came across some files that contain links in the following format:
https://sk.wikipedia.org/wiki/Administratívne_členenie_Slovenska
https://ru.wikipedia.org/wiki/Федеральные_округа_Российской_Федерации
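Running this file through URL detection yields https://sk.wikipedia.org/wiki/Administrativne_clenenie_Slovenska and https://ru.wikipedia.org/wiki, neither of which matches the input. In the first, the diacritics have been stripped (í → i, č → c). The second is cut off before the Cyrillic path segment.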
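To make the expected behavior concrete, here is a minimal sketch. It is not scancode's implementation. It shows that a Unicode-aware pattern keeps both links intact, while the output reported above differs from the input on both lines. The names `find_urls_unicode` and `mismatches` are hypothetical. The pattern is deliberately naive: it would also swallow trailing punctuation, for example. It only illustrates that non-ASCII path characters should survive detection.

```python
from __future__ import annotations

import re

# Hypothetical, simplified matcher: a scheme followed by any run of
# non-whitespace characters. For str patterns, \S is Unicode-aware in
# Python 3, so accented Latin and Cyrillic characters are kept unchanged.
URL_PATTERN = re.compile(r"https?://\S+")


def find_urls_unicode(text: str) -> list[str]:
    """Return every URL-like token in text, preserving non-ASCII characters."""
    return URL_PATTERN.findall(text)


def mismatches(expected: list[str], detected: list[str]) -> list[tuple[str, str]]:
    """Pair expected and detected URLs in order and return the pairs that differ."""
    return [(e, d) for e, d in zip(expected, detected) if e != d]


# Contents of test.txt from this report, one URL per line.
sample = (
    "https://sk.wikipedia.org/wiki/Administratívne_členenie_Slovenska\n"
    "https://ru.wikipedia.org/wiki/Федеральные_округа_Российской_Федерации\n"
)
expected = sample.split()

# Output reported for scancode-toolkit 32.3.3 in this issue.
reported = [
    "https://sk.wikipedia.org/wiki/Administrativne_clenenie_Slovenska",
    "https://ru.wikipedia.org/wiki",
]

print(mismatches(expected, find_urls_unicode(sample)))  # []
print(len(mismatches(expected, reported)))  # 2
```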
How To Reproduce
- Save the two links above to a file named test.txt.
- Open an interactive Python console and run:

    >>> from scancode import api
    >>> api.get_urls('test.txt')
    {'urls': [{'url': 'https://sk.wikipedia.org/wiki/Administrativne_clenenie_Slovenska', 'start_line': 1, 'end_line': 1}, {'url': 'https://ru.wikipedia.org/wiki', 'start_line': 2, 'end_line': 2}]}
System configuration
For bug reports, it really helps us to know:
- What OS are you running on? (Windows/macOS/Linux): Linux
- What version of scancode-toolkit was used to generate the scan file? 32.3.3
- What installation method was used to install/run scancode? (pip/source download/other): pip