possible url normalization issues #72

@philbudne

A few things I found while looking for pathological cases:

http://do.ma.in:80/what/ever is normalized to http://do.ma.in/what/ever
and so is https://do.ma.in/what/ever
but https://do.ma.in:442/what/ever comes out as http://do.ma.in:443/what/ever
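
For comparison, here is a minimal sketch of port handling that drops a port only when it is the default for the URL's own scheme. It is not the library's code: `strip_default_port` and `DEFAULT_PORTS` are hypothetical names, it uses only `urllib.parse`, and it deliberately leaves the scheme untouched.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical table of default ports, keyed by scheme.
DEFAULT_PORTS = {"http": 80, "https": 443}


def strip_default_port(url: str) -> str:
    """Drop the port only if it is the default for this URL's scheme."""
    parts = urlsplit(url)
    # .hostname lowercases the host and drops any userinfo; that is
    # acceptable for this sketch but is an assumption, not a requirement.
    host = parts.hostname or ""
    port = parts.port
    if port is not None and port != DEFAULT_PORTS.get(parts.scheme):
        netloc = f"{host}:{port}"  # keep a non-default port as written
    else:
        netloc = host
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )
```

With this approach a non-default port such as 442 survives unchanged, and the scheme is never rewritten as a side effect of port handling.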

http://10.2.3.4/hello/world.html comes out as http://2.3.4/hello/world.html
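
One possible guard, sketched under the assumption that the leading `10.` is being removed by some host-label stripping step (I have not confirmed the cause): check whether the host is an IP literal before touching its labels. `host_is_ip` is a hypothetical helper built on the standard `ipaddress` module.

```python
import ipaddress
from urllib.parse import urlsplit


def host_is_ip(url: str) -> bool:
    """Return True if the URL's host is an IPv4 or IPv6 literal."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        # Not parseable as an address, so treat it as a DNS name.
        return False
    return True
```

A normalizer could skip any prefix or subdomain stripping whenever `host_is_ip(url)` is true.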

In query strings, spaces and %20 are normalized to +,
but in the path %20 and + are left as is,
and a literal space is changed to %20.
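
The path/query difference matches how Python's standard library treats the two contexts, shown here as a small illustration using only `urllib.parse` (this says nothing about which calls the library actually makes):

```python
from urllib.parse import parse_qsl, quote, quote_plus, unquote, unquote_plus, urlencode

# Path-style quoting: space becomes %20, and "+" is an ordinary character.
path_quoted = quote("a b")           # 'a%20b'
path_plus_kept = unquote("a+b")      # 'a+b' ("+" is NOT decoded to a space)

# Query-style (form) quoting: space becomes "+", and "+" decodes to a space.
query_quoted = quote_plus("a b")     # 'a+b'
query_decoded = unquote_plus("a+b")  # 'a b'

# Round-tripping a query string collapses both spellings of a space to "+".
pairs = parse_qsl("q=a+b&r=c%20d")   # [('q', 'a b'), ('r', 'c d')]
requoted = urlencode(pairs)          # 'q=a+b&r=c+d'
```

So a + in a path is a literal plus sign, while in a form-encoded query it means a space, which is why treating the two parts differently is defensible.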

UTF-8 in the path is %-quoted, but %27 is turned into '
(a literal ' is left alone, so the result is uniform, but ' is officially a delimiter per https://datatracker.ietf.org/doc/html/rfc3986#section-2.2)

The above two were seen in the wild in:
http://www.seychellesnewsagency.com/articles/19841/Over++Seychelles%27+households+received+financial+assistance+following+Dec.++disasters
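
The path behavior described above (space to %20, + and ' left alone, %27 decoded to ', UTF-8 %-quoted) can be reproduced with an unquote-then-quote round trip. This is only a sketch for experimenting with test cases, not a claim about how the library is implemented; `requote_path` is a hypothetical name.

```python
from urllib.parse import quote, unquote


def requote_path(path: str) -> str:
    """Decode every %-escape, then re-quote with "/", "+" and "'" kept literal."""
    # Caveat: unquote() also decodes %2F to "/", which changes the path
    # structure; a real normalizer would need to handle that case.
    return quote(unquote(path), safe="/+'")
```

For the path of the URL above, `requote_path("/articles/19841/Over++Seychelles%27+households")` returns `/articles/19841/Over++Seychelles'+households`.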

Labels: bug, question
