-
Notifications
You must be signed in to change notification settings - Fork 2.5k
Prevent re-encoding of path when following a redirect #2184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
@JoeMylinn thanks for the pull request. Could you please add enough unit tests for this in |
@yhirose I've added a test for this, let me know if it's enough. |
@JoeMylinn could you please fix the style formatting error? |
@yhirose done |
'test/style-check' still gives errors... |
my bad, shouldn't give any errors now |
this->set_path_encode( | ||
false); // Disable path encoding, redirects should already be encoded |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this set path_encode_
for all subsequent requests, overriding what the user configured?
Client cli(HOST, PORT);
cli.set_follow_location(true);
cli.set_path_encode(true); // encode paths
auto res = cli.Get("/a b"); // encoded as "/a%20b", redirects
// path_encode_ is now false
auto res = cli.Get("/c d"); // not encoded anymore, sent as "/c d"
Just a thought, I haven't tested this code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I believe you are right. That shouldn't be the case of course. Maybe we can just always create a new redirect client, even if it's the same host and never reuse the original one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or you go back to your original suggestion (we talked about on discord) and decode the host+query so that it can be re-encoded in the client as normal. @JoeMylinn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah maybe the best option is to always decode both the path and query, and re-encode it before passing the new path to the original or redirect client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't like the decode and re-encode thing. Especially since we'd always have to encode if we decode it first, or it could break URLs with encoded characters if path_encode_
is set to false
.
We could just parse the query parameters and add them to the request, something like this:
auto path = next_path;
if (!next_query.empty())
{
detail::parse_query_text(next_query, req.params);
}
@yhirose any thoughts on what solution you'd prefer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JoeMylinn I still don't fully understand the whole situation, yet. Could you please try with v0.22.0 to see if the same issue happens?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yhirose Yes, v0.22.0 has the same issue.
I'll try to explain the issue a bit better. We need several things for it to happen:
follow_location_
set totrue
path_encode_
set totrue
- Get redirected, with the query part of the Location containing a
+
.
For example, let's say the redirect location is https://example.com/test?say=Hello+World
. The value of the parameter say
should be Hello World
, since '+'
encodes space ' '
in a query. However, the current implementation just appends the query to the path, and then (if path_encode_
is set to true
) encodes the whole path, including the query. This encodes it as Hello%2BWorld
, instead of Hello+World
or Hello%20World
, changing the value of the of the parameter to Hello+World
.
I've noticed this issue because it causes downloads from GitHub to fail, same as in #2185.
I originally fixed this by just setting path_encode_
to false
when following a redirect.
However this is flawed, since it might override the user configured settings for the client, when the location is on the same host.
I think there are mainly two ways to fix this:
- Always use a new redirect client, even if the location is on the same host, with
path_encode_
set tofalse
. - Parse the query properly, and pass the parameters to the new request.
The current implementation re-encodes the path and query when following a redirect, unless
path_encode_
is set tofalse
.However, the query is never decoded. This causes issues if the query contains a space
' '
(encoded as'+'
). Since it's never decoded, but later re-encoded, the space' '
(encoded as'+'
) gets changed into a'+'
(encoded as'%2B'
). Thus, the value of the parameter gets changed.This fixes it by disabling path encoding on the redirect client.