Skip to content

Prevent re-encoding of path when following a redirect #2184

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

JoeMylinn
Copy link

The current implementation re-encodes the path and query when following a redirect, unless path_encode_ is set to false.
However, the query is never decoded. This causes issues if the query contains a space ' ' (encoded as '+'). Since it's never decoded, but later re-encoded, the space ' ' (encoded as '+') gets changed into a '+' (encoded as '%2B'). Thus, the value of the parameter gets changed.

This fixes it by disabling path encoding on the redirect client.

@yhirose
Copy link
Owner

yhirose commented Jul 19, 2025

@JoeMylinn thanks for the pull request. Could you please add enough unit tests for this in test/test.cc? Then, I'll take a look at it. Thanks!

@JoeMylinn
Copy link
Author

@yhirose I've added a test for this, let me know if it's enough.

@yhirose
Copy link
Owner

yhirose commented Jul 20, 2025

@ungive, do you have any thought of this pull request? Thanks!
#2082

@yhirose
Copy link
Owner

yhirose commented Jul 20, 2025

@JoeMylinn could you please fix the style formatting error?

@JoeMylinn
Copy link
Author

@yhirose done

@yhirose
Copy link
Owner

yhirose commented Jul 20, 2025

'test/style-check' still gives errors...

@JoeMylinn
Copy link
Author

my bad, shouldn't give any errors now

Comment on lines +8668 to +8669
this->set_path_encode(
false); // Disable path encoding, redirects should already be encoded
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this set path_encode_ for all subsequent requests, overriding what the user configured?

Client cli(HOST, PORT);
cli.set_follow_location(true);
cli.set_path_encode(true); // encode paths
auto res = cli.Get("/a b"); // encoded as "/a%20b", redirects
// path_encode_ is now false
auto res = cli.Get("/c d"); // not encoded anymore, sent as "/c d"

Just a thought, I haven't tested this code.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I believe you are right. That shouldn't be the case of course. Maybe we can just always create a new redirect client, even if it's the same host and never reuse the original one?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or you go back to your original suggestion (we talked about on discord) and decode the host+query so that it can be re-encoded in the client as normal. @JoeMylinn

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah maybe the best option is to always decode both the path and query, and re-encode it before passing the new path to the original or redirect client.

Copy link
Author

@JoeMylinn JoeMylinn Jul 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't like the decode and re-encode thing. Especially since we'd always have to encode if we decode it first, or it could break URLs with encoded characters if path_encode_ is set to false.

We could just parse the query parameters and add them to the request, something like this:

 auto path = next_path;

 if (!next_query.empty())
 {
   detail::parse_query_text(next_query, req.params);
 }

@yhirose any thoughts on what solution you'd prefer?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JoeMylinn I still don't fully understand the whole situation, yet. Could you please try with v0.22.0 to see if the same issue happens?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yhirose Yes, v0.22.0 has the same issue.

I'll try to explain the issue a bit better. We need several things for it to happen:

  1. follow_location_ set to true
  2. path_encode_ set to true
  3. Get redirected, with the query part of the Location containing a +.

For example, let's say the redirect location is https://example.com/test?say=Hello+World. The value of the parameter say should be Hello World, since '+' encodes space ' ' in a query. However, the current implementation just appends the query to the path, and then (if path_encode_ is set to true) encodes the whole path, including the query. This encodes it as Hello%2BWorld, instead of Hello+World or Hello%20World, changing the value of the of the parameter to Hello+World.

I've noticed this issue because it causes downloads from GitHub to fail, same as in #2185.

I originally fixed this by just setting path_encode_ to false when following a redirect.
However this is flawed, since it might override the user configured settings for the client, when the location is on the same host.

I think there are mainly two ways to fix this:

  1. Always use a new redirect client, even if the location is on the same host, with path_encode_ set to false.
  2. Parse the query properly, and pass the parameters to the new request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants