-
Notifications
You must be signed in to change notification settings - Fork 178
Headers +toString(), fixes #1385
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…t; fix broken forEach; optimize; missing tests
PS: I think
can be optimized (removed). If you accept this PR and give me the go-ahead: I can do this optimizations with a benchmark. Anyway, not cached serializeToArray is used in |
FYI, the code was merged with a PR from earlier today, so update your branch. |
|
I'll look at any change, but this isn't broken. All that work was done to manually track the size as data was added and removed because the information is accessed repeatedly, and it prevented calculating it repeatedly. I suppose a benchmark could provide sufficient justification to accept a change. |
The size of this pr is fine, since it's one class and all related. And yes I know my PR's are large a lot, but that's the nature of working something that affects many things, like nullability. It's a fine line and I'm sure I cross it sometimes. |
The "not cached" |
1) getBytes 3-5x faster than getBytes(US_ASCII) 2) don't create and copy extra byte arrays
I have applied all your recommendations Plus: new serializeToArray() is 3.7 times faster than previous one |
@@ -522,7 +523,8 @@ private void checkKey(String key) { | |||
for (int idx = 0; idx < len; idx++) { | |||
char c = key.charAt(idx); | |||
if (c < 33 || c > 126 || c == ':') { | |||
throw new IllegalArgumentException(KEY_INVALID_CHARACTER + "'" + c + "'"); | |||
throw new IllegalArgumentException(KEY_INVALID_CHARACTER +"0x"+ Integer.toHexString(c) | |||
+" at "+ idx +" in "+ key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just this please.
private static final String KEY_INVALID_CHARACTER = "Header key has invalid character: 0x";
throw new IllegalArgumentException(KEY_INVALID_CHARACTER + Integer.toHexString(c));
if (c > 127 || c == 10 || c == 13) { | ||
throw new IllegalArgumentException(VALUE_INVALID_CHARACTERS + c); | ||
throw new IllegalArgumentException(VALUE_INVALID_CHARACTERS +"0x"+ Integer.toHexString(c) | ||
+" at "+ i +" in "+ val); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private static final String VALUE_INVALID_CHARACTERS = "Header value has invalid character: 0x";
throw new IllegalArgumentException(VALUE_INVALID_CHARACTERS + Integer.toHexString(c)));
b[i] = ';'; | ||
} | ||
} | ||
return new String(b, 0, HVCRLF_BYTES, b.length - HVCRLF_BYTES - 3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fully on board with squeezing every nano second out, but we can't use a deprecated method. So just simplify this. We are talking maybe nano seconds on an operation that probably isn't used all that much. Headers have been around for years and no one has asked for this, users probably write their own anyway.
@Override
public String toString() {
byte[] b = getSerialized();
for (int i = 0, len = b.length; i < len; i++) {
switch (b[i]) {
case CR: b[i] = ';'; break;
case LF: b[i] = ' '; break;
}
}
return new String(b, US_ASCII);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new String(b, US_ASCII)→new String(b, ISO_8859_1) ⇒ 3x times faster
Yes, in this case deprecated String ctor isn't faster than new String(b, ISO_8859_1) 👍
|
Special getBytes() allows serializing 3.7 times faster: Keys and values are US_ASCII: the input is checked for that, so both methods generate the same bytes as an output. The result is absolutely the same, but the speed is much faster. I have added a simple benchmark in tests: you can see yourself 🤝
On my computer For long keys/values new version should be even faster |
"Just this please. private static final String KEY_INVALID_CHARACTER = "Header key has invalid character: 0x"; throw new IllegalArgumentException(KEY_INVALID_CHARACTER + Integer.toHexString(c));"
✅
JDK25-26 has it: I have fixed and explained everything I had seen. PS: more about deprecations Javadoc:
it is almost "random", can fail and usually slower than US_ASCII/ISO_8859_1 |
I think, It has sense to make checkKey and checkValue package-private or (better) put them into Checker class: In old java versions like 8: synthetic accessor methods may be generated for private access, causing minor overhead. |
I'd actually be fine with a manual implementation that is similar to deprecated code but is tuned to recognize that we have actually already validated the key in checkKey, so we know that every char/byte c is 33 <= c <= 126 Value is also validated. So in both places where you use a deprecated method, you can hand code the fastest anyway to take advantage of that knowledge. |
Sounds good. |
Please don't 🙏 Manual implementation is 116 (one hundred sixteen) times slower than String.getBytes 😱😱😱 This method is deprecated, not because it is bad or will be removed any time soon (not in years). There are no alternatives to it. Microsoft and others use it. It is deprecated as a reminder that "it is for professionals, who understand exactly what they are doing and where to use it exactly". Please discuss it with your colleagues. Excessive formalism can be harmful 🙏 The closest competitor getBytes(ISO_8859_1) is 10 times slower. All numbers are in the benchmark ^^^ |
A copy of my thoughts about HashMap(capacity formula) to be sure you see it I see 4 possible solutions:
then
PS: if you open the sources of JDK:
|
|
The command is not understood. Please specify 🙏 |
1) static class Checker doesn't contain hidden reference to Headers ⇒ -1 field 👍 2) not private checkKey and checkValue can be accessed directly from Checker, without synthetic accessors 3) static should be a little faster
Done! 👍 |
|
For small src maps this is the best solution 👍
Thank you! 💖 Done! 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Headers.java
Then I saw and:
Fixed nullability issues.
Replaced inefficient keySet() + Map.get() pattern with entrySet().
Fixed broken forEach loop; added a failing test to confirm the issue before the fix.
Replaced val.chars() with a more efficient for-loop and charAt().
Removed "redundant"
private final boolean readOnly
field; replaced with another check.Finally, I have added missing tests.
I really don't have time for several one-line PRs 😭🙏
This PR is obvious and not bigger than your recent PRs.
I open to your questions, critic, and recommendations for improvement 🤝
PS: Some java quirks (why I have changed those lines)
java.util.Collections.UnmodifiableMap#forEach
java.util.Objects#hash
it is overkill for ONE value
For one value, it is recommended to use
java.util.Objects#hashCode
~ "Header value has invalid character: 42911"
c has type int ⇒ "+ c " creates decimal digit representation
but "+ (char) c" can be problematic too: invisible characters, special characters e.g.
0x200B// Zero Width space
0x200C// Zero Width non-joiner
0x200D// Zero Width joiner