Conversation

@CoreyKaylor (Owner)

Optimizations suggested by @sebastienros in #191
I've only compared against v0.20.0, not the previous comparer implementations. The current branch outperforms the v0.20.0 baseline by between 5% and 20%.

Separately, it feels like there might be a more optimal route for sorted Guid keys as well (which I assume is a common scenario). I'll do a little digging, but if anyone knows something I don't, feel free to share.

Memory

- Allocation-free comparers confirmed: a near-constant ~56 B overhead regardless of operation count (100 ops → 56 B; 10,000 ops → 56-59 B)

Read Performance - All Comparers (1000 ops, 64B values)

| Comparer | Time | vs Native |
|---|---|---|
| LengthOnly | 24.9 μs | -75% |
| ReverseSignedInt | 99.6 μs | -1% |
| Default (Native) | 100.8 μs | baseline |
| SignedInt | 103.0 μs | +2% |
| ReverseUnsignedInt | 105.8 μs | +5% |
| UnsignedInt | 107.3 μs | +6% |
| ReverseBitwise | 110.8 μs | +10% |
| Bitwise | 113.2 μs | +12% |
| Utf8String | 114.1 μs | +13% |
| ReverseLength | 118.4 μs | +17% |
| Length | 119.4 μs | +18% |
| ReverseUtf8String | 163.9 μs | +63% |
| HashCode | 172.1 μs | +71% |

Write Performance - All Comparers (1000 ops, 64B values)

| Comparer | Time | vs Native |
|---|---|---|
| LengthOnly | 82.3 μs | -65% |
| UnsignedInt | 200.4 μs | -15% |
| SignedInt | 202.5 μs | -14% |
| ReverseLength | 219.0 μs | -7% |
| ReverseSignedInt | 221.7 μs | -6% |
| ReverseUnsignedInt | 224.8 μs | -5% |
| ReverseBitwise | 228.6 μs | -3% |
| Default (Native) | 236.5 μs | baseline |
| Length | 237.6 μs | +0.5% |
| Utf8String | 237.8 μs | +0.5% |
| Bitwise | 294.5 μs | +25% |
| ReverseUtf8String | 301.4 μs | +27% |
| HashCode | 314.7 μs | +33% |

Integer Keys (10000 ops, 4-byte keys)

| Comparer | Time | vs Native |
|---|---|---|
| SignedInt | 307 μs | -81% |
| UnsignedInt | 312 μs | -81% |
| ReverseSignedInt | 1,012 μs | -38% |
| ReverseUnsignedInt | 1,007 μs | -38% |
| Default (Native) | 1,633 μs | baseline |

Notes

- LengthOnly is fastest but compares only by length (no content comparison)
- SignedInt/UnsignedInt provide major gains for integer keys
- HashCode is consistently the slowest across all scenarios
- Most custom comparers perform within ±15% of native for general byte data

The review comments below are anchored on this code:

```csharp
/// </remarks>
#if NET5_0_OR_GREATER
[SkipLocalsInit]
#endif
```
Contributor

Is MDBValue read-only?

Owner (Author)

Yes

Contributor

This could be something to leverage: marking it `readonly`, and adding the `in` keyword to some methods so that the value is passed by reference automatically.

Here is the ChatGPT explanation:

Short answer:
They reduce copies, prevent accidental mutation, and enable better compiler optimizations—especially for larger structs.

Details:

1) `readonly struct`

Marking a struct as `readonly` guarantees it's immutable after construction.

Advantages

- **No defensive copies:** the compiler knows instance methods won't mutate fields, so it doesn't create hidden copies when the struct is accessed through `in`, readonly fields, or properties.
- **Clear intent and safety:** prevents accidental field mutation and enforces immutability at compile time.
- **Better optimizations:** the JIT can make stronger assumptions, sometimes improving inlining and register usage.
- **Thread-safety by design:** immutable value types are naturally safer to share.

Cost

- You must ensure all instance fields are readonly and methods don't mutate state.

2) `in` parameters

`in` passes a struct by readonly reference instead of by value.

Advantages

- **Avoids copying large structs:** especially useful when structs exceed ~16 bytes or are passed frequently.
- **Expresses intent:** signals "this method will not modify the argument."
- **Interoperates with readonly structs:** no defensive copies when calling methods on a `readonly struct`.

Cost

- **Indirection:** very small structs can be slower due to pointer dereferencing.
- **Readonly rules:** attempting mutation causes compile errors, or hidden copies if the struct isn't readonly.

3) Using both together (best case)

This is where the real benefit shows up.

- `readonly struct` + `in` parameters ⇒ zero copies, no defensive cloning, maximum safety.
- Methods called on the struct don't trigger hidden temporaries.
- Ideal for math types, vectors, coordinates, timestamps, and domain value objects.

Practical guidance

- Use `readonly struct` when:
  - The type is logically immutable
  - It's used frequently or passed around a lot
- Use `in` when:
  - The struct is medium-to-large
  - The method is hot-path or allocation/copy sensitive
- Don't bother for tiny structs (e.g., two ints).

Bottom line:
You get immutability guarantees, fewer copies, and better performance—when used selectively and intentionally.
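The `readonly struct` + `in` combination described above can be sketched as follows. `Point3` and `Dot` are made-up illustration names, not part of this library:

```csharp
using System;

var p = new Point3(1, 2, 3);
var q = new Point3(4, 5, 6);

// Because Point3 is a readonly struct, calling members through an `in`
// reference causes no defensive copy: the compiler knows nothing can mutate.
Console.WriteLine(p.LengthSquared()); // 14
Console.WriteLine(Dot(in p, in q));   // 32

// `in` passes by readonly reference: no copy of the struct is made.
static double Dot(in Point3 a, in Point3 b) =>
    a.X * b.X + a.Y * b.Y + a.Z * b.Z;

// Hypothetical value type for illustration only.
readonly struct Point3
{
    public readonly double X, Y, Z;
    public Point3(double x, double y, double z) { X = x; Y = y; Z = z; }
    public double LengthSquared() => X * X + Y * Y + Z * Z;
}
```

For a 24-byte struct like this, `in` avoids three 8-byte field copies per call, which is exactly the "medium-to-large struct" case the guidance above targets.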

@CoreyKaylor (Owner, Author) commented Dec 16, 2025

I overlooked an aspect of the question.

  1. P/Invoke signatures must use `ref` - functions like `mdb_cursor_get` write back values (native code sets the size/data pointers), so `in` can't be used.
  2. The `CompareFunction` delegate must use `ref` - this is called *from* native code, and the marshalling requires `ref`, not `in`:
     `delegate int CompareFunction(ref MDBValue left, ref MDBValue right);`
  3. `IComparer<T>` passes by value - the standard interface signature is `Compare(T x, T y)`, not `Compare(in T x, in T y)`.
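Point 2 can be illustrated without touching native code. `FakeMDBValue` and `FakeCompareFunction` below are hypothetical stand-ins that mirror the ref-based delegate shape (the real `CompareFunction` is marshalled to LMDB's comparator callback):

```csharp
using System;

// The ref-based delegate shape native code calls back into. A lambda with
// explicit `ref` parameters satisfies it; `in` parameters would not match.
FakeCompareFunction bySize = (ref FakeMDBValue l, ref FakeMDBValue r) =>
    l.Size < r.Size ? -1 : l.Size > r.Size ? 1 : 0;

var a = new FakeMDBValue { Size = 4 };
var b = new FakeMDBValue { Size = 8 };
Console.WriteLine(bySize(ref a, ref b)); // -1

delegate int FakeCompareFunction(ref FakeMDBValue left, ref FakeMDBValue right);

// Hypothetical stand-in with the same size + data-pointer layout idea.
struct FakeMDBValue
{
    public nuint Size;
    public IntPtr Data;
}
```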

Owner (Author)

That said, there are probably several interop methods that could benefit from changing to `in` instead of `ref`.

Contributor

`ref` is fine; `in` just behaves like a `ref` without the caller needing to use the `ref` keyword.
It's fine if not all methods use `in` or `ref`, like `Compare`. It's not a requirement; it's just that `readonly` allows the usage of `in`. We can still use `ref` when it's required.

> mdb_cursor_get write back values...

I haven't checked the code, but maybe a custom mutable and reusable struct (or class) could be passed to these, and then the library creates immutable ones.

Just brainstorming in case we can find patterns.
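One possible shape for that brainstorm, with entirely hypothetical names: a mutable staging struct that the native call could fill in, converted to an immutable view afterwards.

```csharp
using System;

// Sketch only: NativeValueBuffer / ValueView are not LmdbSharp's actual API.
var staging = new NativeValueBuffer();
staging.Size = 8;                  // pretend native code wrote these fields
staging.Data = (IntPtr)0x1234;

var view = staging.ToView();       // library hands out only the immutable view
Console.WriteLine(view.Size);      // 8

// Mutable on purpose: safe to pass by ref so native code can write into it.
struct NativeValueBuffer
{
    public nuint Size;
    public IntPtr Data;
    public readonly ValueView ToView() => new ValueView(Size, Data);
}

// Immutable readonly struct: what callers actually receive.
readonly struct ValueView
{
    public readonly nuint Size;
    public readonly IntPtr Data;
    public ValueView(nuint size, IntPtr data) { Size = size; Data = data; }
}
```

The idea is that mutation is confined to initialization, so everything downstream can still use `readonly` + `in` freely.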

Contributor

Even with readonly structs we can pass them to be updated, as long as we know when that happens, ideally only during initialization:

```csharp
// Requires System.Runtime.CompilerServices (for Unsafe).
private (MDBResultCode resultCode, MDBValue key, MDBValue value) Get(CursorOperation operation)
{
    MDBValue mdbKey = default;
    MDBValue mdbValue = default;

    unsafe
    {
        // Unsafe.AsRef(in ...) reinterprets the readonly reference as a
        // writable one, letting native code fill in the struct fields.
        var result = mdb_cursor_get(
            _handle,
            ref Unsafe.AsRef<MDBValue>(in mdbKey),
            ref Unsafe.AsRef<MDBValue>(in mdbValue),
            operation);
        return (result, mdbKey, mdbValue);
    }
}
```

Contributor

And this PR is fine; if you want to make changes, maybe isolate them in separate PRs. No need to dead-lock PRs.

Owner (Author)

I was less hung up on the PR comments, though I don't mind queueing up your feedback before pulling it in. I was more delayed by a company party and corresponding activities last week.

@CoreyKaylor (Owner, Author)

Benchmarks compared to v0.20.0 are quite a bit slower. Trying to narrow in on specifically what is causing the slowdown.

ReadBenchmarks

| OpsPerTx | ValueSize | v0.20.0 | optimizations | Change |
|---|---|---|---|---|
| 1 | 8 | 78.18 ns | 169.0 ns | +116% slower |
| 1 | 64 | 76.81 ns | 151.2 ns | +97% slower |
| 1 | 256 | 76.48 ns | 146.1 ns | +91% slower |
| 100 | 8 | 5,952 ns | 11,058 ns | +86% slower |
| 100 | 64 | 5,865 ns | 9,270 ns | +58% slower |
| 100 | 256 | 6,423 ns | 12,128 ns | +89% slower |
| 1000 | 8 | 86,140 ns | 134,786 ns | +56% slower |
| 1000 | 64 | 96,559 ns | 157,259 ns | +63% slower |
| 1000 | 256 | 99,618 ns | 179,171 ns | +80% slower |

WriteBenchmarks

| OpsPerTx | ValueSize | v0.20.0 | optimizations | Change |
|---|---|---|---|---|
| 1 | 8 | 43.75 μs | 64.21 μs | +47% slower |
| 1 | 64 | 44.45 μs | 61.27 μs | +38% slower |
| 1 | 256 | 44.07 μs | 52.43 μs | +19% slower |
| 100 | 8 | 53.98 μs | 68.49 μs | +27% slower |
| 100 | 64 | 51.70 μs | 65.30 μs | +26% slower |
| 100 | 256 | 64.77 μs | 82.80 μs | +28% slower |
| 1000 | 8 | 161.62 μs | 198.13 μs | +23% slower |
| 1000 | 64 | 198.75 μs | 339.59 μs | +71% slower |
| 1000 | 256 | 272.54 μs | 405.36 μs | +49% slower |

@CoreyKaylor (Owner, Author)

Summary of Tests

| Test | Read (1 op) | Write (1 op) | Status |
|---|---|---|---|
| v0.20.0 baseline | 78 ns | 44 μs | ✓ Fast |
| MDBValue only (readonly struct + helpers, with ref) | 77 ns | 45 μs | ✓ Fast |
| ref→in only (struct MDBValue, with in) | 102 ns | 54 μs | ⚠️ Partial regression |
| Full optimizations (readonly + in) | 169 ns | 64 μs | ✗ Full regression |

Conclusion

  1. MDBValue changes are fine - `readonly struct`, `[SkipLocalsInit]`, and the new helper methods cause no performance impact when used with `ref` P/Invoke
  2. `ref`→`in` is the culprit - this P/Invoke change alone causes a ~30% regression
  3. The combination is worse - `ref`→`in` with `readonly struct` together cause a ~100% regression (double the impact)

@CoreyKaylor merged commit 5805814 into main Dec 23, 2025 (3 checks passed)
@CoreyKaylor deleted the optimizations branch December 23, 2025 14:39
@AlgorithmsAreCool (Collaborator) commented Dec 23, 2025

> **Summary of Tests**
>
> | Test | Read (1 op) | Write (1 op) | Status |
> |---|---|---|---|
> | v0.20.0 baseline | 78 ns | 44 μs | ✓ Fast |
> | MDBValue only (readonly struct + helpers, with ref) | 77 ns | 45 μs | ✓ Fast |
> | ref→in only (struct MDBValue, with in) | 102 ns | 54 μs | ⚠️ Partial regression |
> | Full optimizations (readonly + in) | 169 ns | 64 μs | ✗ Full regression |
>
> **Conclusion**
>
> 1. MDBValue changes are fine - `readonly struct`, `[SkipLocalsInit]`, and the new helper methods cause no performance impact when used with `ref` P/Invoke
> 2. `ref`→`in` is the culprit - this P/Invoke change alone causes a ~30% regression
> 3. The combination is worse - `ref`→`in` with `readonly struct` together cause a ~100% regression (double the impact)

Interesting, there must be some hidden defensive copying going on in there. I wonder if it is in the LibraryImport-generated source.

Curiously, the IL and ASM outputs are the same for each of the three argument styles, so I'm not sure where the regression is coming from:

https://sharplab.io/#v2:C4LghgzgtgPgAgJgIwFgBQcAMACOSB0ASgK4B2wAllAKb4DCA9lAA4UA21ATgMpcBuFAMbUIAbnRZcBEuSq0AkuS4NmvTgOFiJAZlwJsdbAG902M7l1kIYAGbVcAFmyFqNgIKcA5gApOr7ACyACIAQgBqYGzE9jYMDACUxqbmKTYUAB7UACbeweGR0QBU2MzAnNgAvNgAZLEJySlmJmiNrYEAnnAArN6lnPHiLW0Avg3Yo0NmY3CWpNZ2jtiKHj4UpIGhEVExcYnNbdhpmTl5W0UlZZU1dfFjKfsH5gGdPX0Dd+YTKV/m07Pz9jgThcNhcYCyAHlSGx2itfP4/OCGND2ht8ttDrskpNGkdsrlNgVqMU+ldarsPk1KY1nt1emV3jjvmMflMcQBtIJsNjyFgMTjAbwAIhsYAA1tQ2BQAEZC+IAXTGzE4FD4YGAgKQADZsFZbPZqOkNZx1kCOnTTkTijdBhN0OzuGKKMwADIMQSRCCKCjARUYXSIrLImG6ub67AQMrEQTANFnajofZrY2kSLYQPB1GkZMRigAL2og2wya4qbY6eoSJR2Gl7Q1xSy6rAtqAA
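For reference, the best-known defensive-copy trap is easy to reproduce in isolation: calling a non-readonly member through an `in` parameter operates on a hidden copy. `Counter` here is a made-up type, and this may or may not be the actual mechanism behind the P/Invoke regression:

```csharp
using System;

var c = new Counter();

BumpByIn(in c);
Console.WriteLine(c.Value);  // 0 - Increment() ran on a defensive copy

BumpByRef(ref c);
Console.WriteLine(c.Value);  // 1 - Increment() ran on the original

// `in` parameter + non-readonly member call = hidden defensive copy,
// because the compiler must preserve the readonly guarantee.
static void BumpByIn(in Counter counter) => counter.Increment();
static void BumpByRef(ref Counter counter) => counter.Increment();

struct Counter
{
    public int Value;
    public void Increment() => Value++;
}
```

Marking `Counter` as a `readonly struct` (with a non-mutating method) would eliminate the copy, which matches the earlier guidance that `in` only pays off together with `readonly struct`.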

@CoreyKaylor (Owner, Author)

I'll make another pass at it, removing LibraryImport in favor of DllImport, and see if the performance is the same. While I've had LibraryImport in place since it was introduced, I'm not sure it's bought much in terms of the implementation. If removing it speeds things up with `in` included, it's probably worth dropping.

@AlgorithmsAreCool (Collaborator)

The LibraryImport itself isn't doing much besides pinning the args and passing them on to the inner DllImport. And I'm pretty sure that pinning a value type is a no-op, so I don't think there's any harm in it.

