Skip to content

Conversation

uniphil
Copy link

@uniphil uniphil commented Apr 11, 2025

Cardinality estimator makes assumptions about the values it receives when deserializing which seem to be upheld by serde_json but not serde in general, leading to buffer overflows and other problems when used with other serde de/serializers.

I had the wildest crashes happening in a project, which I traced to cardinality-estimator. In narrowing down a smaller reproducible example, I found that using bincode with this library was sufficient to get buffer overflows: see https://github.com/uniphil/whose-overflow-is-it-anyway

This PR adds a fuzz target for postcard, whose API is close enough to serde_json for the change from the serde fuzz target to be a single line. It also adds postcard calls to the normal serde tests, though I wasn't able to specifically write a failing regression test case for it (I'm sure it's possible, just had to stop somewhere).

To be clear: this change only illustrates the issue and does not include a fix. Seems like fixing will require either

  • fixing the deserialization logic to assume less about the inputs it receives (array alignment and length?), OR
  • documenting explicitly that the serde support is only safe to work with serde_json.

Thanks @DavidBuchanan314 for help narrowing the source of the crashes <3

Cardinality estimator makes assumptions about the values it receives when deserializing which are upheld by _`serde_json`_ but not `serde` in general, leading to buffer overflows and other problems when used with other serde de/serializers.

I had the wildest crashes happening in a project, which I traced to being at least _related_ to cardinality-estimator, since they stopped when I shimmed it out. In narrowing down a smaller reproducible example, I found that using bincode with this library was sufficient to get buffer overflows: see https://github.com/uniphil/whose-overflow-is-it-anyway

This PR adds a fuzz target for `postcard`, whose API is close enough to `serde_json` for the change from the serde fuzz target to be a single line. It also adds postcard calls to the normal serde tests, though I wasn't able to specifically write a failing regression test case for it (I'm sure it's possible, just had to stop somewhere).

To be clear: this change only illustrates the issue and does not include a fix. Seems like fixing will require either

- fixing the deserialization logic to assume less about the inputs it receives (array alignment and length?), OR
- documenting explicitly that the `serde` support is **only** safe to work with `serde_json`.

Thanks @DavidBuchanan314 for help narrowing the source of the crashes <3
@austinhartzheim
Copy link
Collaborator

Thanks for the write-up and fuzzing harness! I'll try to take a look in the next few days.

@uniphil
Copy link
Author

uniphil commented Apr 12, 2025

Thanks for the quick reply!

I added another fuzz harness for serde_json that directly creates arrays with serde_json::Value, and unfortunately this also quickly discovers the heap buffer overflow. So I think it might be a deserialization problem in general, and not one necessarily hidden by serde_json.

@uniphil uniphil changed the title Fuzz: buffer overflow for non-JSON serde deserializers Fuzz: serde deserializer heap buffer overflow Apr 12, 2025
@austinhartzheim
Copy link
Collaborator

Thank you. I did look at this last night and did notice that it is reproducible with serde_json.

I have a working patch, but want to spend a bit more time making sure it's property tested.

@uniphil
Copy link
Author

uniphil commented Apr 12, 2025

amazing! looking forward to it :)

@uniphil
Copy link
Author

uniphil commented Apr 22, 2025

Hey @austinhartzheim just a tiny nudge: any chance you can share the patch, if testing/getting it ready to merge will take more time?

I'll find another workaround if not, just checking

uniphil added 3 commits May 12, 2025 14:53
this harness seems to replicate my original memory safety crashes that i've now confirmed were not due to my own code feeding bad bytes to the cardinality-estimator deserializer: cardinality-estimator's *own* serialized bytes are enough.

this suggests that no privileged access to write malicious bytes into the serialized representations is required to cause a heap overflow from cardinality-estimator.
only need insert after roundtripping
works with u8s, no need for Arbitrary after all

also filled out the makefile for the new fuzz harnesses
@uniphil
Copy link
Author

uniphil commented May 12, 2025

fyi: i added another fuzz harness that shows the heap buffer overflow even without malicious bytes being fed to the deserializer: all fuzz inputs are only .inserted to the estimator, but serializing and then deserializing still reproduces the problem.

@uniphil uniphil changed the title Fuzz: serde deserializer heap buffer overflow Fuzz: serialization heap buffer overflow via round-trip and deserialize May 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants