Skip to content

Conversation

@wenming21
Copy link
Contributor

@wenming21 wenming21 commented Oct 2, 2025

  1. CI.yml fixes the problem of missing files corresponding to Python versions when building. whl files on various platforms.
  2. Add support for Python 3.14 and Pypy3.11.
  3. Optimize code using mimaloc memory allocator in windows \ macos \ musllinux to improve the running speed of Python bindings.
  4. The find_data and find_absolute_path methods have been added to Finder, which return the matching value and absolute path respectively.
  5. test_bindings.py and test.yml all tests passed.
  6. test.yml add support for windows.
  7. Comments are added in lib.rs.
    @night-crawler

1. Optimize the code and use the mimalloc memory allocator to improve the running speed of the Python binding.
2. The find_data and find_absolute_path methods have been added to Finder, which return the matching value and absolute path respectively.
3. test_bindings.py all tests passed, running on Windows 10 with python version 3.13.7.
4. Comments are added in lib.rs.
migrate to version 1.1.0
Copy link
Owner

@night-crawler night-crawler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in general, ty! :)

I've read the code only once and left some comments. One thing that really bothers me is cache.

src/lib.rs Outdated
static GLOBAL: MiMalloc = MiMalloc;

// Query cache for caching parsed JSONPath queries
static QUERY_CACHE: LazyLock<Mutex<HashMap<String, JpQuery>>> =
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A side question: is parsing that slow so caching improves perf? If yes, how much? Just curious.

Apart from that, there totally can be a user taking jsonpath expressions from user input, or anything else that will lead to high cardinality values stored in the cache, so it will not be memory-efficient for them.

Also, the max number of paths cached is not capped, so if someone has a scenario with millions of jsonpath expressions, all of them will eat memory.

So I'd say if it really improves perf, the cache should be capped, maybe there's already something in deps the original jsonpath rust lib that does it (something LRU like), haven't check myself yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caching improves performance, but not by much.


// Execute JSONPath query, return only found data values
fn find_data(self_: PyRef<'_, Self>, query: String) -> PyResult<Vec<Py<PyAny>>> {
find_internal_data(&self_.value, &query, |_| true)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of predicates here take the same closure: |_| true. Do we need it at all now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in the execute_query function, the filter method expects to receive a closure that implements the FnMut trait, and the parameter type of this closure is &QueryRef<'a, Value>.

}

// Execute query and return data value list
fn find_internal_data(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nit: find_internal_data is a misleading name, because it assumes existence of some internal_data, I think a better name should be find_data_internal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of this function was originally find_internal_data

# "$..book[?@.author ~= '(?i)REES']",
"$..*",
]
queries_results = [
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original lib should already be tested, and this test looks it's duplicating the original lib tests. It's better than my prints I forgot to convert to proper tests before, though I'd prefer to have it as a smoke-test.

Apart from that, if the lib changes behavior, some of these tests will fail, and it's not our job here to freeze contracts for the downstream lib that we don't control. Having said that, the test could run a single expression that it checks (something simple that is unlikely to fail), and the rest could be just asserted for non-empty result.

@night-crawler night-crawler merged commit a0eafc4 into night-crawler:main Oct 7, 2025
100 checks passed
@wenming21
Copy link
Contributor Author

Can you publish to pypi when you have the opportunity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants