-
Couldn't load subscription status.
- Fork 2
migrate to version 1.1.0 #7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
1. Optimize the code and use the mimalloc memory allocator to improve the running speed of the Python binding. 2. The find_data and find_absolute_path methods have been added to Finder, which return the matching value and absolute path respectively. 3. test_bindings.py all tests passed, running on Windows 10 with python version 3.13.7. 4. Comments are added in lib.rs.
migrate to version 1.1.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM in general, ty! :)
I've read the code only once and left some comments. One thing that really bothers me is cache.
src/lib.rs
Outdated
| static GLOBAL: MiMalloc = MiMalloc; | ||
|
|
||
| // Query cache for caching parsed JSONPath queries | ||
| static QUERY_CACHE: LazyLock<Mutex<HashMap<String, JpQuery>>> = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A side question: is parsing that slow so caching improves perf? If yes, how much? Just curious.
Apart from that, there totally can be a user taking jsonpath expressions from user input, or anything else that will lead to high cardinality values stored in the cache, so it will not be memory-efficient for them.
Also, the max number of paths cached is not capped, so if someone has a scenario with millions of jsonpath expressions, all of them will eat memory.
So I'd say if it really improves perf, the cache should be capped, maybe there's already something in deps the original jsonpath rust lib that does it (something LRU like), haven't check myself yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caching improves performance, but not by much.
|
|
||
| // Execute JSONPath query, return only found data values | ||
| fn find_data(self_: PyRef<'_, Self>, query: String) -> PyResult<Vec<Py<PyAny>>> { | ||
| find_internal_data(&self_.value, &query, |_| true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of predicates here take the same closure: |_| true. Do we need it at all now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, in the execute_query function, the filter method expects to receive a closure that implements the FnMut trait, and the parameter type of this closure is &QueryRef<'a, Value>.
| } | ||
|
|
||
| // Execute query and return data value list | ||
| fn find_internal_data( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A nit: find_internal_data is a misleading name, because it assumes existence of some internal_data, I think a better name should be find_data_internal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name of this function was originally find_internal_data
| # "$..book[?@.author ~= '(?i)REES']", | ||
| "$..*", | ||
| ] | ||
| queries_results = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original lib should already be tested, and this test looks it's duplicating the original lib tests. It's better than my prints I forgot to convert to proper tests before, though I'd prefer to have it as a smoke-test.
Apart from that, if the lib changes behavior, some of these tests will fail, and it's not our job here to freeze contracts for the downstream lib that we don't control. Having said that, the test could run a single expression that it checks (something simple that is unlikely to fail), and the rest could be just asserted for non-empty result.
Fix the issue where mimalloc cannot be used under Linux.
|
Can you publish to pypi when you have the opportunity |
@night-crawler