-
Notifications
You must be signed in to change notification settings - Fork 19
Rust graph (Peter) #175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rust graph (Peter) #175
Conversation
CodSpeed Performance ReportMerging #175 will degrade performances by 96.65%Comparing Summary
Benchmarks breakdown
|
7bf35d6 to
27c8c8f
Compare
947e083 to
4f71bc0
Compare
4f71bc0 to
3fa2db1
Compare
8a78c75 to
835dded
Compare
ba5d947 to
2f7ce49
Compare
b278283 to
ebe4fdb
Compare
8e62e75 to
fbeb020
Compare
2fe004a to
216dee2
Compare
pytest-benchmark will actually determine a suitable number of rounds+iterations automatically (typically much more than 3!). With only 3 rounds the results are quite noisy.
216dee2 to
04c32b3
Compare
seddonym
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving some initial comments but I've still got lots to read through and digest.
|
|
||
| pub type GrimpResult<T> = Result<T, GrimpError>; | ||
|
|
||
| impl From<GrimpError> for PyErr { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-blocking, but feels like we have a mixture of Python concerns and pure Rust concerns in this module. I think there's benefit in keeping them separate - e.g. to keep lib.rs for the py03 stuff and then everything else is pure Rust.
What do you think about moving this implementation either to exceptions.rs or to lib.rs? (Will that even compile?) We could leave the definition of GrimpError and GrimpResult here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No strong feeling about this one. Happy to change it.
| } | ||
|
|
||
| impl GraphWrapper { | ||
| fn get_visible_module_by_name(&self, name: &str) -> Result<&Module, GrimpError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why we have this on the GraphWrapper, rather than on the inner Graph?
I envisage GraphWrapper as purely being a translation layer from py03 concerns to pure Rust.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it was the "visible" part that had me putting this here. Currently the methods within graph/ return modules whether they are visible or not. The filtering to visible/invisible is kept as a concern for lib.rs.
That said, we have the visible() method on ModuleIterator within graph/, so maybe the division isn't as clear as I'd imagined.
Tldr; I'm happy for it to be moved, but no strong feeling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally, I think we should keep GraphWrapper focused on py03 concerns and try to move everything else into the Graph object.
| use string_interner::backend::StringBackend; | ||
| use string_interner::{DefaultSymbol, StringInterner}; | ||
|
|
||
| pub mod direct_import_queries; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason for making these pub mod but the pathfinding module pub(crate) mod? Would it be better to be consistent here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good spot. I think I was imaging that if the rust crate was published on its own then all the methods of Graph would be public (pub), but the pathfinding module would probably be internal to the crate (pub(crate)). That said, I don't think we ever envisage the rust code to be published independently, so probably it could all just be pub for consistency.
|
|
||
| pub(crate) mod pathfinding; | ||
|
|
||
| lazy_static! { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO - let's talk through this so I understand the reasoning behind the approach.
| token: ModuleToken, | ||
|
|
||
| #[getset(get_copy = "pub")] | ||
| name: DefaultSymbol, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This, and the token field, are both effectively tokens right? I wonder if it is possible to combine them.
| impl<'a, T: Iterator<Item = &'a ModuleToken>> ModuleTokenIterator<'a> for T {} | ||
|
|
||
| #[derive(Debug, Clone, PartialEq, Eq, Hash, new, Getters, CopyGetters)] | ||
| pub struct ImportDetails { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is maybe using slightly different terminology to https://grimp.readthedocs.io/en/stable/usage.html#ImportGraph.get_import_details. I'm not massively happy with the name I chose for that method originally. But I think it might be worth coming up with a non-plural name for the struct here as it's more ergonomic. ImportMetadata?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes true your suggestion is better 👍 Let's change it.
| } | ||
|
|
||
| pub fn get_modules(&self) -> FxHashSet<String> { | ||
| pub fn get_modules(&self) -> HashSet<String> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason for this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting I forgot about this one. I thought it was necessary since FxHashSet doesn't implement IntoPy, but presumably your code was working before so I'm now unsure 🤔
| .get_modules() | ||
| .iter() | ||
| .map(|module| module.name.clone()) | ||
| .all_modules() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why we're renaming this method? Maybe easier to have the same name as on the wrapper?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to change this - I think it's a copy over from pyimports.
| ._graph | ||
| .module_name_to_self_and_ancestors(module) | ||
| .into_iter() | ||
| .skip(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain this line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The skip(1) removes the self, leaving just the ancestors. I'll comment that in the code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good!
This PR contains a rewrite of the import graph in rust. A lot of the code in this PR is ported from https://github.com/Peter554/pyimports.
Performance tricks in this PR:
find_downstream_modules,chain_exists.find_shortest_chainsandfind_illegal_dependencies_for_layers.if module in self.modulesin the python code, sometimes in a for loop. This is actually rather inefficient, since the entire graph has to be copied over from rust to a python set (every time). I addedgraph.contains_moduleas a more performant alternative. We should ideally find a way to avoid others making this same mistake. At the very least, document this. Perhaps also make it a method rather than a property e.g.graph.modules(), to show more clearly that work is being done (breaking change, but perhaps reasonable).find_shortest_chainsandfind_illegal_dependencies_for_layersin the case that the contract is kept (via an initialchain_existscheck). This should give significant benefits for users running import linter in CI, since most of the time most contracts should be kept - it makes sense to optimise for this case.