-
Notifications
You must be signed in to change notification settings - Fork 617
corrects capacity calculations #414
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I'm a maintainer at kornia-rs, which has its own native implementation of apriltag, and I made a few observations The current structure for the table slots is - which, ends up being 16 bytes long due to alignment, thus we lose 4 bytes to padding. To remedy this, we could decouple the rcode and the rest of the metadata to achieve something like this struct QuickDecodeEntry { struct QuickDecode { storing exactly the amount of data we need in just 11 bytes per entry, resulting in a 31% improvement in memory usage. Another benefit is more optimal cache utilisation, as during matching, we no longer need to load in the metadata, which polluted the cache line i did a simple implementation of this in Rust, and these were the results I got - Creation – Original: ~14.7 ms Lookup – Original: ~872 µs Would you be interested in these optimizations? |
|
Additionally, another minor point: Is the rotation field in the quickdecode table effectively useless? It's never set anywhere and is immediately overwritten in the quick_decode_codeword function. pretty sure we can safely remove it; it would not save space in the current implementation due to padding, but still a good thing to clean up, in a more optimized implementation, we could only store the "cannonical" form of the code, ie the one which is the minimal in value amongst all 4 rotations, and also store how many rotations it took to achieve that, and while lookups, instead of 4 lookups one for each rotation, we could just lookup the minimum rotation, and calculate how much offset it took, thereby reducing the number of random accesses from 4 to 1, which may be very good for performance if the tables are larger and don't fit in CPU caches this with the one above beings the space needed from 6.3 gbs to 2.2 Gbs, while having a 3 times faster quick decode function |
Yes, sure. If you improve the runtime performance without affecting the detection performance, why not. |
The |
|
I meant the space in the table to store the rotation is not used efficiently, since the value is computed inside the function, it's possible to remove it entirely from the table and have the rotation value returned some other way |
I see. You mean because |
|
Fixes #413. |
|
I can confirm that this roughly halves the memory requirements for the |
This fixes the error in the capacity calculation math, where more than 6 times more space was being allocated than needed, because the capacity math used permutations instead of combination