Description
The current implementation is faster when the L2 cache cannot hold the whole image (with 16 MB of cache, that is roughly the size of an FHD image). For lower resolutions it may be better to use the original tile-based implementation, AssignThreadingStrategy::FineGrained or AssignThreadingStrategy::CoreDistributed.
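A minimal sketch of how that choice could be automated, assuming the decision depends only on whether the per-pixel working set fits in L2. The function `choose_strategy`, its parameters, and the enum declaration are hypothetical and only mirror the strategy names used in this issue; they are not the project's actual API.

```cpp
#include <cstddef>

// Hypothetical mirror of the strategy names mentioned in this issue.
enum class AssignThreadingStrategy {
    FineGrained,
    CoreDistributed,
    RowBased,
    RowBasedFusedUpdate
};

// Assumption: the working set is width * height * bytes_per_pixel, where
// bytes_per_pixel covers everything touched per pixel during assignment
// (colour channels, label, minimum distance, ...).
inline AssignThreadingStrategy choose_strategy(std::size_t width,
                                               std::size_t height,
                                               std::size_t bytes_per_pixel,
                                               std::size_t l2_cache_bytes) {
    const std::size_t working_set = width * height * bytes_per_pixel;
    // Row-based assignment is expected to pay off only once the image
    // no longer fits in the L2 cache; otherwise fall back to tiles.
    return working_set > l2_cache_bytes ? AssignThreadingStrategy::RowBased
                                        : AssignThreadingStrategy::FineGrained;
}
```

The fallback here is FineGrained, but CoreDistributed would be an equally valid choice according to the description above. The real crossover point should be measured rather than derived from this estimate.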
The original implementation in FastSLIC is faster at low resolutions mainly because tile-based parallelization needs less computation before the actual assignment.
There is an experimental implementation that merges the assign and update steps into one, AssignThreadingStrategy::RowBasedFusedUpdate. For big images it is somehow worse than AssignThreadingStrategy::RowBased, and it would be interesting to know why.
The problem very likely lies in how we use the L1 cache.
Ideally, the clusters in each line would be sorted from left to right, and after every assignment the final pixels would be used immediately in the accumulators of the update step. We can hint the load of the next line using prefetch instructions.
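The idea can be sketched roughly as follows. This is a single-threaded illustration under several assumptions, not the existing RowBasedFusedUpdate code: the image has one channel, every pixel is compared against every cluster (the left-to-right sorting of clusters per line is not implemented), a cache line holds 16 floats, and the compiler is GCC or Clang so that `__builtin_prefetch` is available. The types `Cluster` and `Accumulator` and the function `assign_and_accumulate` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

struct Cluster { float x, y, value; };  // hypothetical single-channel cluster
struct Accumulator {
    double sum_x = 0, sum_y = 0, sum_value = 0;
    std::uint32_t count = 0;
};

// Assigns each pixel to its nearest cluster and, in the same pass, adds the
// pixel to that cluster's accumulator while the pixel is still hot in cache.
std::vector<Accumulator> assign_and_accumulate(
        const std::vector<float>& image, std::size_t width, std::size_t height,
        const std::vector<Cluster>& clusters,
        std::vector<std::uint16_t>& labels, float spatial_weight) {
    std::vector<Accumulator> acc(clusters.size());
    labels.assign(width * height, 0);
    for (std::size_t y = 0; y < height; ++y) {
        const float* row = image.data() + y * width;
        for (std::size_t x = 0; x < width; ++x) {
            // Hint the load of the next line, once per assumed cache line.
            if (y + 1 < height && x % 16 == 0)
                __builtin_prefetch(row + width + x, 0, 3);

            float best = std::numeric_limits<float>::max();
            std::size_t best_k = 0;
            for (std::size_t k = 0; k < clusters.size(); ++k) {
                const Cluster& c = clusters[k];
                const float dv = row[x] - c.value;
                const float dx = static_cast<float>(x) - c.x;
                const float dy = static_cast<float>(y) - c.y;
                const float d = dv * dv + spatial_weight * (dx * dx + dy * dy);
                if (d < best) { best = d; best_k = k; }
            }
            labels[y * width + x] = static_cast<std::uint16_t>(best_k);

            // Fused update step: accumulate immediately after assignment.
            Accumulator& a = acc[best_k];
            a.sum_x += static_cast<double>(x);
            a.sum_y += static_cast<double>(y);
            a.sum_value += row[x];
            a.count += 1;
        }
    }
    return acc;
}
```

New cluster centres would then be `sum_x / count`, `sum_y / count`, and `sum_value / count` for every cluster with a non-zero count. Whether the prefetch hint helps, or whether the accumulator writes evict useful lines from L1, would have to be checked with a profiler.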