Investigate the possibility of using prefetch instructions to fetch data ahead in the assign step.

The current implementation is faster when the L2 cache cannot hold the whole image (with 16 MB of cache, the threshold is roughly an FHD image). For lower resolutions it may be better to use the original tile-based implementation, `AssignThreadingStrategy::FineGrained` or `AssignThreadingStrategy::CoreDistributed`.
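As a rough check of the FHD threshold, here is a minimal sketch of the arithmetic. The 8 bytes per pixel (4 bytes of color plus 2 bytes each for a label and a best distance) is an assumption made for illustration, not a figure taken from the implementation, and both function names are hypothetical.

```rust
/// Assumed per-pixel footprint: 4 bytes of color + 2 bytes label + 2 bytes distance.
/// This is an illustrative guess, not a measured value.
const ASSUMED_BYTES_PER_PIXEL: usize = 8;

/// Estimated bytes touched by the assign step for one image.
fn working_set_bytes(width: usize, height: usize) -> usize {
    width * height * ASSUMED_BYTES_PER_PIXEL
}

/// True when the estimated working set fits into a cache of `cache_bytes`.
fn fits_in_cache(width: usize, height: usize, cache_bytes: usize) -> bool {
    working_set_bytes(width, height) <= cache_bytes
}
```

Under that assumption an FHD image needs about 16.6 million bytes, which sits just under 16 MiB, while a 4K image exceeds it several times over.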

The original implementation in FastSLIC is faster at low resolutions mainly because its tile-based parallelization needs less computation before the actual assignment.

There is an experimental implementation that merges the _assign_ and _update_ steps into one, `AssignThreadingStrategy::RowBasedFusedUpdate`. For big images it performs somewhat worse than `AssignThreadingStrategy::RowBased`, and it would be interesting to know why.

There is very likely a problem in how we use the L1 cache.
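To make the fused idea concrete, here is a toy sketch of assigning a pixel and feeding it into the update accumulators in the same pass. It is not the `RowBasedFusedUpdate` code: it uses grayscale intensity as the only distance term, and `Accumulator`, `assign_and_accumulate`, and `updated_centers` are hypothetical names.

```rust
/// Per-cluster running totals for the update step (toy: grayscale only).
#[derive(Clone, Copy, Default, Debug, PartialEq)]
struct Accumulator {
    sum: u64,
    count: u64,
}

/// Assign every pixel to its nearest center and accumulate it immediately,
/// so the pixel is still hot in cache when the update step consumes it.
fn assign_and_accumulate(image: &[u8], centers: &[u8], labels: &mut [u16]) -> Vec<Accumulator> {
    assert_eq!(image.len(), labels.len());
    assert!(!centers.is_empty());
    let mut acc = vec![Accumulator::default(); centers.len()];
    for (&px, label) in image.iter().zip(labels.iter_mut()) {
        // Assign: pick the center with the smallest intensity difference.
        let mut best = 0usize;
        for (i, &c) in centers.iter().enumerate() {
            if c.abs_diff(px) < centers[best].abs_diff(px) {
                best = i;
            }
        }
        *label = best as u16;
        // Update: fold the pixel into its cluster's accumulator right away.
        acc[best].sum += u64::from(px);
        acc[best].count += 1;
    }
    acc
}

/// Turn accumulators into new centers; empty clusters keep their old center.
fn updated_centers(acc: &[Accumulator], old: &[u8]) -> Vec<u8> {
    acc.iter()
        .zip(old)
        .map(|(a, &o)| if a.count == 0 { o } else { (a.sum / a.count) as u8 })
        .collect()
}
```

In a layout like this the accumulators are written in whatever order the labels come out, so scattered writes to `acc` are one place where L1 behavior could suffer. That is a guess to verify with a profiler, not a finding.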

Ideally, the clusters in a line would be sorted from left to right, and after every assignment the final pixels would be immediately used in the accumulators of the update step. We can hint the load of the next line using prefetch instructions.
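A minimal sketch of hinting the next line while the current one is processed, assuming an x86_64 target and a 64-byte cache line. `prefetch_row` and `assign_rows` are hypothetical helpers, and the assignment is a toy grayscale nearest-center search rather than the real SLIC distance.

```rust
/// Ask the CPU to pull `row` into cache, one hint per assumed 64-byte line.
/// Compiles to a no-op on targets other than x86_64.
#[inline]
fn prefetch_row(row: &[u8]) {
    #[cfg(target_arch = "x86_64")]
    {
        use std::arch::x86_64::{_mm_prefetch, _MM_HINT_T0};
        const CACHE_LINE: usize = 64; // assumption, not queried from the CPU
        for offset in (0..row.len()).step_by(CACHE_LINE) {
            // SAFETY: `offset` is within the slice, and a prefetch is only a
            // hint that never faults or changes program state.
            unsafe { _mm_prefetch::<_MM_HINT_T0>(row.as_ptr().add(offset) as *const i8) };
        }
    }
    #[cfg(not(target_arch = "x86_64"))]
    {
        let _ = row;
    }
}

/// Toy row-based assign step that prefetches row y + 1 before working on row y.
fn assign_rows(image: &[u8], width: usize, centers: &[u8], labels: &mut [u16]) {
    assert!(width > 0 && image.len() == labels.len());
    let rows = image.chunks_exact(width).zip(labels.chunks_exact_mut(width));
    for (y, (row, out)) in rows.enumerate() {
        let next = (y + 1) * width;
        if let Some(next_row) = image.get(next..next + width) {
            prefetch_row(next_row);
        }
        for (&px, label) in row.iter().zip(out.iter_mut()) {
            // Nearest center by intensity difference; 0 if there are no centers.
            *label = centers
                .iter()
                .enumerate()
                .min_by_key(|&(_, &c)| c.abs_diff(px))
                .map(|(i, _)| i as u16)
                .unwrap_or(0);
        }
    }
}
```

Hardware prefetchers usually handle purely sequential reads well already, so an explicit hint like this is more likely to pay off for the less regular accesses (cluster data, accumulators) than for the pixel rows themselves. Any gain should be confirmed by benchmarking.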
