Investigate using prefetch instructions to fetch data ahead in the assign step. #4

@ondrasouk

The current implementation is faster when the L2 cache cannot hold the whole image (with 16 MB of cache, that is roughly the size of an FHD image). For lower resolutions, the original tile-based implementations, AssignThreadingStrategy::FineGrained or AssignThreadingStrategy::CoreDistributed, may be the better choice.
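
As a rough illustration of that trade-off, a strategy could be picked by comparing the image footprint with the cache size. This is only a sketch: the enum below is a local stand-in that mirrors the strategy names, and `pick_strategy`, `bytes_per_pixel` and `cache_bytes` are hypothetical names, not part of the library. The real crossover point would have to be measured.

```cpp
#include <cassert>
#include <cstddef>

// Local stand-in mirroring the strategy names mentioned in this issue.
enum class AssignThreadingStrategy { FineGrained, CoreDistributed, RowBased, RowBasedFusedUpdate };

// Hypothetical heuristic: use the row-based path only when the image
// does not fit in the cache; otherwise fall back to a tile-based one.
inline AssignThreadingStrategy pick_strategy(std::size_t width, std::size_t height,
                                             std::size_t bytes_per_pixel,  // assumed working-set size per pixel
                                             std::size_t cache_bytes) {
    assert(bytes_per_pixel > 0);
    const std::size_t image_bytes = width * height * bytes_per_pixel;
    return image_bytes > cache_bytes ? AssignThreadingStrategy::RowBased
                                     : AssignThreadingStrategy::FineGrained;
}
```

For example, with an assumed 8 bytes per pixel and 16 MB of cache, a 3840x2160 image would go to `RowBased` and a 640x480 image to `FineGrained`.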

The original FastSLIC implementation is faster at low resolutions, mainly because tile-based parallelization needs less computation before the actual assignment.

There is an experimental implementation that merges the assign and update steps into one, AssignThreadingStrategy::RowBasedFusedUpdate. For big images it is, for some reason, slower than AssignThreadingStrategy::RowBased, and it would be interesting to know why.
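
To make the idea of fusing the two steps concrete, here is a toy one-row sketch, not the actual RowBasedFusedUpdate code. `Center`, `Accumulator` and `assign_and_accumulate_row` are hypothetical names, and the distance is a simplified 1-D stand-in for the real SLIC distance. Each pixel is folded into its cluster's accumulator right after it is labelled, while it is still hot in cache.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Center { int x; int value; };  // hypothetical cluster center (position, intensity)

struct Accumulator {                  // hypothetical running sums for the update step
    std::uint64_t sum_x = 0;
    std::uint64_t sum_value = 0;
    std::uint64_t count = 0;
};

// Labels every pixel of one row with its nearest center and immediately
// adds the pixel to that center's accumulator (assign and update in one pass).
inline void assign_and_accumulate_row(const std::vector<std::uint8_t>& row,
                                      const std::vector<Center>& centers,
                                      std::vector<std::uint16_t>& labels,
                                      std::vector<Accumulator>& acc) {
    assert(!centers.empty() && acc.size() == centers.size());
    labels.resize(row.size());
    for (std::size_t x = 0; x < row.size(); ++x) {
        std::size_t best = 0;
        long best_dist = -1;  // -1 means "no candidate yet"
        for (std::size_t k = 0; k < centers.size(); ++k) {
            const long dx = static_cast<long>(x) - centers[k].x;
            const long dv = static_cast<long>(row[x]) - centers[k].value;
            const long dist = dx * dx + dv * dv;
            if (best_dist < 0 || dist < best_dist) { best_dist = dist; best = k; }
        }
        labels[x] = static_cast<std::uint16_t>(best);
        acc[best].sum_x += x;            // update step, fused into the assign loop
        acc[best].sum_value += row[x];
        acc[best].count += 1;
    }
}
```

One thing such a sketch makes visible: the accumulator writes jump between clusters, so the order in which clusters are visited along the row matters for cache behaviour.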

The problem is very likely in how we use the L1 cache.

Ideally, the clusters in each line would be sorted from left to right, and after every assignment the final pixels would be used immediately in the accumulators of the update step. We can hint the load of the next line using prefetch instructions.
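
A minimal sketch of hinting the next line could look like the following. It assumes GCC or Clang, where `__builtin_prefetch(addr, rw, locality)` is available (MSVC would need `_mm_prefetch` instead), and an assumed 64-byte cache line. `sum_rows_with_prefetch` is a hypothetical stand-in for the per-row assign work, not code from this repository. Hardware prefetchers already handle simple sequential access well, so whether this helps has to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Walks a row-major image row by row; before processing a row it issues
// prefetch hints for the following row. The summation stands in for the
// real per-pixel assignment work.
inline std::uint64_t sum_rows_with_prefetch(const std::vector<std::uint8_t>& image,
                                            std::size_t width, std::size_t height) {
    assert(image.size() == width * height);
    constexpr std::size_t kCacheLine = 64;  // assumed cache-line size in bytes
    std::uint64_t total = 0;
    for (std::size_t y = 0; y < height; ++y) {
        const std::uint8_t* row = image.data() + y * width;
        if (y + 1 < height) {
            const std::uint8_t* next = row + width;
            for (std::size_t off = 0; off < width; off += kCacheLine) {
#if defined(__GNUC__) || defined(__clang__)
                // rw = 0: prefetch for reading; locality = 3: keep in all cache levels.
                __builtin_prefetch(next + off, 0, 3);
#endif
            }
        }
        for (std::size_t x = 0; x < width; ++x) total += row[x];
    }
    return total;
}
```

Prefetching a whole row at once is the simplest variant; prefetching a fixed distance ahead inside the inner loop is another option worth comparing.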
