
jpromeror

This PR includes a series of commits intended to reduce the computational time when running the algorithm in doublet mode. It is mostly intended for high-definition assays (a large number of spots), but is usable with all other methods.

SpaceXR_SpeedUp

Main changes:

1. Change parallelization approach

  • Use foreach + doParallel for the multicore implementation. This avoids launching multiple R sessions, and we also included a progress bar for a cleaner UI (a minimal sketch is included after this list).

2. Speed up gather_results

  • This was the main bottleneck in the algorithm. We fully vectorized the function to remove unnecessary loops, and added a progress bar as well (a vectorization sketch is included after this list).

3. General speed up

  • Modified existing functions to improve overall performance.

4. Add MIN_OBS as parameter to create.RCTD

  • Exposes MIN_OBS as a parameter so the algorithm can be run in specific scenarios. This allows more control and customized runs, but the user should be aware of the drawbacks (e.g. sampling noise). The original default value is kept (a usage example is included after this list).
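
For illustration, here is a minimal sketch of the foreach + doParallel pattern described in item 1. The process_bead() helper and the chunked progress bar are assumptions made for the sake of the example, not the exact code in this PR:

library(foreach)
library(doParallel)

run_doublet_parallel <- function(beads, process_bead, max_cores = 4) {
  registerDoParallel(cores = max_cores)  # fork-based workers on Unix; no separate R sessions are launched
  message("Multicore enabled using ", max_cores, " cores")

  # process the spots in chunks so a progress bar can be updated from the main session
  chunks  <- split(seq_along(beads), ceiling(seq_along(beads) / 100))
  pb      <- txtProgressBar(min = 0, max = length(chunks), style = 3)
  results <- vector("list", length(beads))

  for (k in seq_along(chunks)) {
    idx <- chunks[[k]]
    results[idx] <- foreach(i = idx) %dopar% process_bead(beads[[i]])  # parallel work per chunk
    setTxtProgressBar(pb, k)
  }
  close(pb)
  stopImplicitCluster()
  results
}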
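
To illustrate the kind of loop removal described in item 2, the sketch below contrasts a row-by-row construction of the results data frame with a vectorized one. The field names mirror the snippet discussed later in this thread; this is a sketch, not the package's actual implementation:

# slow: grows the data frame one row at a time inside a loop
gather_results_loop <- function(results, spot_levels) {
  results_df <- NULL
  for (X in results) {
    results_df <- rbind(results_df,
                        data.frame(spot_class  = factor(X$spot_class, levels = spot_levels),
                                   first_type  = X$first_type,
                                   second_type = X$second_type))
  }
  results_df
}

# fast: one sapply per column, a single data.frame call, no loop
gather_results_vectorized <- function(results, spot_levels) {
  data.frame(spot_class  = factor(sapply(results, `[[`, "spot_class"), levels = spot_levels),
             first_type  = sapply(results, `[[`, "first_type"),
             second_type = sapply(results, `[[`, "second_type"))
}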
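
Finally, an illustrative call using the new MIN_OBS argument of create.RCTD (item 4). The puck and reference objects are assumed to already exist, and the MIN_OBS value shown is only an example; the package default is unchanged:

library(spacexr)

# puck: SpatialRNA object; reference: Reference object (both assumed to exist)
myRCTD <- create.RCTD(spatialRNA = puck, reference = reference,
                      max_cores = 8, MIN_OBS = 3)  # value shown is illustrative
myRCTD <- run.RCTD(myRCTD, doublet_mode = "doublet")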

To do list:

  • Adapt the new parallelization approach to the other modes (full & multi)
  • Improve screen messages for easier progress tracking


results_df <- data.frame(spot_class = factor(sapply(results, function(X) { return(X$spot_class) }), levels = spot_levels),
                         first_type = sapply(results, function(X) { return(X$first_type) }),
                         scond_type = sapply(results, function(X) { return(X$second_type) }),


I think there is a typo here: 'scond_type' should be 'second_type'.

jpromeror (Author)


Thanks for catching that!


dpaysan commented Apr 25, 2025

I have tried running the branch, but in my case the function choose_sigma_c.R now runs substantially slower: with the version from dmcable, the 8 epochs on my Visium HD data complete within one hour using 28 cores, whereas with the updated version proposed in this pull request not even one epoch is processed within that time. Could that be due to the fact that no cluster is created with makeCluster? I also see very little CPU usage in general, with most threads in the S state and not running.


jpromeror commented Apr 25, 2025

Hi @dpaysan! It is a bit difficult to identify the issue without any code. Here are some suggestions to see if we can get it to work as intended:

  1. We only updated the algorithm when running in "doublet mode". Can you confirm you are using this mode?

  2. We changed the parallelization approach and no longer use makeCluster; instead we call registerDoParallel(cores = max_cores). If your multicore session is enabled, you should see a message like "Multicore enabled using <max_cores> cores". Can you confirm this is the case? A quick check is sketched below.
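
For example, a quick (generic, not part of this PR) way to confirm that the doParallel backend is actually registered is to query foreach after calling registerDoParallel:

library(doParallel)

registerDoParallel(cores = 28)
foreach::getDoParName()     # name of the registered backend
foreach::getDoParWorkers()  # number of registered workers; should match the cores requested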
