Replies: 2 comments 1 reply
-
If you are using the code you posted above, then this is because the cl and mirai cluster objects are both mirai clusters. You would need to use parallel::makeCluster() to create base R clusters.
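For illustration, a minimal sketch (worker counts are arbitrary) of creating one cluster of each type, so the comparison actually exercises both backends:

```r
library(parallel)
library(mirai)

## mirai-backed cluster -- what mirai::make_cluster() returns
cl_mirai <- mirai::make_cluster(4)
class(cl_mirai)   # "miraiCluster" "cluster"

## base R PSOCK cluster, created via parallel::makeCluster()
cl_base <- parallel::makeCluster(4)
class(cl_base)    # "SOCKcluster" "cluster"

## clean up
stop_cluster(cl_mirai)
stopCluster(cl_base)
```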
-
Which function to use is a question about the general API of the parallel package rather than anything specific to mirai. To be clear, you are finding that the performance of the cluster backends in base R and in mirai is essentially the same.
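For illustration (the function and inputs below are placeholders, not taken from the original post), the same clusterMap() call works with either backend, since both objects are parallel clusters:

```r
library(parallel)
library(mirai)

f <- function(x, y) x + y

cl_base  <- parallel::makeCluster(2)   # standard PSOCK backend
cl_mirai <- mirai::make_cluster(2)     # mirai backend

## identical clusterMap() calls; only the cluster object differs
unlist(clusterMap(cl_base,  f, 1:5, 6:10))
unlist(clusterMap(cl_mirai, f, 1:5, 6:10))

stopCluster(cl_base)
stop_cluster(cl_mirai)
```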
-
Hello, my apologies if this is not the right place to post.
I recently came across your brilliant post and can confirm, from my own benchmarking on several x86 and ARM architectures with various core counts, that your findings hold up. I have been looking for a straightforward, low-cost abstraction for multithreading massively parallel functions that operate on data frames and data tables.
I proceeded to try this approach on a slightly different scenario, only to find a substantial difference in performance: the parallel + mirai backend appears to be substantially slower than the base R operation or purrr::map(). I am not sure what accounts for the difference. Here is a reproducible example; the function simply accepts two arguments, which necessitates a different parallel call than parLapply().
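A minimal sketch of the kind of comparison described, with a hypothetical two-argument function f and arbitrary input sizes and worker counts standing in for the originals, might look like this:

```r
library(parallel)
library(mirai)
library(purrr)

## hypothetical two-argument function standing in for the original one
f <- function(a, b) sqrt(a^2 + b^2)

x <- runif(1e4)
y <- runif(1e4)

## single-threaded baselines
system.time(res_mapply <- mapply(f, x, y))
system.time(res_map2   <- purrr::map2_dbl(x, y, f))

## clusterMap() with the standard parallel backend
cl_base <- parallel::makeCluster(4)
system.time(res_base <- clusterMap(cl_base, f, x, y))
stopCluster(cl_base)

## clusterMap() with the mirai backend
cl_mirai <- mirai::make_cluster(4)
system.time(res_mirai <- clusterMap(cl_mirai, f, x, y))
stop_cluster(cl_mirai)
```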
On different x86-64 computers, with clusters ranging from 2 to 24 nodes, I am seeing approximately 30-50x slower performance for clusterMap() with either the mirai or the standard backend compared to base R mapply(). purrr::map2() performs about the same as mapply(), and clusterMap() with the standard parallel backend performs essentially the same as with the mirai backend.
Am I simply using the wrong call in mapply() or clusterMap()? Or perhaps there is a more efficient construction to leverage mirai?
R 4.3.3
mirai 0.12.1
nanonext 0.13.2
parallel 4.3.3
purrr 1.0.2