Currently only sparse-by-sparse products are parallel in the smmp
module. Converting the current sparse-by-dense products using ndarray::parallel
should be straightforward. Here is an implementation of par_csr_mulacc_dense_colmaj
that gives a significant speedup on my machine:
pub fn par_csr_mulacc_dense_colmaj<'a, N, A, B, I, Iptr>(
    lhs: CsMatViewI<A, I, Iptr>,
    rhs: ArrayView<B, Ix2>,
    mut out: ArrayViewMut<'a, N, Ix2>,
) where
    A: Send + Sync,
    B: Send + Sync,
    N: 'a + crate::MulAcc<A, B> + Send + Sync,
    I: 'a + SpIndex,
    Iptr: 'a + SpIndex,
{
    assert_eq!(lhs.cols(), rhs.shape()[0], "Dimension mismatch");
    assert_eq!(lhs.rows(), out.shape()[0], "Dimension mismatch");
    assert_eq!(rhs.shape()[1], out.shape()[1], "Dimension mismatch");
    assert!(lhs.is_csr(), "Storage mismatch");
    let axis1 = Axis(1);
    ndarray::Zip::from(out.axis_iter_mut(axis1))
        .and(rhs.axis_iter(axis1))
        .par_for_each(|mut ocol, rcol| {
            for (orow, lrow) in lhs.outer_iterator().enumerate() {
                let oval = &mut ocol[[orow]];
                for (rrow, lval) in lrow.iter() {
                    let rval = &rcol[[rrow]];
                    oval.mul_acc(lval, rval);
                }
            }
        });
}
The only changes here are the parallel iterator, adding the rayon feature for ndarray, and adding the Sync and Send trait bounds to the data types inside the matrices. My concern is that adding Send + Sync will result in these trait requirements being added unnecessarily in many places.
Looking at the impl Mul for CsMatBase and CsMatBase, I see that Sync + Send is required regardless of whether multi_thread is enabled. Is it okay to propagate these trait requirements all the way up to many of the trait impls for CsMatBase and then use conditional feature compilation only in the lowest-level functions found in the prod module? Conditionally compiling all the higher-level implementations sounds like it would get nasty very quickly, especially as more parallel features get added.