Description
I just found out that my implementation of an ML model has been training up to several times slower than before. After some debugging, it turns out that the cause was #276, where the `MulAcc` trait was introduced in d56bd72, and specifically where the `Copy` bound was removed in d2f0da7. For details please see tomtung/omikuji#32.
My intuitive guess is that the `MulAcc` trait got rid of the `Copy` bound at the cost of forcing the values to be passed as references (pointers). This might have made it harder for the compiler to do inlining and/or other optimizations. Since the `+=` operation is typically in the innermost part of the loop, any slowdown in such a hot spot would result in a significant speed regression.
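To make the guess above concrete, here is a minimal sketch of the two calling conventions. The traits `MulAccByValue` and `MulAccByRef` and the functions `dot_by_value` and `dot_by_ref` are hypothetical names made up for illustration; they are not the actual definitions from the crate. The sketch only shows the difference in how operands reach the accumulate step, and does not by itself demonstrate or measure any slowdown.

```rust
/// Hypothetical by-value variant: operands must be `Copy`,
/// so they can be passed directly (e.g. in registers).
trait MulAccByValue: Copy {
    fn mul_acc(&mut self, a: Self, b: Self);
}

/// Hypothetical by-reference variant: no `Copy` bound,
/// but operands are passed behind references.
trait MulAccByRef {
    fn mul_acc_ref(&mut self, a: &Self, b: &Self);
}

impl MulAccByValue for f32 {
    #[inline]
    fn mul_acc(&mut self, a: f32, b: f32) {
        *self += a * b;
    }
}

impl MulAccByRef for f32 {
    #[inline]
    fn mul_acc_ref(&mut self, a: &f32, b: &f32) {
        *self += *a * *b;
    }
}

/// Dot product whose inner loop accumulates by value.
fn dot_by_value<T: MulAccByValue + Default>(xs: &[T], ys: &[T]) -> T {
    let mut acc = T::default();
    for (&x, &y) in xs.iter().zip(ys) {
        acc.mul_acc(x, y);
    }
    acc
}

/// Dot product whose inner loop accumulates through references.
fn dot_by_ref<T: MulAccByRef + Default>(xs: &[T], ys: &[T]) -> T {
    let mut acc = T::default();
    for (x, y) in xs.iter().zip(ys) {
        acc.mul_acc_ref(x, y);
    }
    acc
}
```

Both functions compute the same result; whether the compiler generates different code for them depends on the element type, inlining decisions, and optimization level, so a benchmark on the real workload would be needed to confirm the cause.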
I don't have context on the introduction of `MulAcc`, so I'm not sure what the right way to fix this is. Let me know if there's something I could help with.