Before I forget, this is another follow-up to #419.
Presently Fortran can use LIMHEL>0 to apply a more relaxed helicity filtering (a helicity is included only if at least one of a few events has a ME above a threshold). In cudacpp the comparison is always to 0, and as a consequence the Fortran code is also modified to use LIMHEL=0. If physicists can accept LIMHEL>0 to further speed up the code (at the cost of some precision), maybe this should be implemented in cudacpp too.
HOWEVER, I would like this to be done in such a way that the results can be compared exactly to Fortran. Another option is to pass to cudacpp a pre-filtered set of helicities computed in Fortran 'the Fortran way'.
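To make the difference between the two filters concrete, here is a rough sketch of what a threshold-based selection could look like. This is not the existing cudacpp or Fortran code: `selectGoodHelicities` and its arguments are hypothetical names, and the normalization of the threshold (a fraction `limhel` of the per-event sum over helicities) is my assumption, so the real Fortran rule would need to be checked before relying on it. With `limhel = 0` it reduces to the plain "greater than 0" comparison.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (NOT the cudacpp or Fortran API).
// me[ievt][ihel] is the ME contribution of helicity ihel for event ievt,
// computed for a small set of events used only for the filtering.
// ASSUMPTION: the threshold is limhel times the per-event sum over helicities;
// the actual Fortran normalization may differ.
std::vector<bool> selectGoodHelicities( const std::vector<std::vector<double>>& me,
                                        double limhel )
{
  const std::size_t nhel = me.empty() ? 0 : me[0].size();
  std::vector<bool> isGood( nhel, false );
  for( const auto& evt : me )
  {
    // Sum over all helicities for this event
    double total = 0;
    for( double v : evt ) total += v;
    const double threshold = limhel * total;
    // Keep a helicity if it is above threshold in at least one event
    for( std::size_t ihel = 0; ihel < nhel && ihel < evt.size(); ++ihel )
      if( evt[ihel] > threshold ) isGood[ihel] = true;
  }
  return isGood;
}
```

The second option (pre-filtering in Fortran) would amount to computing such a mask once on the Fortran side and handing it to cudacpp as is, which would make the two helicity lists identical by construction.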