Commit bde3535
RV64: Avoid repeated VLEN-evaluation in rejection sampling
For VLEN >= 512, there are tail iterations in the rejection
sampling loop where we require fewer coefficients than fit into
a vector, requiring an adjustment of the dynamic VL.
The previous code re-evaluated the dynamic VL in every iteration,
which incurred a significant runtime cost. This commit instead splits
the rejection sampling loop into two nested loops, where the inner loop
proceeds with a fixed VL and the outer loop re-evaluates the VL.
For VLEN <= 256, there is only one iteration of the outer loop, rendering
it as efficient as the original version.
Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
1 parent 2567720
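
The loop restructuring described in the commit message can be illustrated with a scalar C model. This is only a sketch under stated assumptions, not the code from the diff: `model_vsetvl`, `filter_step`, `rej_per_iteration`, and `rej_nested` are hypothetical names, `BOUND` is an example rejection bound, and `model_vsetvl` merely imitates how a VL request is capped at the hardware maximum while counting how often it is issued.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative rejection bound; an assumption, not taken from the commit. */
#define BOUND 3329u

/* Counts how often the (modelled) VL is re-evaluated. */
static size_t vsetvl_calls;

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Hypothetical stand-in for a VL request: grants min(requested, vlmax). */
static size_t model_vsetvl(size_t requested, size_t vlmax)
{
    assert(vlmax > 0);
    vsetvl_calls++;
    return min_sz(requested, vlmax);
}

/* One modelled vector step: inspect vl candidates, keep those below BOUND.
 * Returns the number of candidates kept (at most vl). */
static size_t filter_step(int16_t *out, const uint16_t *in, size_t vl)
{
    size_t kept = 0;
    for (size_t i = 0; i < vl; i++) {
        if (in[i] < BOUND) {
            out[kept++] = (int16_t)in[i];
        }
    }
    return kept;
}

/* Shape of the previous approach: VL is re-evaluated in every iteration. */
size_t rej_per_iteration(int16_t *out, size_t len, const uint16_t *in,
                         size_t n_in, size_t vlmax)
{
    size_t ctr = 0, pos = 0;
    while (ctr < len && pos < n_in) {
        size_t vl = model_vsetvl(min_sz(len - ctr, n_in - pos), vlmax);
        ctr += filter_step(out + ctr, in + pos, vl);
        pos += vl;
    }
    return ctr;
}

/* Shape of the new approach: the outer loop re-evaluates VL, the inner loop
 * keeps it fixed for as long as a full step of vl lanes still fits. */
size_t rej_nested(int16_t *out, size_t len, const uint16_t *in,
                  size_t n_in, size_t vlmax)
{
    size_t ctr = 0, pos = 0;
    while (ctr < len && pos < n_in) {
        size_t vl = model_vsetvl(min_sz(len - ctr, n_in - pos), vlmax);
        do {
            ctr += filter_step(out + ctr, in + pos, vl);
            pos += vl;
        } while (len - ctr >= vl && n_in - pos >= vl);
    }
    return ctr;
}
```

In this model both functions process the input with the same sequence of VL values and therefore write the same output; they differ only in how often `model_vsetvl` is called. Whether that saving carries over to real hardware depends on the cost of the actual VL-setting instruction, which this sketch does not measure.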
1 file changed: +22 -16 lines changed

| Original file lines | Diff lines | Change |
|---|---|---|
| 757-759 | 757-759 | unchanged context |
| 760-772 | 760-763 | 13 lines removed, 4 lines added |
| 773 | 764 | unchanged context |
| 774 | 765-782 | 1 line removed, 18 lines added |
| 775 | 783 | unchanged context |
| 776-777 | | 2 lines removed |
| 778-780 | 784-786 | unchanged context |