Releases · lucidrains/native-sparse-attention-pytorch
0.0.44
when doing interpolation of importance score, remask to 0 for illegal…
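A minimal sketch of what "remask to 0 for illegal" could mean here, assuming the interpolated importance scores are laid out per fine selection block and that causality defines which blocks are illegal; the function name, shapes, and masking rule are assumptions, not the repo's code:

```python
import torch

def remask_illegal(scores, fine_block_size):
    # a sketch: after interpolating compressed-block importance scores up to
    # the fine selection-block resolution, zero out scores for blocks a query
    # must not see, so interpolation artifacts cannot win top-k selection
    # scores: (batch, heads, q_len, num_fine_blocks) - an assumed layout
    q_len, num_blocks = scores.shape[-2:]
    q_idx = torch.arange(q_len, device = scores.device)
    block_start = torch.arange(num_blocks, device = scores.device) * fine_block_size
    # a block is illegal for a query if it starts beyond the query position
    illegal = block_start[None, :] > q_idx[:, None]   # (q_len, num_fine_blocks)
    return scores.masked_fill(illegal, 0.)
```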
0.0.43
default to one mem kv for compressed attn
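A sketch of how a learned memory key/value pair (defaulting to one, per this release) might be prepended to the compressed keys/values so that early queries, which have no compressed blocks behind them yet, still have something to attend to; the class and method names are assumptions:

```python
import torch
from torch import nn
from einops import repeat

class CompressedMemKV(nn.Module):
    # a sketch, not the repo's module: a single learned memory key/value
    # prepended to the compressed keys/values of the compressed branch
    def __init__(self, heads, dim_head, num_mem_kv = 1):  # default of one
        super().__init__()
        self.mem_kv = nn.Parameter(torch.zeros(2, heads, num_mem_kv, dim_head))

    def prepend(self, ck, cv):
        # ck, cv: (batch, heads, num_compressed_blocks, dim_head)
        mk, mv = repeat(self.mem_kv, 'kv h n d -> kv b h n d', b = ck.shape[0])
        return torch.cat((mk, ck), dim = -2), torch.cat((mv, cv), dim = -2)
```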
0.0.42
Full Changelog: 0.0.41...0.0.42
0.0.41
ready to be compared with full attention.
0.0.40
oops
0.0.39
do the differential topk gating in a more suboptimal way, but accommo…
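The exact gating scheme is truncated above, but a common way to make top-k block selection differentiable is a straight-through estimator: hard 0/1 gates in the forward pass, gradients through the soft scores in the backward pass. The sketch below shows that general technique under the assumption it is comparable to what this release does:

```python
import torch

def topk_gates(importance, k):
    # a straight-through sketch (an assumption about the technique, not the
    # repo's exact code): forward pass applies hard {0, 1} gates over the
    # top-k blocks, while gradients flow through the soft importance scores
    soft = importance.softmax(dim = -1)
    topk_idx = soft.topk(k, dim = -1).indices
    hard = torch.zeros_like(soft).scatter(-1, topk_idx, 1.)
    # forward value equals `hard`; backward sees the gradient of `soft`
    return hard + soft - soft.detach()
```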
0.0.38
Full Changelog: 0.0.36...0.0.38
0.0.37
account for learned memory key values in flex compress mask, also cle…
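A sketch of how learned memory key/values could be accounted for in a flex attention mask, using PyTorch's `torch.nn.attention.flex_attention` API (PyTorch 2.5+); the hyperparameters and the exact mask rule are illustrative assumptions:

```python
from torch.nn.attention.flex_attention import create_block_mask

# illustrative hyperparameters, not taken from the repo
num_mem_kv = 1
compress_block_size = 4

def compress_mask(b, h, q_idx, kv_idx):
    # the first num_mem_kv key positions are the learned memory key/values
    # and are always attendable; remaining positions are compressed blocks,
    # so their block index shifts by num_mem_kv, and a block becomes visible
    # once it lies fully in the query's past
    is_mem = kv_idx < num_mem_kv
    block_idx = kv_idx - num_mem_kv
    return is_mem | (((block_idx + 1) * compress_block_size - 1) <= q_idx)

seq_len = 1024
kv_len = seq_len // compress_block_size + num_mem_kv  # offset by the mem kv
block_mask = create_block_mask(
    compress_mask, B = None, H = None,
    Q_LEN = seq_len, KV_LEN = kv_len, device = 'cpu'
)
```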
0.0.36
refactor compressed pathway with gqa
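A sketch of a compressed attention pathway under grouped-query attention (GQA), where each group of query heads shares one compressed key/value head; causal masking is omitted for brevity and all names are assumptions:

```python
import torch
from einops import rearrange

def compressed_attend_gqa(q, ck, cv, kv_heads):
    # q: (batch, query_heads, seq, dim_head)
    # ck, cv: (batch, kv_heads, num_compressed_blocks, dim_head)
    b, qh, n, d = q.shape
    groups = qh // kv_heads
    # fold query heads into (groups, kv_heads) so each group of query heads
    # attends against the same compressed kv head
    q = rearrange(q, 'b (g h) n d -> b g h n d', g = groups)
    sim = torch.einsum('b g h i d, b h j d -> b g h i j', q, ck) * (d ** -0.5)
    attn = sim.softmax(dim = -1)
    out = torch.einsum('b g h i j, b h j d -> b g h i d', attn, cv)
    return rearrange(out, 'b g h n d -> b (g h) n d')
```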
0.0.35
deviate from the paper and allow for interpolation of the compressed …
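The NSA paper effectively ties the compression and selection block sizes when deriving fine-block importance from compressed attention. A sketch of the interpolation deviation described above, assuming the compressed-block scores are resampled to the fine selection-block resolution (linear interpolation and the shapes are assumptions):

```python
import torch.nn.functional as F

def interpolate_importance(csim, num_fine_blocks):
    # csim: (batch, heads, q_len, num_compressed_blocks) importance scores
    # flatten to (N, 1, L) so F.interpolate can resample the block dimension
    b, h, q_len, num_compressed = csim.shape
    scores = csim.reshape(b * h * q_len, 1, num_compressed)
    scores = F.interpolate(scores, size = num_fine_blocks, mode = 'linear', align_corners = False)
    return scores.reshape(b, h, q_len, num_fine_blocks)
```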