Commit 75b3d42
Fix GQA fusion to produce present key/value (#2634)
Output present key/value from the Attention op because past key/value is
provided. Previously, the Attention op created by the fusion consumed past
key/value but did not produce present key/value, which is not correct for
ORT.
<img width="1377" height="1225" alt="image"
src="https://github.com/user-attachments/assets/118958b4-bc27-4912-b70b-000549887c0f"
/>
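As a rough illustration of why the present key/value outputs matter: with a KV cache, each decoding step is expected to return the cached entries extended with the new ones, so the next step can feed them back as past key/value. The sketch below is not the code from this commit. `update_kv_cache` is a hypothetical helper, and the `[batch][kv_heads][sequence][head_size]` layout is an assumption chosen for the example.

```python
# Hypothetical helper for illustration only; not code from this commit.
def update_kv_cache(past, current):
    """Concatenate cached and new entries along the sequence axis.

    Both arguments are nested lists laid out as
    [batch][kv_heads][sequence][head_size] (an assumed layout).
    """
    if len(past) != len(current):
        raise ValueError("batch size mismatch")
    present = []
    for past_b, cur_b in zip(past, current):
        if len(past_b) != len(cur_b):
            raise ValueError("kv head count mismatch")
        # Per head, the present sequence is the past tokens followed by the new ones.
        present.append([p_h + c_h for p_h, c_h in zip(past_b, cur_b)])
    return present


# One batch, one KV head, head_size 2: two cached tokens plus one new token.
past_key = [[[[1.0, 2.0], [3.0, 4.0]]]]
key = [[[[5.0, 6.0]]]]
present_key = update_kv_cache(past_key, key)
```

If an op consumes `past_key` without emitting `present_key`, the caller has nothing to pass in as the past on the next step, which is the gap the commit message describes.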
Replaces #2632
Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com>

1 parent 811937c commit 75b3d42
1 file changed: +2 -1 lines changed

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| … | … | |
| 52 | 52 | |
| 53 | 53 | |
| 54 | 54 | |
| 55 | | - |
| | 55 | + |
| 56 | 56 | |
| 57 | 57 | |
| 58 | 58 | |
| … | … | |
| 103 | 103 | |
| 104 | 104 | |
| 105 | 105 | |
| | 106 | + |
| 106 | 107 | |
| 107 | 108 | |
| 108 | 109 | |