Commit 4a600d2

realistic settings, keep seq len at 256 as still unoptimized
1 parent 031b858 commit 4a600d2

1 file changed, +4 -4 lines changed

train.py

Lines changed: 4 additions & 4 deletions
@@ -18,7 +18,7 @@
 GRAD_ACCUM_EVERY = 4
 LEARNING_RATE = 1e-4
 VALIDATE_EVERY = 100
-PRIME_LENGTH = 128
+PRIME_LENGTH = 64
 GENERATE_EVERY = 500
 GENERATE_LENGTH = 256
 SEQ_LEN = 256
@@ -95,9 +95,9 @@ def base_decoding(
     use_sparse_attn = USE_SPARSE_ATTN,
     sparse_attn_kwargs = dict(
         sliding_window_size = 32,
-        compress_block_size = 4,
-        selection_block_size = 4,
-        num_selected_blocks = 1,
+        compress_block_size = 32,
+        selection_block_size = 32,
+        num_selected_blocks = 2,
     )
 ).cuda()

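To get a feel for what the new block sizes mean at SEQ_LEN = 256, here is a rough back-of-the-envelope sketch. It is not taken from train.py or the library. The helper `attention_budget` is hypothetical, and it assumes a reading of the parameter names in which each query sees at most `sliding_window_size` local tokens, one compressed entry per `compress_block_size` tokens, and `num_selected_blocks * selection_block_size` selected tokens. Causal masking and any block overlap are ignored, so the numbers are upper bounds only.

```python
# Hypothetical helper, not part of train.py: estimates how many keys a single
# query could touch per branch, under the assumptions stated in the lead-in.
def attention_budget(seq_len, sliding_window_size, compress_block_size,
                     selection_block_size, num_selected_blocks):
    # Assumed: one compressed entry per full block of the sequence.
    compressed_blocks = seq_len // compress_block_size
    # Assumed: every selected block contributes all of its tokens.
    selected_tokens = selection_block_size * num_selected_blocks
    return {
        "window_tokens": sliding_window_size,
        "compressed_blocks": compressed_blocks,
        "selected_tokens": selected_tokens,
        "total": sliding_window_size + compressed_blocks + selected_tokens,
    }

SEQ_LEN = 256  # value kept by this commit

# Settings before and after this commit (values copied from the diff).
old = attention_budget(SEQ_LEN, sliding_window_size=32, compress_block_size=4,
                       selection_block_size=4, num_selected_blocks=1)
new = attention_budget(SEQ_LEN, sliding_window_size=32, compress_block_size=32,
                       selection_block_size=32, num_selected_blocks=2)

print("old:", old)
print("new:", new)
```

Under these assumptions the old settings give 64 compressed blocks and 4 selected tokens per query, while the new ones give 8 compressed blocks and 64 selected tokens, so the fine-grained selection branch carries far more of the context than before.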
