I've been experimenting with the Lion optimizer in your other (great) Imagen repository. Here is my anecdotal experience and the combinations I tried:
- Models of different sizes: 0.2B, 0.7B and 1B params.
- Betas such as `beta1 0.95` and `beta2 0.98`.
- Learning rates `1e-4`, `3e-5` and `1e-5`.
- Triton kernel turned both `True` and `False`.
Training was indeed fast, but unfortunately every run eventually ended up yielding NaNs.
I think a potential issue could be how LION interacts with a warmup schedule; I am not sure if you're supposed to do warmup with this optimizer or not (which I always did).
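
To make the warmup question concrete, here is a minimal plain-Python sketch of the Lion update rule as published in the Lion paper, combined with a linear warmup. It is not the code from this repository or from the Imagen one: the names `lion_step` and `linear_warmup`, the warmup length, the toy quadratic loss and the base learning rate in the usage loop are all assumptions made for illustration.

```python
import math


def sign(x):
    """Return -1, 0 or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def linear_warmup(step, base_lr, warmup_steps):
    """Hypothetical schedule: ramp linearly up to base_lr, then hold it."""
    if warmup_steps <= 0:
        return base_lr
    return base_lr * min(1.0, (step + 1) / warmup_steps)


def lion_step(params, grads, moms, lr, beta1=0.95, beta2=0.98, weight_decay=0.0):
    """One Lion update over plain lists of floats (illustrative only).

    update = sign(beta1 * m + (1 - beta1) * g)
    p      = p * (1 - lr * weight_decay) - lr * update
    m      = beta2 * m + (1 - beta2) * g
    """
    new_params, new_moms = [], []
    for p, g, m in zip(params, grads, moms):
        # Fail early instead of silently propagating NaN/inf into the momentum.
        if not math.isfinite(g):
            raise FloatingPointError("non-finite gradient encountered")
        update = sign(beta1 * m + (1 - beta1) * g)
        new_params.append(p * (1 - lr * weight_decay) - lr * update)
        new_moms.append(beta2 * m + (1 - beta2) * g)
    return new_params, new_moms


# Toy usage: minimise 0.5 * p**2 per coordinate, whose gradient is p itself.
params, moms = [1.0, -2.0], [0.0, 0.0]
for step in range(200):
    grads = list(params)
    lr = linear_warmup(step, base_lr=1e-2, warmup_steps=50)
    params, moms = lion_step(params, grads, moms, lr)
```

Because the update is a sign, every coordinate moves by exactly `lr` per step (before weight decay), regardless of gradient magnitude. The schedule therefore sets the step size directly, which is why the presence or absence of warmup could plausibly matter here. The sketch does not settle whether warmup is the cause of the NaNs.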