
Always getting NaNs in long training #33

@danbochman

Description


I've been experimenting with the Lion optimizer in your other (great) Imagen repository. I can share my anecdotal experience with the combinations I tried:

  • Models of different sizes: 0.2B, 0.7B, and 1B params.
  • Betas such as beta1 = 0.95 and beta2 = 0.98.
  • Learning rates of 1e-4, 3e-5, and 1e-5.
  • The Triton kernel both enabled and disabled.

Training was indeed fast, but unfortunately it always ended up producing NaNs in the end.

I think a potential issue could be how Lion interacts with a warmup schedule; I'm not sure whether you're supposed to use warmup with this optimizer or not (I always did).
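For context on why the learning rate (and any warmup that temporarily raises it) matters so much here: Lion applies only the *sign* of the interpolated momentum/gradient, so every parameter moves by exactly `lr` per step regardless of gradient magnitude, which is why the Lion paper recommends a learning rate several times smaller than AdamW's. Below is a minimal single-parameter sketch of the update rule as I understand it (not the repository's actual implementation; the hyperparameter values are illustrative):

```python
def lion_step(p, m, g, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    """One Lion update for a single scalar parameter.

    p: parameter value, m: momentum (EMA of gradients), g: current gradient.
    Returns the updated (p, m) pair.
    """
    # Interpolate momentum and gradient, then keep only the sign.
    interp = beta1 * m + (1 - beta1) * g
    update = (interp > 0) - (interp < 0)  # sign in {-1, 0, +1}
    # Apply the signed step with decoupled (AdamW-style) weight decay.
    p = p - lr * (update + wd * p)
    # Momentum EMA uses the second beta.
    m = beta2 * m + (1 - beta2) * g
    return p, m

# The step size is independent of gradient magnitude:
p1, _ = lion_step(1.0, 0.0, 10.0, lr=1e-4)   # small gradient
p2, _ = lion_step(1.0, 0.0, 1e6, lr=1e-4)    # huge gradient
# Both take the same-sized step of exactly lr.
```

The flip side of this fixed step size is that a too-large `lr`, even briefly during warmup, moves every parameter by that full amount on each step, which could plausibly accumulate into instability over a long run.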

[attached image]
