
Always getting NaNs in long training #33

@danbochman

Description


I've been experimenting with the Lion optimizer in your other (great) Imagen repository. I can share my anecdotal experience with the combinations I tried:

  • Models of different sizes: 0.2B, 0.7B, and 1B params.
  • Betas such as beta1 = 0.95 and beta2 = 0.98.
  • Learning rates of 1e-4, 3e-5, and 1e-5.
  • The Triton kernel both enabled and disabled.

Training was indeed fast, but unfortunately it always ended up producing NaNs in the end.

I think a potential issue could be how Lion interacts with a warmup schedule; I'm not sure whether you're supposed to use warmup with this optimizer or not (I always did).
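For context on why the learning rate (and any warmup that temporarily raises it) matters so much here: Lion applies only the *sign* of the interpolated momentum/gradient, so every parameter moves by exactly `lr` per step regardless of gradient magnitude, which is why the Lion paper recommends a learning rate several times smaller than AdamW's. Below is a minimal single-parameter sketch of the update rule as I understand it (not the repository's actual implementation; the hyperparameter values are illustrative):

```python
def lion_step(p, m, g, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    """One Lion update for a single scalar parameter.

    p: parameter value, m: momentum (EMA of gradients), g: current gradient.
    Returns the updated (p, m) pair.
    """
    # Interpolate momentum and gradient, then keep only the sign.
    interp = beta1 * m + (1 - beta1) * g
    update = (interp > 0) - (interp < 0)  # sign in {-1, 0, +1}
    # Apply the signed step with decoupled (AdamW-style) weight decay.
    p = p - lr * (update + wd * p)
    # Momentum EMA uses the second beta.
    m = beta2 * m + (1 - beta2) * g
    return p, m

# The step size is independent of gradient magnitude:
p1, _ = lion_step(1.0, 0.0, 10.0, lr=1e-4)   # small gradient
p2, _ = lion_step(1.0, 0.0, 1e6, lr=1e-4)    # huge gradient
# Both take the same-sized step of exactly lr.
```

The flip side of this fixed step size is that a too-large `lr`, even briefly during warmup, moves every parameter by that full amount on each step, which could plausibly accumulate into instability over a long run.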

[attached image]
