
💡 The Warmstarting model tutorial needs to be updated #3579

@sanyalsunny111

Description


🚀 Describe the improvement or the new tutorial

Hello PyTorch Team,

cc @svekars

I recently came across the PyTorch warm starting tutorial. While informative, it currently lacks a motivating example and some supporting code.

In my recent research, I explored warm starting by reusing a few layers from an LLM and retraining a smaller model. Surprisingly, in our case, the smaller warm-started model actually outperforms the larger counterpart from which it inherits its weights; see, for example, Figure 1 from our experiments with GPT-2 XL (1.5B). A minimal sketch of the idea follows the links below.

Code: train_iniheritune.py

Paper: https://arxiv.org/abs/2404.08634
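
To make the proposal concrete, here is a minimal sketch of the layer-inheritance warm start, assuming a toy GPT-style module. The class and attribute names (`TinyGPT`, `tok_emb`, `blocks`, etc.) are hypothetical placeholders and do not reflect the actual structure of train_iniheritune.py or the paper's code.

```python
# Sketch: warm start a smaller model by inheriting the embedding and the
# first k transformer blocks of a larger pretrained model (placeholder names).
import torch
import torch.nn as nn

class TinyGPT(nn.Module):
    """A toy GPT-style stack: token embedding -> n_layer blocks -> LM head."""
    def __init__(self, vocab_size=1000, d_model=64, n_layer=4, n_head=4):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, n_head, batch_first=True)
            for _ in range(n_layer)
        )
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, idx):
        x = self.tok_emb(idx)
        for block in self.blocks:
            x = block(x)
        return self.head(x)

# "Parent" stands in for the pretrained LLM; "child" is the smaller model.
parent = TinyGPT(n_layer=8)
child = TinyGPT(n_layer=2)
k = len(child.blocks)  # number of blocks the child inherits

# Copy the embedding and the first k blocks from the parent into the child.
child.tok_emb.load_state_dict(parent.tok_emb.state_dict())
for i in range(k):
    child.blocks[i].load_state_dict(parent.blocks[i].state_dict())

# Equivalent warm start via a partial state_dict with strict=False, the
# pattern the existing tutorial revolves around: keys that are missing from
# (or absent in) the smaller model are simply skipped.
child_sd = child.state_dict()
partial = {name: v for name, v in parent.state_dict().items()
           if name in child_sd and v.shape == child_sd[name].shape}
child.load_state_dict(partial, strict=False)

# The warm-started child is then trained as usual.
optimizer = torch.optim.AdamW(child.parameters(), lr=3e-4)
```

The warm-started child is then trained on the target data exactly as a from-scratch model would be, which is where a motivating, end-to-end example in the tutorial could pick up.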

Please let me know if you’d be open to working on this jointly. I believe it could provide real value to the PyTorch community.

Best,
Sunny

[Figure 1: results from the GPT-2 XL (1.5B) warm-starting experiments]

Existing tutorials on this topic

The list of existing tutorials on this topic is already included in the text above; attaching it again here.

PyTorch warm starting tutorial

Additional context

The warm starting paper and codebase to be used if we end up collaborating:

Code: train_iniheritune.py

Paper: https://arxiv.org/abs/2404.08634
