🚀 Describe the improvement or the new tutorial
Hello PyTorch Team,
cc @svekars
I recently came across the PyTorch warm starting tutorial. While informative, the tutorial currently lacks a motivating example and supporting code.
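For reference, the core pattern that tutorial covers is partial `state_dict` loading with `strict=False`. A minimal, self-contained sketch of that pattern (the model names and layer sizes below are illustrative, not taken from the tutorial) could look like this:

```python
import torch
import torch.nn as nn

class NetA(nn.Module):
    """Previously trained model whose weights we want to reuse."""
    def __init__(self):
        super().__init__()
        self.features = nn.Linear(16, 32)
        self.head = nn.Linear(32, 10)

    def forward(self, x):
        return self.head(torch.relu(self.features(x)))

class NetB(nn.Module):
    """New model: same feature layer, different task head."""
    def __init__(self):
        super().__init__()
        self.features = nn.Linear(16, 32)
        self.classifier = nn.Linear(32, 2)

    def forward(self, x):
        return self.classifier(torch.relu(self.features(x)))

net_a = NetA()
torch.save(net_a.state_dict(), "net_a.pt")

net_b = NetB()
# strict=False loads the matching `features.*` parameters and reports,
# rather than raising on, the keys that do not line up.
incompatible = net_b.load_state_dict(torch.load("net_a.pt"), strict=False)
print(incompatible.missing_keys)     # ['classifier.weight', 'classifier.bias']
print(incompatible.unexpected_keys)  # ['head.weight', 'head.bias']
```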
In my recent research, I explored warm starting by reusing a few layers from an LLM and retraining a smaller model. Surprisingly, in our case, the smaller warm-started model actually outperforms the larger model whose weights it inherits; see, for example, Figure 1 of our GPT-2 XL (1.5B) experiments. A rough sketch of this layer-inheritance setup is included after the links below.
Code: train_iniheritune.py
Paper: https://arxiv.org/abs/2404.08634
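Here is a minimal sketch of the layer-inheritance setup, written against the Hugging Face transformers GPT-2 classes rather than the paper's own training script; the inherited depth of 12 layers and the `gpt2-xl` checkpoint name are illustrative assumptions:

```python
from transformers import GPT2Config, GPT2LMHeadModel

N_INHERITED_LAYERS = 12  # illustrative choice, not the paper's exact setting

# Load the large "parent" model (GPT-2 XL: 48 layers, hidden size 1600).
parent = GPT2LMHeadModel.from_pretrained("gpt2-xl")

# Build a shallower "child" with the same width so weights stay shape-compatible.
child_config = GPT2Config(
    n_layer=N_INHERITED_LAYERS,
    n_embd=parent.config.n_embd,
    n_head=parent.config.n_head,
    vocab_size=parent.config.vocab_size,
    n_positions=parent.config.n_positions,
)
child = GPT2LMHeadModel(child_config)

# Warm start: copy the embeddings and the first k transformer blocks.
child.transformer.wte.load_state_dict(parent.transformer.wte.state_dict())
child.transformer.wpe.load_state_dict(parent.transformer.wpe.state_dict())
for i in range(N_INHERITED_LAYERS):
    child.transformer.h[i].load_state_dict(parent.transformer.h[i].state_dict())
child.transformer.ln_f.load_state_dict(parent.transformer.ln_f.state_dict())
child.tie_weights()  # lm_head reuses the (now warm-started) token embeddings

# `child` can now be trained as usual with a standard PyTorch training loop.
```

The key constraint is that the child keeps the parent's hidden width and head count so the copied weights remain shape-compatible; only the depth shrinks.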
Please let me know if you’d be open to working on this jointly. I believe it could provide real value to the PyTorch community.
Best,
Sunny

Existing tutorials on this topic
The list of existing tutorials is already linked in the text above; linking it again here:
PyTorch warm starting tutorial
Additional context
The warm-starting paper and codebase that could be used if we end up collaborating:
Code: train_iniheritune.py