Optimize GPU checkpoint loading by ensuring model transfer before load_state_dict in the build method #325
While reviewing the code in this repository, I noticed that the models were only moved to the GPU after their checkpoints had been loaded. This PR reorders the loading path so the models are placed on the target device before applying their checkpoints, which should improve checkpoint-loading performance.

Thanks to everyone who has contributed to this repo; I really appreciate all the hard work that went into it.
## Summary

This PR ensures that both `prefill_model` and `decode_model` are moved to the target device (e.g., GPU) before invoking `load_state_dict`.

## Motivation
Previously, the models could still be on CPU when their checkpoints were applied: `load_state_dict` would copy the weights into CPU parameters, and the subsequent move to the GPU would then transfer every tensor a second time. Moving the models to the correct device first lets the checkpoint weights land directly in device memory, avoiding the redundant transfer and improving checkpoint-loading efficiency.
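For illustration, here is a minimal sketch of the intended ordering. The module shapes, checkpoint paths, and file names are hypothetical; only `prefill_model`, `decode_model`, and the device-first ordering come from this PR:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the real prefill/decode modules.
prefill_model = nn.Linear(1024, 1024)
decode_model = nn.Linear(1024, 1024)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Move the models to the target device *before* loading their checkpoints,
# so load_state_dict copies weights straight into device memory.
prefill_model.to(device)
decode_model.to(device)

# Hypothetical checkpoint paths; map_location=device also keeps the loaded
# tensors off the CPU staging path.
prefill_state = torch.load("prefill.pt", map_location=device)
decode_state = torch.load("decode.pt", map_location=device)

prefill_model.load_state_dict(prefill_state)
decode_model.load_state_dict(decode_state)
```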
## Changes

- Call `prefill_model.to(device)` before loading its state dict.
- Call `decode_model.to(device)` before loading its state dict.

## Impact
This reduces unnecessary GPU/CPU transfers during checkpoint loading, which
should result in faster and more efficient model initialization.
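To sanity-check the impact claim, here is a rough micro-benchmark sketch. Everything in it is made up for illustration (module size, names), and it assumes the checkpoint tensors are already on the GPU (e.g., loaded with `map_location=device`), which is where the reordering should help most; actual savings will vary by model size and hardware.

```python
import time

import torch
import torch.nn as nn


def timed_load(move_first: bool) -> float:
    """Time one checkpoint load with the given ordering (illustrative only)."""
    device = torch.device("cuda")
    model = nn.Linear(8192, 8192)
    # A GPU-resident "checkpoint", as if loaded with map_location=device.
    state = {k: v.to(device) for k, v in model.state_dict().items()}

    torch.cuda.synchronize()
    start = time.perf_counter()
    if move_first:
        model.to(device)              # device-first ordering (this PR)
        model.load_state_dict(state)  # device-to-device copies
    else:
        model.load_state_dict(state)  # copies GPU checkpoint into CPU params...
        model.to(device)              # ...then uploads everything to the GPU again
    torch.cuda.synchronize()
    return time.perf_counter() - start


if torch.cuda.is_available():
    print(f"move first: {timed_load(True):.3f}s")
    print(f"load first: {timed_load(False):.3f}s")
```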