
Conversation

@Cyrilvallez Cyrilvallez commented Nov 17, 2025

What does this PR do?

The device_map specifies the target keys used when loading. This PR updates the loading logic accordingly; otherwise we run into issues when using a device_map with any model that relies on a _conversion_mapping (the VLMs, for example), where source and target keys differ.
Currently, the only thing saving us is that accelerate_dispatch moves the parameters to the correct device during post-processing, which is why this was never detected before. But of course that is much more costly than our smart loading.
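For reference, a per-module device_map is keyed by module names of the instantiated (target) model. A minimal sketch, assuming an illustrative submodule split (the keys below are not an exact Aria layout):

import torch
from transformers import AriaForConditionalGeneration

# Keys refer to modules of the instantiated model, i.e. the *target* names
# after any checkpoint conversion (illustrative split, not the exact layout)
device_map = {
    "model.vision_tower": 0,
    "model.multi_modal_projector": 0,
    "model.language_model": 0,
    "lm_head": 0,
}
model = AriaForConditionalGeneration.from_pretrained(
    "rhymes-ai/Aria", device_map=device_map, dtype=torch.float16
)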

I checked very carefully (by running benchmarks AND checking the source code), and performance is the same whether using this PR or opening the safetensors files directly on device. This can also be verified by looking at the safetensors rust bindings here: get_slice simply calls tensor.to(device) internally, so this PR has no impact on performance (it may even be slightly faster, since it avoids reopening the files over and over).
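To make the equivalence concrete, here is a minimal sketch (not the actual loading code of this PR) comparing the two paths with the safetensors Python API; the file name is a placeholder:

from safetensors import safe_open

filename = "model.safetensors"  # placeholder shard name

# Path 1: open on CPU, then move each tensor to the GPU afterwards
with safe_open(filename, framework="pt", device="cpu") as f:
    cpu_then_move = {k: f.get_tensor(k).to("cuda:0") for k in f.keys()}

# Path 2: open directly on the target device; get_tensor / get_slice
# materialize the tensors there (the bindings perform the same .to(device)
# internally), so the cost is equivalent
with safe_open(filename, framework="pt", device="cuda:0") as f:
    directly_on_device = {k: f.get_tensor(k) for k in f.keys()}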

To understand the issue, consider the following snippet:

import transformers
from transformers import AriaForConditionalGeneration
import torch

# Monkey-patch `accelerate_dispatch` into a no-op just to illustrate the problem:
# params then keep whatever device they were loaded on
def dummy_dispatch(*args, **kwargs):
    pass
transformers.modeling_utils.accelerate_dispatch = dummy_dispatch

model_name = "rhymes-ai/Aria"
model = AriaForConditionalGeneration.from_pretrained(model_name, device_map=0, dtype=torch.float16)

for k, v in model.state_dict().items():
    if v.device != torch.device(0):
        print(f"Param {k} is not on the correct device! Expected {0}, found {v.device}")

On main, it currently outputs:

Param model.vision_tower.embeddings.patch_embedding.weight is not on the correct device! Expected 0, found cpu
Param model.vision_tower.embeddings.patch_embedding.bias is not on the correct device! Expected 0, found cpu
Param model.vision_tower.embeddings.position_embedding.weight is not on the correct device! Expected 0, found cpu
Param model.vision_tower.encoder.layers.0.self_attn.k_proj.weight is not on the correct device! Expected 0, found cpu
Param model.vision_tower.encoder.layers.0.self_attn.k_proj.bias is not on the correct device! Expected 0, found cpu
Param model.vision_tower.encoder.layers.0.self_attn.v_proj.weight is not on the correct device! Expected 0, found cpu
Param model.vision_tower.encoder.layers.0.self_attn.v_proj.bias is not on the correct device! Expected 0, found cpu
Param model.vision_tower.encoder.layers.0.self_attn.q_proj.weight is not on the correct device! Expected 0, found cpu
...

Basically, all params end up on cpu instead of device 0, due to the mismatch between target and source keys in the _checkpoint_conversion_mapping.
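To illustrate the mismatch, here is a minimal, hypothetical sketch (the mapping and key names are made up, not the real Aria mapping): the checkpoint stores weights under source keys, while the device_map built for the instantiated model uses target keys, so a naive prefix lookup misses and falls back to cpu:

# Hypothetical names, only to illustrate the source/target mismatch
conversion_mapping = {"vision_model": "model.vision_tower"}          # source -> target
device_map = {"model.vision_tower": 0, "model.language_model": 0}    # target keys

source_key = "vision_model.embeddings.patch_embedding.weight"        # key in the file

# Looking up the *source* key against the *target*-keyed device_map misses,
# so before this PR the param stayed on cpu until accelerate_dispatch moved it
device = next(
    (dev for prefix, dev in device_map.items() if source_key.startswith(prefix)),
    "cpu",
)
print(device)  # cpu -> wrong; renaming to the target key first would give 0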

On this PR, everything is fine again, and the params are loaded directly onto the correct device.
This is also needed for my other offloading PR #42242

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.


@ArthurZucker ArthurZucker left a comment

ty its simpler as well

@Cyrilvallez Cyrilvallez merged commit 1742d11 into main Nov 18, 2025
24 checks passed
@Cyrilvallez Cyrilvallez deleted the fix-device-map branch November 18, 2025 08:57