Question about latent range in DDIM inversion: X_T or X_{T-1}? #10

@Donus-S

Dear Authors,

Thank you for your great work.

I have a question regarding the following DDIM inversion code:

diff_harmon.py line 268~287

for t in tqdm(timesteps[:-1], desc="DDIM_inverse"):
    latents_input = torch.cat([latents] * 2)
    noise_pred = model.unet(latents_input, t, encoder_hidden_states=context)["sample"]
    noise_pred_uncond, noise_prediction_text = noise_pred.chunk(2)
    noise_pred = noise_pred_uncond + guidance_scale * (noise_prediction_text - noise_pred_uncond)

    next_timestep = t + model.scheduler.config.num_train_timesteps // model.scheduler.num_inference_steps
    alpha_bar_next = model.scheduler.alphas_cumprod[next_timestep] \
        if next_timestep <= model.scheduler.config.num_train_timesteps else torch.tensor(0.0)

    # leverage reversed x0
    reverse_x0 = (1 / torch.sqrt(model.scheduler.alphas_cumprod[t]) * (
        latents - noise_pred * torch.sqrt(1 - model.scheduler.alphas_cumprod[t])))

    latents = reverse_x0 * torch.sqrt(alpha_bar_next) + torch.sqrt(1 - alpha_bar_next) * noise_pred

    all_latents.append(latents)

# all_latents[N] -> N: DDIM steps (X_{T-1} ~ X_0)
return latents, all_latents

From what I understand, when t = T-1 (961), next_timestep becomes T (981), so alpha_bar_next is the cumulative ᾱ_T (ᾱ_981), and the newly computed latent should correspond to X_T (X_981).

However, according to the comment at the end of the code (all_latents[N] corresponds to X_{T-1} ~ X_0), it seems the stored latents start from X_{T-1}, not X_T.
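
To make the index arithmetic concrete, here is a minimal sketch of which timestep each appended latent would correspond to. It assumes 1000 training timesteps, 50 inference steps, and an ascending inversion schedule 1, 21, ..., 981; these values are my inference from the 961/981 numbers above, not something read from the repository, and `stored_timesteps` is a hypothetical helper used only for illustration.

```python
# Assumed values (not taken from the repository's config).
num_train_timesteps = 1000
num_inference_steps = 50
step = num_train_timesteps // num_inference_steps  # 20

# Assumed ascending inversion schedule: 1, 21, ..., 981.
timesteps = list(range(1, num_train_timesteps, step))


def stored_timesteps(timesteps, step):
    """Hypothetical helper: the timestep of each latent appended in the loop.

    The loop visits timesteps[:-1]; the latent computed at step t is built
    with alpha_bar at t + step, so it lives at timestep t + step.
    """
    return [t + step for t in timesteps[:-1]]


stored = stored_timesteps(timesteps, step)
print(stored[0], stored[-1], len(stored))  # 21 981 49
```

Under these assumptions, the last appended latent sits at timestep 981, the largest timestep in the schedule, which is why I would expect the range to include X_T.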

Could you please clarify this point? Specifically: why is the range of all_latents described as X_{T-1} ~ X_0, instead of X_T ~ X_0?

Thank you in advance for your help!
