Dear Authors,
Thank you for your great work.
I have a question regarding the following DDIM inversion code:
diff_harmon.py, lines 268-287:
    for t in tqdm(timesteps[:-1], desc="DDIM_inverse"):
        latents_input = torch.cat([latents] * 2)
        noise_pred = model.unet(latents_input, t, encoder_hidden_states=context)["sample"]
        noise_pred_uncond, noise_prediction_text = noise_pred.chunk(2)
        noise_pred = noise_pred_uncond + guidance_scale * (noise_prediction_text - noise_pred_uncond)
        next_timestep = t + model.scheduler.config.num_train_timesteps // model.scheduler.num_inference_steps
        alpha_bar_next = model.scheduler.alphas_cumprod[next_timestep] \
            if next_timestep <= model.scheduler.config.num_train_timesteps else torch.tensor(0.0)
        # leverage reversed x0
        reverse_x0 = (1 / torch.sqrt(model.scheduler.alphas_cumprod[t]) * (
                latents - noise_pred * torch.sqrt(1 - model.scheduler.alphas_cumprod[t])))
        latents = reverse_x0 * torch.sqrt(alpha_bar_next) + torch.sqrt(1 - alpha_bar_next) * noise_pred
        all_latents.append(latents)
    # all_latents[N] -> N: DDIM steps (X_{T-1} ~ X_0)
    return latents, all_latents
From what I understand, in the last loop iteration t = T-1 (961), so next_timestep becomes T (981). That makes alpha_bar_next equal to ᾱ_T (alphas_cumprod[981]), so the newly computed latent should correspond to X_T (X_981).
However, the comment at the end of the code (all_latents[N] -> X_{T-1} ~ X_0) suggests that the stored latents start from X_{T-1}, not X_T.
Could you please clarify this point? Specifically, why is the range of all_latents described as X_{T-1} ~ X_0 rather than X_T ~ X_0?
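To make the indexing concrete, here is a minimal sketch of how I read the loop. It only tracks which timestep each appended latent belongs to and does not touch the model. The values are my assumptions, inferred from the 961/981 numbers above: 1000 training timesteps, 50 inference steps, and an ascending schedule starting at 1. The name stored_levels is hypothetical and stands in for the noise levels of the entries in all_latents.

```python
# Assumed configuration (not taken from the repository).
num_train_timesteps = 1000
num_inference_steps = 50
step = num_train_timesteps // num_inference_steps  # 20

# Assumed ascending inversion schedule: [1, 21, ..., 961, 981].
timesteps = list(range(1, num_train_timesteps, step))

# Mirror the loop: each iteration appends the latent computed with
# alpha_bar_next, i.e. the latent at next_timestep, not at t.
stored_levels = []
for t in timesteps[:-1]:
    next_timestep = t + step
    stored_levels.append(next_timestep)

print(stored_levels[0], stored_levels[-1], len(stored_levels))
# prints: 21 981 49
```

Under these assumptions the last stored entry sits at timestep 981, which is what I would call X_T. If the schedule in diff_harmon.py is built differently (for example without the offset of 1), the numbers shift, but the loop would still store the latent at next_timestep rather than at t.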
Thank you in advance for your help!