Hi @xinyangATK,
I'm currently training a model on a custom dataset where node features consist of both continuous values and multi-categorical attributes (represented as multi-hot vectors).
From my understanding, when using continuous features, the `get_logit_beta_stats_con` function should be used instead of `get_logit_beta_stats` in the precondition module.
However, I noticed that the implementation of the expected value of `logit(z_t)` in the code seems to differ from the equation described in the paper (Appendix B, Eq. (29)). Specifically, the current code uses:
E1 = 1.0 / (eta * alpha_t * xmin) * (
(eta * alpha_t * xmax).lgamma() - (eta * alpha_t * xmin).lgamma())
Shouldn't the denominator be `eta * alpha_t * (xmax - xmin)` instead of `eta * alpha_t * xmin`?
It seems that `E2`, `V1`, and `V2` may have the same issue as well.
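For reference, this is how I arrived at the correction above. If Eq. (29) is the expectation of the digamma term with $x$ averaged uniformly over $[x_{\min}, x_{\max}]$ (my reading of the continuous case; please correct me if the averaging is defined differently), then since $\frac{d}{dx}\ln\Gamma(c x) = c\,\psi(c x)$:

$$
\mathbb{E}_{x \sim \mathcal{U}[x_{\min}, x_{\max}]}\big[\psi(\eta \alpha_t x)\big]
= \frac{1}{x_{\max} - x_{\min}} \int_{x_{\min}}^{x_{\max}} \psi(\eta \alpha_t x)\, dx
= \frac{\ln\Gamma(\eta \alpha_t x_{\max}) - \ln\Gamma(\eta \alpha_t x_{\min})}{\eta \alpha_t \,(x_{\max} - x_{\min})}.
$$

Under that reading, the first-moment term would be computed as below (just a sketch of what I mean by the correction, using the same variable names as the snippet above):

```python
# Sketch of the corrected first-moment term, assuming x is averaged
# uniformly over [xmin, xmax]; only the denominator changes.
E1 = 1.0 / (eta * alpha_t * (xmax - xmin)) * (
    (eta * alpha_t * xmax).lgamma() - (eta * alpha_t * xmin).lgamma())
```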
After applying this correction, I observed that the values in `X` within `sample_batch` start to diverge with each iteration, and the model eventually returns NaNs.
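In case it helps with reproducing this, the kind of guard I would add to locate where the values first blow up is roughly the following (a minimal sketch; the function and argument names are placeholders, not from the repo):

```python
import torch

def assert_finite(name: str, tensor: torch.Tensor, step: int) -> None:
    # Raise at the first sampling step where a tensor contains NaN or Inf,
    # so the divergence can be traced back to the quantity that produced it.
    if not torch.isfinite(tensor).all():
        raise RuntimeError(f"`{name}` became non-finite at sampling step {step}")
```

Calling this on `X` (and on the intermediate `E1`, `E2`, `V1`, `V2` values) inside the sampling loop would show whether the NaNs originate in the statistics themselves or only later in the network output.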
I suspect this may be related to the behavior of `get_logit_beta_stats_con`, or possibly to how `scale_shift`, `sigmoid_start`, `sigmoid_end`, and `sigmoid_power` are defined.
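For completeness, this is how I am currently interpreting those schedule parameters (purely an assumption on my side based on the names, not taken from the repo), so please correct me if the actual definitions differ:

```python
import torch

def alpha_t_schedule(t: torch.Tensor,
                     sigmoid_start: float = 9.0,
                     sigmoid_end: float = -9.0,
                     sigmoid_power: float = 1.0) -> torch.Tensor:
    # Assumed reading (not the repo's actual code): interpolate in logit space
    # from sigmoid_start at t = 0 to sigmoid_end at t = 1, optionally warped by
    # t ** sigmoid_power, then squash through a sigmoid so alpha_t lies in (0, 1).
    logits = sigmoid_start + (sigmoid_end - sigmoid_start) * t.pow(sigmoid_power)
    return torch.sigmoid(logits)
```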
Could you kindly advise on this?