Inconsistent use of mean & sum when calculating KL divergence? #36

@profPlum

Description

A mean is taken inside BaseVariationalLayer_.kl_div(), but a sum is used later: inside get_kl_loss() when accumulating across layers, and when combining the KL loss of a layer's weights and bias (e.g. inside Conv2dReparameterization.kl_loss()).

Is there a mathematical justification for this? Why take the mean of the individual per-weight KL divergences only to sum the results across layers later? A small sketch of the difference is below.
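For concreteness, here is a minimal sketch (not the library's actual code; the function and variable names are illustrative) of the closed-form KL between diagonal Gaussians, showing that mean and sum reduction over a layer's weights differ by a factor of the number of weights:

```python
import torch

def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    """Element-wise KL(q || p) between univariate Gaussians:
    log(sigma_p / sigma_q) + (sigma_q^2 + (mu_q - mu_p)^2) / (2 * sigma_p^2) - 1/2
    """
    return (torch.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sigma_p ** 2)
            - 0.5)

# Hypothetical layer with 1,000 weights and a standard-normal prior.
mu_q = torch.randn(1000)
sigma_q = torch.rand(1000) * 0.1 + 0.05
mu_p, sigma_p = torch.zeros(1000), torch.ones(1000)

kl_elementwise = gaussian_kl(mu_q, sigma_q, mu_p, sigma_p)

# Mean reduction divides the layer's KL by the number of weights,
# whereas sum reduction keeps the full KL term of the ELBO.
print(kl_elementwise.mean())  # smaller by a factor of kl_elementwise.numel()
print(kl_elementwise.sum())   # total KL contribution of the layer
```

If the per-layer reduction is a mean but the cross-layer reduction is a sum, layers with different numbers of parameters are weighted differently relative to the standard ELBO, which is the inconsistency being asked about.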
