Not sure if a bug or not... #37
jamestwebber started this conversation in General
Replies: 2 comments · 3 replies
-
Thanks for raising this! I misinterpreted the issue yesterday, sorry about that. I've fixed the fan-in. Do let me know if these implementations match.
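For concreteness, here is a minimal sketch of scaled weight standardization with an explicit fan-in term, assuming the formulation used in timm's `ScaledStdConv2d` (the class name `ScaledWSConv2d` and the `gamma`/`eps` defaults below are illustrative, not this repo's actual code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaledWSConv2d(nn.Conv2d):
    """Conv2d with scaled weight standardization (illustrative sketch).

    PyTorch conv weights have shape (C_out, C_in, kH, kW), so the fan-in
    of each output unit is weight[0].numel() == C_in * kH * kW.
    """

    def __init__(self, *args, gamma=1.0, eps=1e-6, **kwargs):
        super().__init__(*args, **kwargs)
        self.eps = eps
        # gamma / sqrt(fan_in), the scale factor from scaled WS.
        self.scale = gamma * self.weight[0].numel() ** -0.5

    def forward(self, x):
        # Standardize each output filter over its fan-in dimensions.
        std, mean = torch.std_mean(
            self.weight, dim=(1, 2, 3), keepdim=True, unbiased=False)
        weight = self.scale * (self.weight - mean) / (std + self.eps)
        return F.conv2d(x, weight, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)
```

(timm additionally learns a per-channel gain; it is omitted here for brevity.)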
1 reply
-
I don't think the weight shape is the same as in the Jax code. See https://github.com/rwightman/pytorch-image-models/blob/4ea593196414684d2074cbb81d762f3847738484/timm/models/layers/std_conv.py#L79.
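To spell out the shape point with a sketch (the channel counts are made up; the layouts are the standard JAX/Haiku and PyTorch conventions):

```python
import numpy as np

# A hypothetical 3x3 conv mapping 64 -> 128 channels:
jax_shape = (3, 3, 64, 128)    # JAX/Haiku kernels: (kH, kW, C_in, C_out)
torch_shape = (128, 64, 3, 3)  # PyTorch weights:   (C_out, C_in, kH, kW)

# The deepmind code: product over all but the last (output) dimension.
fan_in_jax = np.prod(jax_shape[:-1])     # 3 * 3 * 64 = 576

# Copying shape[:-1] verbatim onto a PyTorch weight would be wrong
# (it keeps C_out and drops kW); the output dimension comes first here,
# which is why timm uses weight[0].numel(), i.e. prod(shape[1:]).
fan_in_torch = np.prod(torch_shape[1:])  # 64 * 3 * 3 = 576

assert fan_in_jax == fan_in_torch
```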
2 replies
-
I opened #35 because I thought I spotted a bug in the weight standardization code, but @vballoli says it's fine, so I'm opening a discussion (which I never knew existed!) to figure it out.

The code computes `fan_in` as the product over `shape[0:]`. This was suspicious to me: `shape[0:]` is just a needless copy of `shape`, so this calculates `fan_in` as the size of the entire tensor. The code in the deepmind repository here reads `shape[:-1]`, which means `fan_in` is the product over all but the last dimension, which makes more sense to me.

Maybe I am missing a `pytorch` vs `jax` implementation difference? What's the reason for the discrepancy?
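For illustration, here is what the two slices compute on a hypothetical JAX-layout kernel (the shape is made up):

```python
import numpy as np

shape = (3, 3, 64, 128)  # (kH, kW, C_in, C_out) for a 3x3 conv, 64 -> 128

np.prod(shape[0:])   # 73728: the whole tensor size, since shape[0:] == shape
np.prod(shape[:-1])  # 576:   product over all but the last dim, the fan-in
```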