On normalization-free model's performance #11
ankur56 started this conversation in Show and tell
Replies: 1 comment
I have my own custom 3D-DenseNet model, which I am using within the PyTorch Lightning framework. I was wondering whether I can make a normalization-free version of my model using your code. As far as I can tell, I just need to change the "base.py" file to make it work for 3D convolutions.

Edit: I converted my model to a normalization-free version and tested it. I made a couple of observations about the NF-model's performance and behavior, and I would be grateful if anyone could shed light on them.

The NF-model takes almost twice as long per epoch as the regular model at the same batch size. The paper states that batch normalization is quite a computationally demanding operation; however, the means and standard deviations of the weights are computed in the “WSConv2d” layer as well, on every forward pass. Is this why a “WSConv2d” layer is more expensive than a regular “Conv2d” layer? I am not sure why the normalization-free model ends up more expensive than my regular model.
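For the first observation, here is a rough sketch of how a weight-standardized convolution could be adapted to 3D. The class name WSConv3d and the details are illustrative only; they follow the scaled weight standardization described in the NF-Nets paper rather than the repository's actual "base.py". It makes the extra per-step work visible: the weight mean and variance are recomputed on every forward pass.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class WSConv3d(nn.Conv3d):
    """Hypothetical 3D variant of a weight-standardized convolution
    (scaled weight standardization, as in the NF-Nets paper)."""

    def __init__(self, in_channels, out_channels, kernel_size, **kwargs):
        super().__init__(in_channels, out_channels, kernel_size, **kwargs)
        # Learnable per-output-channel gain; broadcasts over the
        # (out, in, kD, kH, kW) weight tensor.
        self.gain = nn.Parameter(torch.ones(out_channels, 1, 1, 1, 1))

    def standardized_weight(self):
        # Recomputed on every forward pass: this is the extra work a
        # weight-standardized layer does compared with a plain convolution.
        weight = self.weight
        fan_in = weight[0].numel()
        mean = weight.mean(dim=(1, 2, 3, 4), keepdim=True)
        var = weight.var(dim=(1, 2, 3, 4), keepdim=True)
        scale = torch.rsqrt(torch.clamp(var * fan_in, min=1e-4))
        return (weight - mean) * scale * self.gain

    def forward(self, x):
        return F.conv3d(x, self.standardized_weight(), self.bias,
                        self.stride, self.padding, self.dilation, self.groups)
```

The standardization reduces over the weight tensor, which is usually much smaller than the activations, so whether it alone accounts for a 2x slowdown is unclear; profiling a single WS layer against a plain Conv3d would help isolate the overhead.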
The convergence of the NF-model is excruciatingly slow and depends heavily on the choice of the clipping factor (lambda). In my case, a clipping factor of 0.01 made convergence extremely slow, so I changed it to 0.8 (or 1.0), which made convergence relatively faster but still significantly slower than for the regular model. I also tried changing the learning rate and the batch size, but couldn't make convergence any faster. I am using the SGD optimizer with Nesterov momentum. Is there any other way to speed up convergence?
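For the second observation, here is a generic sketch of adaptive gradient clipping (AGC) as described in the NF-Nets paper, clipping unit-wise per output channel. The helper names unitwise_norm and adaptive_grad_clip are made up for illustration and are not necessarily the repository's API. The clipping factor lambda caps the ratio of each unit's gradient norm to its weight norm, so a very small value such as 0.01 clips aggressively and can slow apparent convergence.

```python
import torch


def unitwise_norm(t):
    # L2 norm per output unit (dim 0) for weight tensors; full norm for
    # biases and scalars.
    if t.ndim <= 1:
        return t.norm(p=2)
    return t.norm(p=2, dim=tuple(range(1, t.ndim)), keepdim=True)


@torch.no_grad()
def adaptive_grad_clip(parameters, clipping=0.01, eps=1e-3):
    """Clip each gradient so its unit-wise norm is at most
    `clipping` times the matching unit-wise weight norm."""
    for p in parameters:
        if p.grad is None:
            continue
        w_norm = unitwise_norm(p.detach()).clamp(min=eps)
        g_norm = unitwise_norm(p.grad.detach())
        max_norm = clipping * w_norm
        # Only scale down units whose gradient norm exceeds the cap.
        scale = (max_norm / g_norm.clamp(min=1e-6)).clamp(max=1.0)
        p.grad.mul_(scale)


# Typical use in a training loop (sketch):
#   loss.backward()
#   adaptive_grad_clip(model.parameters(), clipping=0.01)
#   optimizer.step()
```

One detail worth checking against the paper is whether AGC should be applied to the final classifier layer at all.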
Reply:
No problem, discussions are always welcome. It should work, since the only dependency is PyTorch and nothing else. And since Lightning integrates with plain PyTorch modules seamlessly, I don't see why it wouldn't work. Do let me know if you run into any usage issues.
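To illustrate the integration point, here is a minimal sketch of dropping such a model into a LightningModule, assuming the adaptive_grad_clip helper sketched above; the class name NFDenseNet3D is hypothetical. AGC is applied in the on_after_backward hook, and the optimizer is SGD with Nesterov momentum, as in the original post.

```python
import pytorch_lightning as pl
import torch
import torch.nn.functional as F


class NFDenseNet3D(pl.LightningModule):
    """Illustrative wrapper: `model` is any plain PyTorch network,
    e.g. a 3D-DenseNet built from WSConv3d blocks."""

    def __init__(self, model, clipping=0.01, lr=0.1):
        super().__init__()
        self.model = model
        self.clipping = clipping
        self.lr = lr

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.cross_entropy(self(x), y)
        self.log("train_loss", loss)
        return loss

    def on_after_backward(self):
        # Apply AGC once gradients exist, before the optimizer step
        # (reuses the adaptive_grad_clip sketch above).
        adaptive_grad_clip(self.model.parameters(), clipping=self.clipping)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=self.lr,
                               momentum=0.9, nesterov=True)
```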