This came up in an earlier issue, years ago:
@zvezdochiot, who opened that issue, contributed his stb-based code.
It uses less memory (layers I and II are convolved in a single pass) but performs poorly under OpenMP (roughly double the run time).
To do: look for a way to get the lower memory use while keeping performance.
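
For reference, here is a rough sketch of where the memory saving can come from when two layers are computed at once. It is not @zvezdochiot's code. It assumes, purely for illustration, that layer II is a pointwise (1x1) layer, and every name and size in it (`K1`, `C1`, `C2`, `conv_two_pass`, `conv_fused`) is hypothetical. The two-pass version stores all layer I maps before layer II runs; the fused version keeps only one pixel's worth of layer I values.

```c
#include <stdlib.h>

/* Hypothetical sizes, chosen only to keep the sketch small. */
enum { K1 = 3, C1 = 4, C2 = 2 }; /* layer I kernel width, layer I maps, layer II maps */

static float relu(float v) { return v > 0.0f ? v : 0.0f; }

static int clampi(int v, int lo, int hi) { return v < lo ? lo : (v > hi ? hi : v); }

/* Layer I output of map c at pixel (x, y); borders are clamped. */
static float layer1_at(const float *src, int w, int h, int x, int y,
                       const float *k1, const float *b1, int c)
{
    float acc = b1[c];
    for (int dy = 0; dy < K1; dy++)
        for (int dx = 0; dx < K1; dx++) {
            int sx = clampi(x + dx - K1 / 2, 0, w - 1);
            int sy = clampi(y + dy - K1 / 2, 0, h - 1);
            acc += k1[(c * K1 + dy) * K1 + dx] * src[sy * w + sx];
        }
    return relu(acc);
}

/* Layer II (assumed 1x1) output of map o from the C1 layer I values of one
 * pixel; v[c * stride] is the value of layer I map c. */
static float layer2_mix(const float *v, int stride,
                        const float *k2, const float *b2, int o)
{
    float acc = b2[o];
    for (int c = 0; c < C1; c++)
        acc += k2[o * C1 + c] * v[c * stride];
    return relu(acc);
}

/* Two passes: allocates C1 * w * h floats for the intermediate maps. */
int conv_two_pass(const float *src, int w, int h,
                  const float *k1, const float *b1,
                  const float *k2, const float *b2, float *dst)
{
    float *mid = malloc((size_t)C1 * w * h * sizeof *mid);
    if (!mid) return -1;
    #pragma omp parallel for
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            for (int c = 0; c < C1; c++)
                mid[(c * h + y) * w + x] = layer1_at(src, w, h, x, y, k1, b1, c);
    #pragma omp parallel for
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            for (int o = 0; o < C2; o++)
                dst[(o * h + y) * w + x] =
                    layer2_mix(mid + y * w + x, w * h, k2, b2, o);
    free(mid);
    return 0;
}

/* Fused: layers I and II per pixel, with only C1 floats of scratch. */
void conv_fused(const float *src, int w, int h,
                const float *k1, const float *b1,
                const float *k2, const float *b2, float *dst)
{
    #pragma omp parallel for
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            float t[C1]; /* layer I values for this pixel only */
            for (int c = 0; c < C1; c++)
                t[c] = layer1_at(src, w, h, x, y, k1, b1, c);
            for (int o = 0; o < C2; o++)
                dst[(o * h + y) * w + x] = layer2_mix(t, 1, k2, b2, o);
        }
}
```

Both functions write `C2 * w * h` floats to `dst` and should agree up to rounding. The sketch only shows the memory difference; it does not reproduce or explain the OpenMP slowdown mentioned above, which would need profiling on the real code.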