Why do you apply batch normalization **after** the ReLU operation instead of **before** it?