What is meaning of MEAN_LOSS = False | True

What's effective loss scaling? Does it sum or mean over classes? over batch size?

How does it interact with distributed training? Is there anywhere scaling over the world size?