Exploding Gradients With 4 Layers

I'm using EGNN with 4 layers, with global attention applied after each layer, and I'm seeing exploding gradients after roughly 90 epochs. I'm already using the techniques discussed earlier (sparse attention matrix, coor_weights_clamp_value, norm_coors), but I'm not sure whether there's anything else I should be doing. I'm also not updating the coordinates, so the fix in the pull request doesn't apply.
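
One generic check that is not specific to EGNN is to log the global gradient norm at every step and clip it to a fixed ceiling, which at least shows when the blow-up starts. Below is a minimal framework-free sketch of global-norm clipping. It assumes the gradients are available as plain lists of floats, and the names global_grad_norm and clip_by_global_norm are hypothetical helpers written for illustration, not part of any library.

```python
import math


def global_grad_norm(grads):
    """L2 norm over all gradient values, treating every tensor as one flat vector."""
    return math.sqrt(sum(g * g for tensor in grads for g in tensor))


def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients by one shared factor so the global norm is at most max_norm.

    Returns (clipped_grads, norm_before_clipping). Logging the returned norm
    each step makes it easier to see when the gradients start to grow.
    """
    norm = global_grad_norm(grads)
    if not math.isfinite(norm):
        # NaN/inf gradients cannot be rescaled meaningfully; let the caller skip the step.
        raise ValueError("non-finite gradient norm")
    # Only shrink, never enlarge: the scale is capped at 1.0.
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    clipped = [[g * scale for g in tensor] for tensor in grads]
    return clipped, norm


if __name__ == "__main__":
    # Two hypothetical "parameter tensors" with a combined norm of 5.0.
    grads = [[3.0], [4.0]]
    clipped, norm = clip_by_global_norm(grads, max_norm=1.0)
    print(norm, clipped)
```

In a PyTorch training loop the equivalent is torch.nn.utils.clip_grad_norm_, called between loss.backward() and optimizer.step(); it clips in place and returns the total norm before clipping, so the same value can be logged.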