Hi, when running the multi-human policies such as sarl and lstm-rl, I noticed a drastic memory increase as training goes on: the used memory grew from about 4 GB to 20 GB after 100 training episodes. I have been debugging for a long time but still have no clue what's going wrong. @ChanganVR Please take a look.
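
In case it helps reproduce or narrow this down, here is a minimal sketch of how memory could be tracked per episode (assuming `psutil` is installed; the `log_memory` helper and where it is called in the training loop are placeholders, not part of the repo):

```python
import os
import psutil

# Handle to the current training process
process = psutil.Process(os.getpid())

def log_memory(episode):
    # Report the resident set size in GB after each training episode,
    # to see whether memory grows roughly linearly with episodes
    rss_gb = process.memory_info().rss / 1024 ** 3
    print(f'episode {episode}: RSS = {rss_gb:.2f} GB')
```

Calling something like `log_memory(episode)` at the end of each episode shows the growth pattern clearly, which is how I measured the 4 GB to 20 GB increase above.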