The model can produce NaNs if a hyperparameter (such as the learning rate) is set too high (potentially also if the input data contains NaNs?). It would be nice if this were detected and a helpful message suggested a solution.
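To make the request concrete, here is a minimal sketch of what such a check might look like. It is not taken from the project's code: `NonFiniteError`, `check_inputs`, and `check_loss` are hypothetical names, and it assumes the loss and the input values are available as plain Python floats.

```python
import math


class NonFiniteError(ValueError):
    """Hypothetical error type raised when a NaN or infinity is detected."""


def check_inputs(values):
    """Raise early if the input data already contains NaN or infinite values."""
    bad = [i for i, v in enumerate(values) if not math.isfinite(v)]
    if bad:
        raise NonFiniteError(
            f"Input data contains {len(bad)} non-finite value(s), "
            f"first at index {bad[0]}. Clean or impute the data before training."
        )


def check_loss(loss, step, learning_rate):
    """Raise with a suggested fix if the loss has become NaN or infinite."""
    if not math.isfinite(loss):
        raise NonFiniteError(
            f"Loss became {loss} at step {step} (learning rate {learning_rate}). "
            "Try lowering the learning rate, and check the input data for NaNs."
        )
```

Calling `check_inputs` once before training and `check_loss` after each step would separate the two suspected causes: bad input data is reported up front, while a divergence during training points at the hyperparameters.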