In the preprocessing step, before the data enters the network model, why do you shuffle the order of the training data? The order is important in a time series!