Can you explain how you handle the pad token?
Since captions in a batch have to be padded to the same length, we then feed those padded captions into the LSTM, right?
So how and where do you ignore the pad token? It would be great if anyone could explain.
Thanks!
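
For reference, here is a minimal sketch of the two common ways I know of to skip pads in PyTorch; everything in it (`PAD_IDX`, the toy model, the toy batch, and using the captions themselves as targets instead of shifting them by one) is my own hypothetical example, not your code. Is one of these what the model does?

```python
# Minimal sketch of the two usual ways padding is ignored in PyTorch.
# Everything here (PAD_IDX, the tiny model, the toy batch) is hypothetical.
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

PAD_IDX = 0
vocab_size, embed_dim, hidden_dim = 1000, 64, 128

# padding_idx keeps the pad embedding at zero and gives it no gradient
embed = nn.Embedding(vocab_size, embed_dim, padding_idx=PAD_IDX)
lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
fc = nn.Linear(hidden_dim, vocab_size)

# Two padded captions (token ids) and their true, unpadded lengths
captions = torch.tensor([[5, 7, 9, 2, PAD_IDX],
                         [4, 3, 2, PAD_IDX, PAD_IDX]])
lengths = torch.tensor([4, 3])

# Option 1: pack the sequences so the LSTM never runs over pad steps
packed = pack_padded_sequence(embed(captions), lengths,
                              batch_first=True, enforce_sorted=False)
packed_out, _ = lstm(packed)
logits = fc(packed_out.data)  # (sum(lengths), vocab_size): pad steps are gone
targets = pack_padded_sequence(captions, lengths,
                               batch_first=True, enforce_sorted=False).data
loss1 = nn.CrossEntropyLoss()(logits, targets)

# Option 2: run over the full padded batch but mask pads out of the loss
out, _ = lstm(embed(captions))  # pad time steps included here
loss_fn = nn.CrossEntropyLoss(ignore_index=PAD_IDX)
loss2 = loss_fn(fc(out).reshape(-1, vocab_size), captions.reshape(-1))
```

With option 1 the pad positions never reach the LSTM at all; with option 2 they do, but `ignore_index` excludes them from the loss so they contribute no gradient. In a real captioning setup the targets would be the captions shifted by one token, not the captions themselves as in this sketch.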