I’m curious why you slice the attention map to the dimensions below. Looking forward to your reply. Thank you!

    # (1,24,4608,4608) -> (1,24,4096,512)
    # Why do this?
    attention_probs = attention_probs[:,:,text_length:,:text_length].cpu()
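
To make the question concrete, here is a minimal sketch of the shape arithmetic behind that slice. It assumes `text_length = 512` (inferred from 4608 - 4096, not stated in the code) and that the first `text_length` positions of the sequence are text tokens. The helper `sliced_shape` is hypothetical and only mirrors the indexing; it does not touch a real tensor.

```python
def sliced_shape(shape, text_length):
    """Shape of attention_probs[:, :, text_length:, :text_length]."""
    batch, heads, n_query, n_key = shape
    # range(...) follows the same slicing rules as a tensor axis.
    rows = len(range(n_query)[text_length:])  # positions after the text block
    cols = len(range(n_key)[:text_length])    # positions inside the text block
    return (batch, heads, rows, cols)


# Assumption: text_length = 512, inferred from 4608 - 4096.
text_length = 512
result = sliced_shape((1, 24, 4608, 4608), text_length)
print(result)  # (1, 24, 4096, 512)
```

If the last axis indexes keys, as is conventional, the slice would keep only the attention that the 4096 non-text positions pay to the 512 text positions. Whether that is the intended reading is exactly what I would like confirmed.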