Question about the student model's training loss

Hi Yuxin!

Thanks for your great work!

While reading the paper, I got confused about the training loss of the student model. The paper says, "we fine-tune our student model S by minimizing the cross-entropy loss." How exactly is the CE loss used to fine-tune the model, and where is this part implemented in the code? Thank you very much!

Best wishes!