Hey guys, I am looking for some clarity on:
- Whether feature distillation is suitable (beneficial) for my use case.
- How to go about the implementation.
Use case: I have trained a SwinV2-L model on a custom dataset and got decent results, but I want to reduce the model size and inference time, perhaps by distilling into a smaller model like SwinV2-T.
In the repo, I can only see code for distilling EsViT, so it looks like I will need to add support for SwinV2 as the teacher model.
Please let me know how to approach this, and whether feature distillation is a good fit for my use case.
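For concreteness, here is roughly the kind of feature-matching loss I have in mind (just a numpy sketch, not the repo's actual implementation; the channel widths 1536/768 and token count are placeholders I am assuming for the last-stage feature maps of SwinV2-L and SwinV2-T):

```python
import numpy as np

rng = np.random.default_rng(0)

batch, tokens = 4, 49
c_teacher, c_student = 1536, 768  # assumed widths for SwinV2-L / SwinV2-T

# Stand-ins for the final-stage feature maps of teacher and student.
teacher_feats = rng.standard_normal((batch, tokens, c_teacher))
student_feats = rng.standard_normal((batch, tokens, c_student))

# A learned linear projection would map student features up to the
# teacher's width; random weights here just to make the shapes line up.
proj = rng.standard_normal((c_student, c_teacher)) / np.sqrt(c_student)
projected = student_feats @ proj  # (batch, tokens, c_teacher)

# Per-token L2 (MSE) feature-matching loss between projected student
# features and (frozen) teacher features.
loss = float(np.mean((projected - teacher_feats) ** 2))
print(loss >= 0.0)
```

In an actual training loop the projection would be a trainable layer, the teacher would be frozen, and this loss would be combined with (or replace) the task loss. Is this the right mental model for how the repo's distillation works, or does it match features differently?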