This repository demonstrates a complete pipeline for optimizing deep learning models through a series of techniques designed to create efficient, deployment-ready models without sacrificing performance.
The pipeline consists of four main stages:
- Teacher Model Fine-tuning - Optimize a large pre-trained model for the target task
- Knowledge Distillation - Transfer knowledge from teacher to a smaller student model
- Student Model Fine-tuning - Fine-tune the base student model directly to compare standard fine-tuning against distillation
- Post-training Quantization - Compress the model for deployment in resource-constrained environments
```
.
├── Finetuning_Teacher.ipynb                      # Fine-tune the teacher model on your dataset
├── Distillation_on_student_model.ipynb           # Train a lightweight student model via distillation
├── Finetuning_Student.ipynb                      # Further fine-tune the base student model
├── Post_Training_Quantisation_on_Student.ipynb   # Apply quantization for model compression
└── README.md                                     # This file
```
| Model | Accuracy | Precision | Recall | F1 Score | Notes |
|---|---|---|---|---|---|
| Teacher (BERT base) | 0.8711 | 0.9073 | 0.8267 | 0.8651 | Full-sized model |
| Student (Distilled) | 0.9267 | 0.9486 | 0.9022 | 0.9248 | Smaller architecture |
| Student (Fine-tuned) | 0.8620 | - | - | - | Base student model |
| Student (Quantized) | 0.9156 | 0.9401 | 0.8908 | 0.9148 | 4-bit quantization |
We start by fine-tuning a BERT base model on our classification task:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_path = "google-bert/bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Map class indices to human-readable labels
id2label = {0: "Safe", 1: "Not Safe"}
label2id = {"Safe": 0, "Not Safe": 1}

model = AutoModelForSequenceClassification.from_pretrained(
    model_path,
    num_labels=2,
    id2label=id2label,
    label2id=label2id,
)
```
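The model is then fine-tuned on the classification data. A minimal sketch using the Hugging Face `Trainer` is shown below; the `tokenize` helper, the `train_dataset` name (a `datasets` Dataset with `text` and `label` columns), and the hyperparameters are illustrative assumptions, not the notebook's exact settings:

```python
from transformers import Trainer, TrainingArguments

def tokenize(batch):
    # Hypothetical preprocessing helper; adjust the text column name to your dataset
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

training_args = TrainingArguments(
    output_dir="teacher-finetuned",   # illustrative output directory
    num_train_epochs=3,               # illustrative hyperparameters
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset.map(tokenize, batched=True),  # assumed `datasets` Dataset
)
trainer.train()
```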
Knowledge distillation transfers the knowledge from the teacher model to a smaller student model:
```python
import torch
from transformers import DistilBertForSequenceClassification, DistilBertConfig

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Smaller student: fewer attention heads and layers than the default DistilBERT (12 heads, 6 layers)
config = DistilBertConfig(n_heads=8, n_layers=4)
student_model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", config=config
).to(device)
```
The distillation process uses a combination of soft targets (teacher logits) and hard targets (true labels):
```python
import torch.nn as nn

def distillation_loss(student_logits, teacher_logits, true_labels, temperature, alpha):
    # Soften both distributions with the temperature
    soft_targets = nn.functional.softmax(teacher_logits / temperature, dim=1)
    student_soft = nn.functional.log_softmax(student_logits / temperature, dim=1)
    # KL divergence between the soft distributions, scaled by T^2
    distill_loss = nn.functional.kl_div(student_soft, soft_targets, reduction="batchmean") * (temperature ** 2)
    # Standard cross-entropy against the ground-truth labels
    hard_loss = nn.CrossEntropyLoss()(student_logits, true_labels)
    # Weighted combination of soft and hard losses
    loss = alpha * distill_loss + (1.0 - alpha) * hard_loss
    return loss
```
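This loss plugs into an otherwise standard training loop. The sketch below shows one epoch; `teacher_model`, `train_dataloader`, the optimizer settings, and the `temperature`/`alpha` values are illustrative assumptions rather than the notebook's exact setup:

```python
import torch

optimizer = torch.optim.AdamW(student_model.parameters(), lr=5e-5)  # assumed optimizer/learning rate
temperature, alpha = 2.0, 0.5  # illustrative hyperparameters

student_model.train()
teacher_model.eval()  # assumes the fine-tuned teacher is loaded as `teacher_model` on the same device

for batch in train_dataloader:  # assumes a DataLoader yielding tokenized batches
    input_ids = batch["input_ids"].to(device)
    attention_mask = batch["attention_mask"].to(device)
    labels = batch["labels"].to(device)

    # Teacher provides soft targets only; no gradients flow through it
    with torch.no_grad():
        teacher_logits = teacher_model(input_ids=input_ids, attention_mask=attention_mask).logits
    student_logits = student_model(input_ids=input_ids, attention_mask=attention_mask).logits

    loss = distillation_loss(student_logits, teacher_logits, labels, temperature, alpha)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```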
After distillation, we also fine-tune the base student model directly on the target dataset, so that standard fine-tuning can be compared against the distillation results.
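Both students are scored with the metrics reported in the results table. A minimal sketch using Scikit-learn, assuming `labels` and `predictions` hold the ground-truth class ids and the arg-maxed model outputs for the test set:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

accuracy = accuracy_score(labels, predictions)
precision = precision_score(labels, predictions)
recall = recall_score(labels, predictions)
f1 = f1_score(labels, predictions)
print(f"Accuracy: {accuracy:.4f}  Precision: {precision:.4f}  Recall: {recall:.4f}  F1: {f1:.4f}")
```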
Finally, we apply 4-bit quantization using the BitsAndBytes library:
```python
import torch
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig
import bitsandbytes as bnb

# NF4 4-bit quantization with double quantization and bfloat16 compute
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# `model_id` points to the trained student checkpoint
model_nf4 = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    device_map=device,
    quantization_config=nf4_config,
)
```
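As a quick sanity check, the quantized student's memory footprint can be printed and a single example classified. This sketch assumes the `tokenizer` loaded earlier (its vocabulary is compatible with the DistilBERT student) and an illustrative input sentence:

```python
# Report the quantized model's memory footprint in megabytes
print(f"Quantized model size: {model_nf4.get_memory_footprint() / 1e6:.1f} MB")

# Classify one example with the quantized student
enc = tokenizer("example input text", return_tensors="pt").to(model_nf4.device)
with torch.no_grad():
    logits = model_nf4(input_ids=enc["input_ids"], attention_mask=enc["attention_mask"]).logits
print(logits.argmax(dim=-1).item())  # 0 = Safe, 1 = Not Safe per the label mapping above
```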
- Start by running `Finetuning_Teacher.ipynb` to create a well-tuned teacher model
- Run `Distillation_on_student_model.ipynb` to transfer knowledge to the student model
- Run `Finetuning_Student.ipynb` for a comparative study of standard fine-tuning on the student model
- Run `Post_Training_Quantisation_on_Student.ipynb` to create a deployment-ready quantized model
- PyTorch
- Transformers
- BitsAndBytes
- Scikit-learn
- NumPy
This pipeline demonstrates how to effectively compress models through knowledge distillation and quantization while maintaining or even improving performance. The final quantized student model is significantly smaller than the original teacher model while maintaining comparable accuracy metrics, making it suitable for deployment in resource-constrained environments.