This repository demonstrates a complete pipeline for optimizing deep learning models through a series of techniques designed to create efficient, deployment-ready models without sacrificing performance.
The pipeline consists of four main stages:
- Teacher Model Fine-tuning - Optimize a large pre-trained model for the target task
- Knowledge Distillation - Transfer knowledge from teacher to a smaller student model
- Student Model Fine-tuning - Fine-tune the base student model directly to compare standard fine-tuning against distillation
- Post-training Quantization - Compress the model for deployment in resource-constrained environments
```
.
├── Finetuning_Teacher.ipynb                      # Fine-tune the teacher model on your dataset
├── Distillation_on_student_model.ipynb           # Train a lightweight student model via distillation
├── Finetuning_Student.ipynb                      # Further fine-tune the base student model
├── Post_Training_Quantisation_on_Student.ipynb   # Apply quantization for model compression
└── README.md                                     # This file
```
| Model | Accuracy | Precision | Recall | F1 Score | Notes |
|---|---|---|---|---|---|
| Teacher (BERT base) | 0.8711 | 0.9073 | 0.8267 | 0.8651 | Full-sized model |
| Student (Distilled) | 0.9267 | 0.9486 | 0.9022 | 0.9248 | Smaller architecture |
| Student (Fine-tuned) | 0.8620 | - | - | - | Base student model |
| Student (Quantized) | 0.9156 | 0.9401 | 0.8908 | 0.9148 | 4-bit quantization |
We start by fine-tuning a BERT base model on our classification task:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_path = "google-bert/bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Map class indices to human-readable labels
id2label = {0: "Safe", 1: "Not Safe"}
label2id = {"Safe": 0, "Not Safe": 1}

model = AutoModelForSequenceClassification.from_pretrained(
    model_path,
    num_labels=2,
    id2label=id2label,
    label2id=label2id,
)
```
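The model is then fine-tuned on the classification data. A minimal sketch using the Hugging Face `Trainer` is shown below; the `tokenize` helper, the `train_dataset` name (a `datasets` Dataset with `text` and `label` columns), and the hyperparameters are illustrative assumptions, not the notebook's exact settings:

```python
from transformers import Trainer, TrainingArguments

def tokenize(batch):
    # Hypothetical preprocessing helper; adjust the text column name to your dataset
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

training_args = TrainingArguments(
    output_dir="teacher-finetuned",   # illustrative output directory
    num_train_epochs=3,               # illustrative hyperparameters
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset.map(tokenize, batched=True),  # assumed `datasets` Dataset
)
trainer.train()
```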
Knowledge distillation transfers the knowledge from the teacher model to a smaller student model:
```python
import torch
from transformers import DistilBertForSequenceClassification, DistilBertConfig

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Smaller student: fewer attention heads and layers than the default DistilBERT (12 heads, 6 layers)
config = DistilBertConfig(n_heads=8, n_layers=4)
student_model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", config=config
).to(device)
```
The distillation process uses a combination of soft targets (teacher logits) and hard targets (true labels):
```python
import torch.nn as nn

def distillation_loss(student_logits, teacher_logits, true_labels, temperature, alpha):
    # Soften both distributions with the temperature
    soft_targets = nn.functional.softmax(teacher_logits / temperature, dim=1)
    student_soft = nn.functional.log_softmax(student_logits / temperature, dim=1)
    # KL divergence between the soft distributions, scaled by T^2
    distill_loss = nn.functional.kl_div(student_soft, soft_targets, reduction="batchmean") * (temperature ** 2)
    # Standard cross-entropy against the ground-truth labels
    hard_loss = nn.CrossEntropyLoss()(student_logits, true_labels)
    # Weighted combination of soft and hard losses
    loss = alpha * distill_loss + (1.0 - alpha) * hard_loss
    return loss
```
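This loss plugs into an otherwise standard training loop. The sketch below shows one epoch; `teacher_model`, `train_dataloader`, the optimizer settings, and the `temperature`/`alpha` values are illustrative assumptions rather than the notebook's exact setup:

```python
import torch

optimizer = torch.optim.AdamW(student_model.parameters(), lr=5e-5)  # assumed optimizer/learning rate
temperature, alpha = 2.0, 0.5  # illustrative hyperparameters

student_model.train()
teacher_model.eval()  # assumes the fine-tuned teacher is loaded as `teacher_model` on the same device

for batch in train_dataloader:  # assumes a DataLoader yielding tokenized batches
    input_ids = batch["input_ids"].to(device)
    attention_mask = batch["attention_mask"].to(device)
    labels = batch["labels"].to(device)

    # Teacher provides soft targets only; no gradients flow through it
    with torch.no_grad():
        teacher_logits = teacher_model(input_ids=input_ids, attention_mask=attention_mask).logits
    student_logits = student_model(input_ids=input_ids, attention_mask=attention_mask).logits

    loss = distillation_loss(student_logits, teacher_logits, labels, temperature, alpha)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```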
After distillation, we also fine-tune the base student model directly on the target dataset, so that standard fine-tuning can be compared against the distillation results.
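Both students are scored with the metrics reported in the results table. A minimal sketch using Scikit-learn, assuming `labels` and `predictions` hold the ground-truth class ids and the arg-maxed model outputs for the test set:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

accuracy = accuracy_score(labels, predictions)
precision = precision_score(labels, predictions)
recall = recall_score(labels, predictions)
f1 = f1_score(labels, predictions)
print(f"Accuracy: {accuracy:.4f}  Precision: {precision:.4f}  Recall: {recall:.4f}  F1: {f1:.4f}")
```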
Finally, we apply 4-bit quantization using the BitsAndBytes library:
```python
import torch
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig
import bitsandbytes as bnb

# NF4 4-bit quantization with double quantization and bfloat16 compute
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# `model_id` points to the trained student checkpoint
model_nf4 = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    device_map=device,
    quantization_config=nf4_config,
)
```
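As a quick sanity check, the quantized student's memory footprint can be printed and a single example classified. This sketch assumes the `tokenizer` loaded earlier (its vocabulary is compatible with the DistilBERT student) and an illustrative input sentence:

```python
# Report the quantized model's memory footprint in megabytes
print(f"Quantized model size: {model_nf4.get_memory_footprint() / 1e6:.1f} MB")

# Classify one example with the quantized student
enc = tokenizer("example input text", return_tensors="pt").to(model_nf4.device)
with torch.no_grad():
    logits = model_nf4(input_ids=enc["input_ids"], attention_mask=enc["attention_mask"]).logits
print(logits.argmax(dim=-1).item())  # 0 = Safe, 1 = Not Safe per the label mapping above
```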
- Start by running `Finetuning_Teacher.ipynb` to create a well-tuned teacher model
- Run `Distillation_on_student_model.ipynb` to transfer knowledge to the student model
- Run `Finetuning_Student.ipynb` for a comparative study of standard fine-tuning on the student model
- Run `Post_Training_Quantisation_on_Student.ipynb` to create a deployment-ready quantized model
- PyTorch
- Transformers
- BitsAndBytes
- Scikit-learn
- NumPy
This pipeline demonstrates how to effectively compress models through knowledge distillation and quantization while maintaining or even improving performance. The final quantized student model is significantly smaller than the original teacher model while maintaining comparable accuracy metrics, making it suitable for deployment in resource-constrained environments.