1.1 Distillation
2.1 PyTorch Script
2.3 Create an Instance of Metric
2.4 Create an Instance of Criterion (Optional)
Distillation is a widely used network-compression technique that transfers knowledge from a large model to a smaller one without a significant loss of accuracy. Because smaller models are cheaper to evaluate, they can be deployed on less powerful hardware (such as a mobile device). The graph below shows the distillation workflow: the teacher model takes the same input that is fed into the student model and produces outputs carrying the teacher's knowledge, which are then used to guide the student model.
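The workflow can be illustrated with a toy example in plain Python. This is a hypothetical sketch, not the library's implementation. The "teacher" is an average of three fixed logistic units. The "student" is a single logistic unit trained by gradient descent to match the teacher's soft predictions on the same inputs. The names `teacher_prob` and `distill`, and all numeric values, are invented for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical "large" teacher: the average of three fixed logistic units.
TEACHER_UNITS = [(2.0, -1.0, 0.5), (1.5, -2.0, 0.0), (3.0, -0.5, -0.5)]

def teacher_prob(x: Point) -> float:
    x1, x2 = x
    return sum(sigmoid(a * x1 + b * x2 + c) for a, b, c in TEACHER_UNITS) / len(TEACHER_UNITS)

def soft_cross_entropy(p_teacher: float, p_student: float) -> float:
    """Binary cross-entropy of the student's prediction against the teacher's soft target."""
    return -(p_teacher * math.log(p_student) + (1.0 - p_teacher) * math.log(1.0 - p_student))

def distill(inputs: List[Point], epochs: int = 500, lr: float = 0.5):
    """Fit a one-unit logistic student to the teacher's outputs on the same inputs."""
    w1 = w2 = b = 0.0
    targets = [teacher_prob(x) for x in inputs]  # the teacher sees the same inputs as the student
    history = []
    n = len(inputs)
    for _ in range(epochs):
        g1 = g2 = gb = loss = 0.0
        for (x1, x2), p_t in zip(inputs, targets):
            p_s = sigmoid(w1 * x1 + w2 * x2 + b)
            loss += soft_cross_entropy(p_t, p_s)
            err = p_s - p_t  # gradient of the loss w.r.t. the student's logit
            g1 += err * x1
            g2 += err * x2
            gb += err
        history.append(loss / n)  # mean loss before this epoch's update
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    return (w1, w2, b), history

grid = [(i / 2, j / 2) for i in range(-2, 3) for j in range(-2, 3)]
params, history = distill(grid)
print(f"loss: {history[0]:.4f} -> {history[-1]:.4f}")
```

The student starts with uniform predictions, so its initial loss is log 2. Training lowers the loss by pulling the student's outputs toward the teacher's soft targets rather than toward hard labels.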
Knowledge distillation was proposed in Distilling the Knowledge in a Neural Network. It uses the logits (the inputs to the softmax in classification tasks) of the teacher and student models and minimizes the difference between their predicted class distributions. This can be done by minimizing the loss function below.
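$$L_{KD} = D(z_t, z_s)$$
where $D$ is a distance measure, such as the Euclidean distance or the Kullback–Leibler divergence. $z_t$ and $z_s$ are the logits of the teacher and student models, or the predicted distributions obtained by applying softmax to those logits when the distance is measured between distributions.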
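Here is a minimal plain-Python sketch of this loss. It assumes $D$ is the Kullback–Leibler divergence between the teacher's and student's distributions, both softened with a temperature, as in Distilling the Knowledge in a Neural Network. The helper names `softmax` and `kd_loss` are illustrative only and are not part of the library.

```python
import math
from typing import List, Sequence

def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax of the logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(teacher_logits: Sequence[float], student_logits: Sequence[float],
            temperature: float = 1.0) -> float:
    """D(z_t, z_s) taken as KL(p_t || p_s) between temperature-softened distributions."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    return sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s))

teacher_logits = [2.0, 1.0, 0.1]
student_logits = [1.5, 1.2, 0.3]
print(kd_loss(teacher_logits, student_logits, temperature=2.0))
print(kd_loss(teacher_logits, teacher_logits))  # 0.0: identical logits give zero loss
```

A higher temperature flattens both distributions. This exposes more of the teacher's relative preferences among the non-top classes to the student.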
The teacher model contains more information than its logits alone. For example, the output features of its intermediate layers are often used to guide the student model, as in Patient Knowledge Distillation for BERT Model Compression and MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices. The general loss function for this approach can be summarized as follows:
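$$L_{KD} = \sum_i D\left(T_t^{n_i}(F_t^{n_i}),\ T_s^{m_i}(F_s^{m_i})\right)$$
where $D$ is a distance measure as before. $F_t^{n_i}$ is the output feature of the $n_i$-th layer of the teacher model, and $F_s^{m_i}$ is the output feature of the $m_i$-th layer of the student model. Because the dimensions of $F_t^{n_i}$ and $F_s^{m_i}$ usually differ, the transformations $T_t^{n_i}$ and $T_s^{m_i}$ map the two features to matching dimensions. They can take forms such as the identity, a linear transformation or a 1x1 convolution.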
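Here is a plain-Python sketch of this intermediate-layer sum. It assumes mean squared error as $D$, the identity as the teacher transformation $T_t$, and a fixed linear map as the student transformation $T_s$ (kept fixed here for simplicity). The layer names, feature values, the `pairs` structure and the helper `intermediate_layers_loss` are all hypothetical. They are not the library's `layer_mappings` format.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Transform = Callable[[Sequence[float]], Vector]

def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally sized feature vectors."""
    if len(a) != len(b):
        raise ValueError("features must have the same dimension after transformation")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def identity(v: Sequence[float]) -> Vector:
    return list(v)

def linear(weights: List[List[float]]) -> Transform:
    """Return v -> W v, where W has shape (out_dim, in_dim)."""
    def apply(v: Sequence[float]) -> Vector:
        return [sum(w * x for w, x in zip(row, v)) for row in weights]
    return apply

def intermediate_layers_loss(
    teacher_feats: Dict[str, Vector],
    student_feats: Dict[str, Vector],
    pairs: List[Tuple[str, str]],  # (teacher layer n_i, student layer m_i)
    teacher_transforms: Dict[str, Transform],
    student_transforms: Dict[str, Transform],
    distance: Callable[[Sequence[float], Sequence[float]], float] = mse,
) -> float:
    """Sum of D(T_t(F_t), T_s(F_s)) over the mapped layer pairs."""
    total = 0.0
    for t_layer, s_layer in pairs:
        t_out = teacher_transforms.get(t_layer, identity)(teacher_feats[t_layer])
        s_out = student_transforms.get(s_layer, identity)(student_feats[s_layer])
        total += distance(t_out, s_out)
    return total

# Hypothetical features: teacher hidden size 4, student hidden size 2.
teacher_feats = {"teacher_layer_2": [1.0, 0.0, 2.0, 0.0], "teacher_layer_4": [0.5, 0.5, 0.5, 0.5]}
student_feats = {"student_layer_1": [1.0, 2.0], "student_layer_2": [0.5, 0.5]}
project = linear([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0], [0.0, 0.0]])  # maps 2 -> 4 dims

loss = intermediate_layers_loss(
    teacher_feats,
    student_feats,
    pairs=[("teacher_layer_2", "student_layer_1"), ("teacher_layer_4", "student_layer_2")],
    teacher_transforms={},
    student_transforms={"student_layer_1": project, "student_layer_2": project},
)
print(loss)  # 0.125
```

The first pair matches exactly after projection and contributes 0. The second contributes 0.125, so the total is 0.125.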
from intel_extension_for_transformers.transformers import metrics, DistillationConfig
from intel_extension_for_transformers.transformers.trainer import NLPTrainer
# Replace transformers.Trainer with NLPTrainer
# trainer = transformers.Trainer(......)
trainer = NLPTrainer(......)
# Metric used to evaluate the distilled (student) model
metric = metrics.Metric(name="eval_accuracy")
d_conf = DistillationConfig(metrics=metric)
# teacher_model: the trained (larger) model whose knowledge is transferred
model = trainer.distill(
    distillation_config=d_conf, teacher_model=teacher_model
)
Please refer to the example for details.
from intel_extension_for_transformers.transformers import (DistillationConfig, metrics,
                                                           TFOptimization)
from intel_extension_for_transformers.transformers.distillation import Criterion
optimizer = TFOptimization(...)
metric_ = metrics.Metric(name="eval_accuracy")
criterion = Criterion(name='KnowledgeLoss',
                      layer_mappings=[['classifier', 'classifier']],
                      loss_types=['CE', 'CE'],
                      loss_weight_ratio=[0.5, 0.5],
                      add_origin_loss=False)
distillation_conf = DistillationConfig(metrics=metric_,
                                       criterion=criterion)
distilled_model = optimizer.distill(
    distillation_config=distillation_conf,
    teacher_model=teacher_model)
Please refer to the example for details.
The Metric defines which metric will be used to measure the performance of tuned models.
example:
metric = metrics.Metric(name="eval_accuracy")
Please refer to metrics document for the details.
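To illustrate where a name such as "eval_accuracy" can come from, the sketch below assumes the Hugging Face convention. In that convention, `Trainer.evaluate` prefixes the keys returned by a `compute_metrics` callback with `eval_`. The code is plain Python. `EvalPrediction` is a hypothetical stand-in with the same field names as `transformers.EvalPrediction`, and `add_prefix` only mimics the prefixing step.

```python
from collections import namedtuple
from typing import Dict

# Hypothetical stand-in for transformers.EvalPrediction (same field names).
EvalPrediction = namedtuple("EvalPrediction", ["predictions", "label_ids"])

def compute_metrics(p: EvalPrediction) -> Dict[str, float]:
    """Accuracy of argmax predictions, in the form a compute_metrics callback returns."""
    preds = [max(range(len(row)), key=row.__getitem__) for row in p.predictions]
    correct = sum(int(pred == label) for pred, label in zip(preds, p.label_ids))
    return {"accuracy": correct / len(p.label_ids)}

def add_prefix(metrics: Dict[str, float], prefix: str = "eval") -> Dict[str, float]:
    """Mimic the key prefixing applied to evaluation metrics."""
    return {f"{prefix}_{key}": value for key, value in metrics.items()}

results = add_prefix(compute_metrics(EvalPrediction(
    predictions=[[0.1, 2.0], [1.5, 0.3], [0.2, 0.9], [3.0, 0.0]],
    label_ids=[1, 0, 0, 0],
)))
print(results["eval_accuracy"])  # 0.75
```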
The Criterion defines the loss used in the training phase.
arguments:
| Argument | Type | Description | Default value |
|---|---|---|---|
| name | String | Name of the criterion, such as "KnowledgeLoss" or "IntermediateLayersLoss" | "KnowledgeLoss" |
| temperature | Float | Parameter for KnowledgeDistillationLoss | 1.0 |
| loss_types | List of string | Types of the loss terms | ['CE', 'CE'] |
| loss_weight_ratio | List of float | Weight ratio of the loss terms | [0.5, 0.5] |
| layer_mappings | List | Parameter for IntermediateLayersLoss | [] |
| add_origin_loss | bool | Parameter for IntermediateLayersLoss | False |
example:
criterion = Criterion(name='KnowledgeLoss')
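Here is a plain-Python sketch of how `loss_types` and `loss_weight_ratio` might combine two loss terms. It rests on an assumption rather than documented behavior. The first "CE" term is assumed to compare the student's predictions with the ground-truth label. The second is assumed to compare them with the teacher's temperature-softened predictions. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax of the logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target_probs: Sequence[float], student_logits: Sequence[float],
                  temperature: float = 1.0) -> float:
    """CE between a target distribution and the student's (optionally softened) prediction."""
    p_s = softmax(student_logits, temperature)
    return -sum(t * math.log(p) for t, p in zip(target_probs, p_s))

def weighted_kd_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
                     label: int, temperature: float = 1.0,
                     loss_weight_ratio: Tuple[float, float] = (0.5, 0.5)) -> float:
    one_hot = [1.0 if i == label else 0.0 for i in range(len(student_logits))]
    hard = cross_entropy(one_hot, student_logits)  # assumed term 1: student vs. label
    soft = cross_entropy(softmax(teacher_logits, temperature),
                         student_logits, temperature)  # assumed term 2: student vs. teacher
    w_hard, w_soft = loss_weight_ratio
    return w_hard * hard + w_soft * soft

student = [1.0, 0.5, -0.5]
teacher = [2.0, 0.2, -1.0]
print(weighted_kd_loss(student, teacher, label=0, temperature=2.0))
```

Setting one weight to zero isolates a single term. This is a quick way to check how each term contributes.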
DistillationConfig contains all the information related to the model's distillation behavior. Once you have created the Metric and Criterion instances, you can create a DistillationConfig instance. Both Metric and Criterion are optional.
arguments:
| Argument | Type | Description | Default value |
|---|---|---|---|
| framework | string | Which framework you use | "pytorch" |
| criterion | Criterion | Criterion used in training | "KnowledgeLoss" |
| metrics | Metric | Used to evaluate the accuracy of the tuned model; not needed for NoTrainerOptimizer | None |
example:
d_conf = DistillationConfig(metrics=metric, criterion=criterion)
Distill with Trainer
NLPTrainer inherits from transformers.Trainer, so you can create a trainer as in the Transformers examples and then distill the model with the trainer.distill function.
model = trainer.distill(
    distillation_config=d_conf, teacher_model=teacher_model
)