Papers on knowledge distillation for NLP and ASR (mainly focused on BERT-like models).
[Important] marks important papers.
[NLP] marks NLP papers.
[ASR] marks ASR papers.
- Response-Based Knowledge: Distilling the Knowledge in a Neural Network (a minimal loss sketch appears after this list)
- Feature-Based Knowledge: FitNets: Hints for Thin Deep Nets
- Relation-Based Knowledge: A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning
- [NLP] PKD: Patient Knowledge Distillation for BERT Model Compression
- [Important] [NLP] DistilBERT: DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
- [Important] [NLP] TinyBERT: TinyBERT: Distilling BERT for Natural Language Understanding
- [NLP] IR-KD: Knowledge Distillation from Internal Representations
- [NLP] MobileBERT: MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
- [NLP] CKD: Why Skip If You Can Combine: A Simple Knowledge Distillation Technique for Intermediate Layers
- [NLP] ALP-KD: ALP-KD: Attention-Based Layer Projection for Knowledge Distillation
- [NLP] Co-DIR: Contrastive Distillation on Intermediate Representations for Language Model Compression
- [Important] [ASR] DistilHuBERT: DistilHuBERT: Speech Representation Learning by Layer-wise Distillation of Hidden-unit BERT
- [NLP] RAIL-KD: RAIL-KD: RAndom Intermediate Layer Mapping for Knowledge Distillation
- [NLP] CoFi: Structured Pruning Learns Compact and Accurate Models
- [ASR] FitHuBERT: FitHuBERT: Going Thinner and Deeper for Knowledge Distillation of Speech Self-Supervised Learning
- [ASR] LightHuBERT: LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT
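
Response-based knowledge distillation, the first entry above, underlies most of the BERT- and HuBERT-style papers in this list. Below is a minimal PyTorch sketch of that objective: a KL term between temperature-softened teacher and student logits blended with the ordinary hard-label cross-entropy. The function name and the default `temperature`/`alpha` values are illustrative choices, not taken from any specific paper here.

```python
import torch
import torch.nn.functional as F

def response_kd_loss(student_logits, teacher_logits, labels,
                     temperature=2.0, alpha=0.5):
    """Hinton-style response-based KD: soft-target KL + hard-label CE.

    `temperature` and `alpha` are illustrative defaults, not values
    prescribed by any paper in this list.
    """
    # Softened distributions; the teacher is assumed frozen (no grad).
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)

    # KL between softened outputs; the T^2 factor keeps the soft-target
    # gradients on the same scale as the hard-label loss.
    kd = F.kl_div(log_soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2

    # Ordinary cross-entropy against the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)

    return alpha * kd + (1.0 - alpha) * ce

# Toy usage: a 10-class task with a batch of 4 examples.
student_logits = torch.randn(4, 10)
teacher_logits = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
loss = response_kd_loss(student_logits, teacher_logits, labels)
```

Feature-, relation-, and layer-wise distillation (FitNets, PKD, DistilHuBERT, etc.) add further losses on intermediate representations on top of an objective of this general shape.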