Diffusion Meets Few-shot Class Incremental Learning

Junsu Kim1,2*, Yunhoe Ku3, Dongyoon Han2†, Seungryul Baek1†
*Work done during an internship at NAVER AI Lab; † corresponding authors
1UNIST, 2NAVER AI Lab, 3DeepBrain AI

Paper: arXiv:2503.23402

TL;DR

Our research introduces a novel method for Few-Shot Class-Incremental Learning (FSCIL) that repurposes a text-to-image diffusion model, Stable Diffusion, as a frozen generative backbone. We extract multi-scale features via both inversion (forward diffusion) and generation (reverse diffusion) processes, apply class-specific prompt tuning, and incorporate noise-augmented feature replay. This approach not only achieves state-of-the-art performance on benchmarks such as CUB-200, miniImageNet, and CIFAR-100, but also mitigates catastrophic forgetting while maintaining computational efficiency. Unlike conventional methods that rely on large-scale supervised pre-training and modifications to backbone architectures, our design preserves the original structure of the diffusion model, providing robust feature representations even for sparsely sampled new classes.
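The snippet below is a minimal, unofficial sketch of the frozen-backbone feature extraction described above, written against the Hugging Face diffusers API. The checkpoint name, the single timestep, and the choice to hook every U-Net decoder (up) block are illustrative assumptions, not the paper's exact configuration.

import torch
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
# Any SD 1.x checkpoint works for this sketch; v1-4 is an arbitrary choice.
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4").to(device)
pipe.unet.requires_grad_(False)          # frozen generative backbone
pipe.vae.requires_grad_(False)
pipe.text_encoder.requires_grad_(False)

features = []  # filled by the hooks below, one entry per decoder scale

def save_feature(_module, _inputs, output):
    # Each U-Net decoder (up) block emits a feature map at a different scale.
    features.append(output)

# Keep the handles so the hooks can be removed later if needed.
hooks = [blk.register_forward_hook(save_feature) for blk in pipe.unet.up_blocks]

@torch.no_grad()
def extract_features(pixel_values, prompt, t=250):
    """pixel_values: (B, 3, 512, 512) in [-1, 1]; prompt: class-specific text."""
    features.clear()
    # Encode the image into the VAE latent space.
    latents = pipe.vae.encode(pixel_values.to(device)).latent_dist.sample()
    latents = latents * pipe.vae.config.scaling_factor

    # Text conditioning (replaced by tuned class-specific prompts in the paper).
    tokens = pipe.tokenizer(prompt, padding="max_length", truncation=True,
                            max_length=pipe.tokenizer.model_max_length,
                            return_tensors="pt")
    text_emb = pipe.text_encoder(tokens.input_ids.to(device))[0]

    # "Inversion" pass: partially noise the latent (forward diffusion) and run
    # the U-Net once, letting the hooks collect decoder features at every scale.
    timestep = torch.tensor([t], device=device)
    noisy = pipe.scheduler.add_noise(latents, torch.randn_like(latents), timestep)
    noise_pred = pipe.unet(noisy, timestep, encoder_hidden_states=text_emb).sample

    # A complementary "generation" (reverse diffusion) pass can be run the same
    # way on the partially denoised latent to obtain a second feature set.
    return [f.clone() for f in features], noise_pred

In the paper both the inversion and generation passes contribute features; this sketch shows only a single U-Net pass and leaves the reverse-diffusion step as a comment.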

Abstract

Few-shot class-incremental learning (FSCIL) is challenging due to extremely limited training data, while aiming to reduce catastrophic forgetting and learn new information. We propose Diffusion-FSCIL, a novel approach that employs a text-to-image diffusion model as a frozen backbone. Our conjecture is that FSCIL can be tackled using a large generative model’s capabilities, benefiting from 1) generation ability via large-scale pre-training; 2) multi-scale representation; 3) representational flexibility through the text encoder. To maximize the representation capability, we propose to extract multiple complementary diffusion features to serve as latent replay, with slight support from feature distillation to prevent generative biases. Our framework realizes efficiency through 1) using a frozen backbone; 2) minimal trainable components; 3) batch processing of multiple feature extractions. Extensive experiments on CUB-200, miniImageNet, and CIFAR-100 show that Diffusion-FSCIL surpasses state-of-the-art methods, preserving performance on previously learned classes and adapting effectively to new ones.

Main figure: (image omitted; see the paper for the framework overview)

Core Contributions

  1. Generative Backbone for FSCIL
    We introduce a novel framework that uses Stable Diffusion as a frozen feature extractor—moving away from traditional discriminative approaches.

  2. Multi-Scale Feature Extraction
    We systematically extract features at multiple decoder layers in both inversion (forward diffusion) and generation (reverse diffusion) steps, yielding rich, complementary representations.

  3. Class-Specific Prompt Tuning
    Instead of generic text prompts, we optimize prompts per class to capture fine-grained characteristics. This improves both replay effectiveness and representation quality.

  4. Noise-Augmented Feature Replay
    We inject partial noise in the latent space to create augmented features, striking a balance between fidelity (to original data) and diversity (helping generalization); a minimal sketch of this replay, together with the class-specific prompt tuning, follows this list.

  5. SOTA Performance & Efficiency
    Our framework outperforms existing methods on CUB-200, miniImageNet, and CIFAR-100 FSCIL tasks. Despite using a generative model, it remains computationally efficient by freezing most diffusion parameters.
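As referenced in items 3 and 4, here is a minimal, unofficial sketch of class-specific prompt tuning and noise-augmented feature replay, built on the frozen backbone from the earlier sketch (it reuses pipe and the features hook list defined there). The prompt length, context width (768 for SD 1.x cross-attention), noise level, and the pooled linear classifier are illustrative assumptions, not the paper's exact design.

import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES, PROMPT_TOKENS, CTX_DIM = 100, 8, 768  # illustrative sizes
device = "cuda" if torch.cuda.is_available() else "cpu"

# Class-specific prompt tuning: a small set of learnable context tokens per
# class, fed to the frozen U-Net as cross-attention conditioning.
class_prompts = nn.Parameter(0.02 * torch.randn(NUM_CLASSES, PROMPT_TOKENS, CTX_DIM, device=device))

def noised_class_features(latents, class_id, t=200):
    """Noise-augmented replay: partially re-noise stored latents of one class
    and re-extract multi-scale decoder features through the frozen U-Net."""
    features.clear()
    timestep = torch.full((latents.shape[0],), t, dtype=torch.long, device=device)
    noisy = pipe.scheduler.add_noise(latents, torch.randn_like(latents), timestep)
    cond = class_prompts[class_id].unsqueeze(0).expand(latents.shape[0], -1, -1)
    pipe.unet(noisy, timestep, encoder_hidden_states=cond)
    # Pool each decoder scale and concatenate into one replayed feature vector.
    return torch.cat([f.float().mean(dim=(2, 3)) for f in features], dim=1)

# Probe the pooled feature width once; a linear head and the prompts are the
# only trainable components in this sketch (the diffusion model stays frozen).
with torch.no_grad():
    feat_dim = noised_class_features(torch.randn(1, 4, 64, 64, device=device), 0).shape[1]
classifier = nn.Linear(feat_dim, NUM_CLASSES).to(device)
optimizer = torch.optim.AdamW([class_prompts, *classifier.parameters()], lr=1e-3)

def replay_step(stored_latents, class_id):
    """One training step on replayed (noise-augmented) features of an old class."""
    feats = noised_class_features(stored_latents.to(device), class_id)
    labels = torch.full((feats.shape[0],), class_id, device=device)
    loss = F.cross_entropy(classifier(feats), labels)
    optimizer.zero_grad()
    loss.backward()   # gradients reach class_prompts through the frozen U-Net
    optimizer.step()
    return loss.item()

Varying the timestep t trades fidelity to the stored latent against diversity of the replayed features, which is the balance described in item 4.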

Motivation: Bridging Generative Strengths and FSCIL

  • FSCIL demands robust feature expressiveness to handle few-shot data while preventing forgetting.

  • Discriminative backbones (e.g., ResNet, ViTs, CLIP) often lack the multi-scale, dense representations that come “for free” from diffusion models.

  • Naively using Stable Diffusion (SD) does not lead to effective FSCIL; this observation was our starting point (figure omitted here; see the paper):

    • SD by itself already works better than an ImageNet-pre-trained ResNet.
    • SD with popular bells and whistles, such as 1) generative replay and 2) LoRA, does not improve FSCIL performance.
  • By smartly leveraging SD’s semantic understanding, our method retains old knowledge while flexibly adapting to new classes, achieving less forgetting and better generalization.

Updates

  • 2025/03/31: 🎉 The preprint has been uploaded to arXiv 🎉.

(Stay tuned for open-source code and trained weights!)

How to Cite

@article{kim2025diffusionfscil,
  title   = {Diffusion Meets Few-shot Class Incremental Learning},
  author  = {Kim, Junsu and Ku, Yunhoe and Han, Dongyoon and Baek, Seungryul},
  journal = {arXiv preprint arXiv:2503.23402},
  year    = {2025}
}
