Fix memory consumption issue with quantized Gemini Nano2 models on CPU #32149

evkotov · 2025-09-18T16:11:41Z

Details:

Problem:
Quantized models (i8/fp16 weights) were consuming excessive memory (up to 90 GB) because the
ConstantFolding transformation was converting compressed weights to fp32.

Root cause:

1. EinsumDecomposition was not called in CPU pipeline before MarkDequantization
2. MarkDequantization couldn't recognize decompression patterns with Einsum operations
3. DisableDecompressionConvertConstantFolding was disabled, allowing unwanted conversions

Solution:

1. Add EinsumDecomposition to decompression_handling_manager before MarkDequantization
   - This allows proper pattern recognition for Einsum operations
2. Keep DisableDecompressionConvertConstantFolding enabled (comment out the disable line)
   - This preserves the protection against unwanted constant folding

Transformation pipeline flow:
Before fix:
MarkDequantization -> [Einsum blocks pattern] -> ConstantFolding converts to fp32

After fix:
EinsumDecomposition -> MarkDequantization -> [Pattern recognized] -> Constants preserved
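
To make the ordering issue concrete, here is a minimal toy model of the flow above. It is plain Python, not OpenVINO code. The `Node` class, the function names, the `keep_compressed` flag, and the way the two switches interact are all assumptions made for illustration. Only the pass names and their order come from this PR.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    op: str                          # "Constant", "Convert", "Einsum" or "MatMul"
    dtype: str = ""                  # element type produced by the node
    inputs: list = field(default_factory=list)
    keep_compressed: bool = False    # hypothetical stand-in for a "do not fold" mark


def build_graph():
    """Toy decompression subgraph: Constant(i8) -> Convert(fp32) -> Einsum."""
    weights = Node("Constant", "i8")
    convert = Node("Convert", "fp32", [weights])
    return Node("Einsum", "fp32", [convert])


def einsum_decomposition(root):
    # Toy rewrite: turn Einsum into MatMul so the matcher below can see it.
    if root.op == "Einsum":
        root.op = "MatMul"


def mark_dequantization(root):
    # Toy matcher: it only knows the Constant -> Convert -> MatMul shape.
    if root.op != "MatMul":
        return
    for inp in root.inputs:
        if inp.op == "Convert" and inp.inputs[0].op == "Constant":
            inp.keep_compressed = True


def constant_folding(root, protection_enabled):
    # Fold Convert(Constant) into a Constant of the target type,
    # unless the Convert is marked and the protection is active.
    for i, inp in enumerate(root.inputs):
        if inp.op == "Convert" and inp.inputs[0].op == "Constant":
            if protection_enabled and inp.keep_compressed:
                continue
            root.inputs[i] = Node("Constant", inp.dtype)


def weight_dtype(root):
    # Walk down the first input chain until the weight Constant is reached.
    node = root.inputs[0]
    while node.op != "Constant":
        node = node.inputs[0]
    return node.dtype


def run_pipeline(decompose_einsum, protection_enabled):
    root = build_graph()
    if decompose_einsum:
        einsum_decomposition(root)
    mark_dequantization(root)
    constant_folding(root, protection_enabled)
    return weight_dtype(root)


for decompose in (False, True):
    for protect in (False, True):
        print(decompose, protect, run_pipeline(decompose, protect))
```

In this toy model the weights stay in i8 only when the decomposition runs before marking and the folding protection is left on. Every other combination ends with an fp32 constant.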

Test results on einsum_model_with_fp16_i8:

Before: constants converted to fp32 (4x memory increase for i8)
After: constants remain in i8 format (1057 MB memory usage)
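
The 4x figure follows directly from element sizes: an i8 value takes 1 byte and an fp32 value takes 4. The sketch below spells out that arithmetic. The helper names and the 1000 MB input are hypothetical and are not measurements from this PR.

```python
# Storage size in bytes for one element of each type.
BYTES_PER_ELEMENT = {"i8": 1, "fp16": 2, "fp32": 4}


def expansion_factor(src, dst="fp32"):
    """How much a constant grows when folded from src to dst."""
    return BYTES_PER_ELEMENT[dst] / BYTES_PER_ELEMENT[src]


def folded_size_mb(compressed_mb, src, dst="fp32"):
    """Size of a constant after folding, given its compressed size in MB."""
    return compressed_mb * expansion_factor(src, dst)


print(expansion_factor("i8"))        # i8 -> fp32
print(expansion_factor("fp16"))      # fp16 -> fp32
print(folded_size_mb(1000, "i8"))    # hypothetical 1000 MB of i8 weights
```

By the same arithmetic, fp16 weights double in size when folded to fp32.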

Both changes are required - applying only one results in incorrect behavior.

Tickets:

165827

Prevent i8/fp16 to fp32 constant folding in CPU pipeline for Einsum ops

1cb188a

evkotov requested review from mryzhov and CuriousPanCake September 18, 2025 16:11

evkotov self-assigned this Sep 18, 2025

evkotov requested review from a team as code owners September 18, 2025 16:11

evkotov added the category: transformations OpenVINO Runtime library - Transformations label Sep 18, 2025

github-actions bot added category: CPU OpenVINO CPU plugin and removed category: transformations OpenVINO Runtime library - Transformations labels Sep 18, 2025
