@evkotov evkotov commented Sep 18, 2025

Details:

Problem:
Models with quantized or compressed weights (i8/fp16) consumed excessive memory (up to 90 GB)
because the ConstantFolding transformation converted the compressed weights to fp32.

Root cause:

  1. EinsumDecomposition was not called in the CPU pipeline before MarkDequantization
  2. MarkDequantization could not recognize decompression patterns that contain Einsum operations
  3. DisableDecompressionConvertConstantFolding was disabled, which allowed the unwanted conversions

Solution:

  1. Add EinsumDecomposition to decompression_handling_manager before MarkDequantization.
    This lets MarkDequantization recognize decompression patterns around Einsum operations.
  2. Keep DisableDecompressionConvertConstantFolding enabled (comment out the line that disables it).
    This preserves the protection against unwanted constant folding.

Transformation pipeline flow:
Before fix:
MarkDequantization -> [Einsum blocks pattern] -> ConstantFolding converts to fp32

After fix:
EinsumDecomposition -> MarkDequantization -> [Pattern recognized] -> Constants preserved
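
The ordering effect described above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not OpenVINO code: the `Node` class, the `keep_compressed` flag, and the pass functions are hypothetical stand-ins, the real EinsumDecomposition produces a larger subgraph than a single MatMul, and the rule that only a MatMul consumer is recognized is assumed here purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    op: str                        # "Constant", "Convert", "Multiply", "Einsum", ...
    dtype: str = "fp32"
    inputs: list = field(default_factory=list)
    keep_compressed: bool = False  # toy stand-in for a "do not fold" marker


def build_model():
    """Toy chain: i8 weight -> Convert(fp32) -> Multiply(scale) -> Einsum."""
    weight = Node("Constant", "i8")
    convert = Node("Convert", "fp32", [weight])
    scale = Node("Constant", "fp32")
    multiply = Node("Multiply", "fp32", [convert, scale])
    activations = Node("Parameter", "fp32")
    return Node("Einsum", "fp32", [activations, multiply])


def einsum_decomposition(root):
    """Toy pass: rewrite the Einsum consumer as a MatMul (heavily simplified)."""
    if root.op == "Einsum":
        root.op = "MatMul"


def mark_dequantization(root):
    """Toy pass: protect the Convert, but only under a recognized consumer."""
    if root.op != "MatMul":
        return  # assumed behavior: the pattern is not recognized, nothing is marked
    multiply = root.inputs[1]
    convert = multiply.inputs[0]
    if multiply.op == "Multiply" and convert.op == "Convert":
        convert.keep_compressed = True


def constant_folding(root):
    """Toy pass: fold an unprotected Convert(Constant) into an fp32 Constant."""
    multiply = root.inputs[1]
    convert = multiply.inputs[0]
    if convert.op == "Convert" and not convert.keep_compressed:
        multiply.inputs[0] = Node("Constant", convert.dtype)


def weight_dtype(root):
    """Return the dtype of the Constant that feeds the weight branch."""
    node = root.inputs[1].inputs[0]
    while node.op != "Constant":
        node = node.inputs[0]
    return node.dtype


def run(passes):
    model = build_model()
    for apply_pass in passes:
        apply_pass(model)
    return weight_dtype(model)


before_fix = run([mark_dequantization, constant_folding])
after_fix = run([einsum_decomposition, mark_dequantization, constant_folding])
print("before fix:", before_fix)
print("after fix:", after_fix)
```

In this toy, the only difference between the two runs is whether the Einsum is rewritten before the marking pass, which is enough to decide whether the weight constant survives in its compressed type.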

Test results on einsum_model_with_fp16_i8:

  • Before: constants converted to fp32 (4x memory increase for i8)
  • After: constants remain in i8 format (1057 MB memory usage)
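
The 4x figure follows from element sizes alone. A minimal sketch of the arithmetic, assuming the usual byte widths for these types; the helper names and the element count are hypothetical and are not taken from the test model:

```python
# Bytes per element for the types mentioned in this PR.
BYTES_PER_ELEMENT = {"i8": 1, "fp16": 2, "fp32": 4}


def constant_bytes(num_elements, dtype):
    """Size of a constant tensor with the given element count and type."""
    return num_elements * BYTES_PER_ELEMENT[dtype]


def growth_factor(src, dst="fp32"):
    """How much a constant grows when it is converted from src to dst."""
    return BYTES_PER_ELEMENT[dst] / BYTES_PER_ELEMENT[src]


num_elements = 1_000_000  # hypothetical weight count, for illustration only
print(growth_factor("i8"))    # i8 -> fp32
print(growth_factor("fp16"))  # fp16 -> fp32
print(constant_bytes(num_elements, "i8"), constant_bytes(num_elements, "fp32"))
```

By the same arithmetic, fp16 weights folded to fp32 would double in size rather than quadruple.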

Both changes are required; applying only one of them results in incorrect behavior.

Tickets:

  • 165827

@evkotov evkotov self-assigned this Sep 18, 2025
@evkotov evkotov requested review from a team as code owners September 18, 2025 16:11
@evkotov evkotov added the category: transformations OpenVINO Runtime library - Transformations label Sep 18, 2025
@github-actions github-actions bot added category: CPU OpenVINO CPU plugin and removed category: transformations OpenVINO Runtime library - Transformations labels Sep 18, 2025