[TESTS] Use FP32 inference precision, FP16 KV cache precision for pipelines #1485

ilya-lavrenov · 2025-01-06T13:37:35Z

OpenVINO plugins enable different kinds of optimizations by default, such as KV cache compression to int8 and FP16 inference precision, while in GenAI tests we want to test the pipelines themselves and how they compare against HF / optimum without extra optimizations:

openvino.genai/tests/python_tests/common.py

Lines 318 to 325 in 4db67ae

    def get_default_properties():
        import openvino.properties.hint as hints
        import openvino as ov

        return {
            hints.inference_precision : ov.Type.f32,
            hints.kv_cache_precision : ov.Type.f16,
        }

Hopefully, this lets us then merge int8 KV cache by default for CB (#1206), because the tests will still compare against FP16 KV cache, while official validation should be responsible for validating against the reference via WWB metrics.
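To illustrate how such test defaults could coexist with a test that deliberately wants a different KV cache precision, here is a minimal sketch. It is not part of this PR: `with_test_defaults` and `DEFAULT_TEST_PROPERTIES` are hypothetical names, and the plain string keys and values are assumed stand-ins for `hints.inference_precision`, `hints.kv_cache_precision`, and the `ov.Type` values, so that the snippet runs without OpenVINO installed.

```python
# Hypothetical helper, not taken from the PR: overlay per-test overrides
# on top of the test-wide default properties.
# The string keys/values below are assumptions standing in for
# hints.inference_precision / hints.kv_cache_precision and ov.Type.f32 / ov.Type.f16.
DEFAULT_TEST_PROPERTIES = {
    "INFERENCE_PRECISION_HINT": "f32",
    "KV_CACHE_PRECISION": "f16",
}


def with_test_defaults(overrides=None):
    """Return a new dict: test defaults first, then any explicit overrides."""
    merged = dict(DEFAULT_TEST_PROPERTIES)  # copy so the defaults are never mutated
    merged.update(overrides or {})
    return merged


# A test that specifically targets a compressed KV cache overrides one key only.
props = with_test_defaults({"KV_CACHE_PRECISION": "u8"})
print(props)
```

With this shape, most tests would call `with_test_defaults()` with no arguments and compare against HF / optimum at FP32 / FP16, while a test that targets KV cache compression would state its override explicitly.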

ilya-lavrenov added this to the 2025.0 milestone Jan 6, 2025

ilya-lavrenov assigned Wovchena Jan 6, 2025

github-actions bot added the category: visual language, category: continuous batching, category: whisper, and no-match-files labels Jan 6, 2025

ilya-lavrenov mentioned this pull request Jan 6, 2025

[CPU] Change kvcache default type of PagedAttention to u8 for CPU plugin #1206

Open

Wovchena approved these changes Jan 6, 2025

ilya-lavrenov force-pushed the default-config branch from 4db67ae to 6c9b719 Compare January 6, 2025 15:10

github-actions bot added the category: Python API label Jan 6, 2025

ilya-lavrenov enabled auto-merge January 6, 2025 15:11

ilya-lavrenov force-pushed the default-config branch 4 times, most recently from a03d7cf to f86a642 Compare January 6, 2025 18:23

github-actions bot removed the category: whisper label Jan 6, 2025

ilya-lavrenov disabled auto-merge January 6, 2025 18:26

ilya-lavrenov force-pushed the default-config branch from f86a642 to 1e8757a Compare January 6, 2025 18:34

github-actions bot added the category: samples label Jan 6, 2025

[TESTS] Use FP32 inference precision, FP16 KV cache precision for all…

b6264d8

… pipelines

ilya-lavrenov force-pushed the default-config branch from 1e8757a to b6264d8 Compare January 6, 2025 18:55

ilya-lavrenov enabled auto-merge January 6, 2025 18:55

ilya-lavrenov added this pull request to the merge queue Jan 6, 2025

ilya-lavrenov removed this pull request from the merge queue due to a manual request Jan 6, 2025

ilya-lavrenov merged commit 48dfd16 into openvinotoolkit:master Jan 6, 2025
59 checks passed

ilya-lavrenov deleted the default-config branch January 6, 2025 20:39
