Hello all,
Can ModelOpt enable a wikitext-like, task-based accuracy test on the quantized output model for NVFP4?
The exported model contains some shape fusion due to the packing mechanism that stores 2 FP4 values in 1 INT8, so its structure differs from the original model's.
How can lm_eval support the quantized model?
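For readers unfamiliar with why the exported shapes change: each INT8 byte holds two 4-bit codes, so the packed weight dimension is half the original. Below is a minimal NumPy sketch of that nibble-packing idea; it is an illustration only, not ModelOpt's actual export code, and the function names are hypothetical.

```python
import numpy as np

def pack_fp4(nibbles: np.ndarray) -> np.ndarray:
    """Pack pairs of 4-bit codes (values 0..15) into single uint8 bytes.

    Illustrative only: shows why the packed last dimension is half the
    original, which is what makes the exported tensor shapes differ.
    """
    assert nibbles.shape[-1] % 2 == 0, "last dim must be even to pack in pairs"
    lo = nibbles[..., 0::2] & 0x0F        # even positions -> low nibble
    hi = nibbles[..., 1::2] & 0x0F        # odd positions  -> high nibble
    return ((hi << 4) | lo).astype(np.uint8)

def unpack_fp4(packed: np.ndarray) -> np.ndarray:
    """Inverse: recover the original sequence of 4-bit codes."""
    lo = packed & 0x0F
    hi = (packed >> 4) & 0x0F
    out = np.empty(packed.shape[:-1] + (packed.shape[-1] * 2,), dtype=np.uint8)
    out[..., 0::2] = lo
    out[..., 1::2] = hi
    return out

codes = np.array([[1, 15, 7, 2]], dtype=np.uint8)
packed = pack_fp4(codes)       # shape (1, 2): last dim halved
restored = unpack_fp4(packed)  # shape (1, 4): matches the input
```

Because evaluation harnesses typically expect the original (unpacked) layout, running them directly on a packed checkpoint requires the loader to unpack or dequantize first.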
//==================================//
python lm_eval_hf.py \
    --model hf \
    --model_args pretrained= \
    --quant_cfg NVFP4_DEFAULT_CFG \
    --tasks wikitext \
    --batch_size 4
//==================================//
Does the command shown above support testing the exported model?