DLRM Quantization #158

Vasanta1 · 2023-11-24T06:28:56Z

I tried running the IntelAI DLRM model at int8 precision with the default int8_configure.json. Could someone clarify whether quantization happens each time the inference_performance.sh script is triggered, or whether the int8 weights are stored after the first run and reused in later runs?
Currently, the run takes around 10 hours to complete on a 64-core machine. Please let me know if any additional information is required from my end.

sramakintel · 2024-03-25T17:42:55Z

@Vasanta1 Per this line, https://github.com/IntelAI/models/blob/r3.1/quickstart/recommendation/pytorch/dlrm/inference/cpu/inference_performance.sh#L67, quantization happens each time the script is invoked. I also want to make sure you are using our latest validated CentOS container for DLRM inference: https://github.com/IntelAI/models/blob/r3.1/quickstart/recommendation/pytorch/dlrm/inference/cpu/DEVCATALOG.md#pull-command. Does the run take the same amount of time with this container?
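
Since quantization is repeated on every invocation, one general way to avoid paying that cost twice is a build-once, reuse-later cache around the expensive step. The following is only a minimal sketch of that pattern, not something the script does today: `load_or_build` and `fake_quantize` are hypothetical names introduced for illustration, and `fake_quantize` is a stand-in for the real calibration/conversion step. Whether the quantized DLRM model can be serialized and reloaded this way has not been verified here.

```python
import os
import pickle


def load_or_build(cache_path, build_fn):
    """Return (artifact, cache_hit).

    If cache_path exists, load the artifact from it. Otherwise call
    build_fn(), store the result at cache_path, and return it.
    """
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f), True

    artifact = build_fn()

    # Write to a temporary file first, then rename, so an interrupted
    # run never leaves a half-written cache file behind.
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(artifact, f)
    os.replace(tmp_path, cache_path)
    return artifact, False


def fake_quantize():
    # Hypothetical placeholder for the expensive int8 quantization step;
    # the returned dict is made-up data, not real DLRM weights.
    return {"weights": [1, -2, 3], "scale": 0.05}


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "int8_artifact.pkl")
        _, hit_first = load_or_build(path, fake_quantize)
        _, hit_second = load_or_build(path, fake_quantize)
        print("first run cache hit:", hit_first)
        print("second run cache hit:", hit_second)
```

With this pattern the first call pays the full build cost and later calls only pay the cost of reading the file. The cache file would need to be deleted whenever the model, the calibration data, or int8_configure.json changes.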
