diff --git a/docs/apis.rst b/docs/apis.rst deleted file mode 100644 index 672e2b0..0000000 --- a/docs/apis.rst +++ /dev/null @@ -1,43 +0,0 @@ -Quark APIs -==== - -**User facing APIs:** - -Quark for Pytorch -~~~~~~~~~~~~~~~~~ - -.. toctree:: - :maxdepth: 1 - - Quantization - Export - Quantizer Configuration - Exporter Configuration - -.. - ------------ - -Quark for ONNX -~~~~~~~~~~~~~~~~~ - -.. toctree:: - :maxdepth: 1 - - Quantization - Optimization - Calibration - ONNX Quantizer - QDQ Quantizer - Configuration - Quantization Utilities - - - -.. - ------------ - - ##################################### - License - ##################################### - - Quark is licensed under MIT License. Refer to the LICENSE file for the full license text and copyright notice. \ No newline at end of file diff --git a/docs/example_gen.rst b/docs/example_gen.rst deleted file mode 100644 index 1820fba..0000000 --- a/docs/example_gen.rst +++ /dev/null @@ -1,32 +0,0 @@ -Examples -======== - -Quark for Pytorch ------------------ - -- `Language Model Quantization & - Export <./quark_torch_llm_example_gen.html>`__ - -- `Diffusion Model Quantization & - Export <./quark_torch_diffusers_example_gen.html>`__ - -- `Vision Model Quantization using Quark FX Graph - Mode <./quark_torch_vision_example_gen.html>`__ - -- `Extension for Pytorch-light(AMD internal - project) <./quark_torch_pytorch_light_example_gen.html>`__ - -- `Extension for Brevitas <./quark_torch_brevitas_example_gen.html>`__ - -Quark for ONNX --------------- - -- `Image Classification - Quantization <./quark_onnx_image_classification_example_gen.html>`__ - -.. raw:: html - - diff --git a/docs/getting_started.rst b/docs/getting_started.rst deleted file mode 100644 index c9c16f5..0000000 --- a/docs/getting_started.rst +++ /dev/null @@ -1,14 +0,0 @@ -Getting Started -=============== - -This page will introduce how to run Quark for the first time. 
- -- `Getting Started with Quark for PyTorch <./pytorch/getting_started.html>`__ -- `Getting Started with Quark for ONNX <./onnx/getting_started.html>`__ - -.. raw:: html - - diff --git a/docs/index.rst b/docs/index.rst index f130f06..41c6d41 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -9,16 +9,33 @@ Welcome to Quark's documentation! .. toctree:: :maxdepth: 1 - - What's New - Quark Overview + :caption: Release Notes + + Release V0.2.0 + +.. toctree:: + :maxdepth: 1 + :caption: Getting Started + Installation - Getting Started - Highlight Features - User Guide - APIs - Examples - Release Note + Quark Overview + +.. toctree:: + :maxdepth: 1 + :caption: PyTorch + + Quark with PyTorch + +.. toctree:: + :maxdepth: 1 + :caption: ONNX + + Quark with ONNX + +.. toctree:: + :maxdepth: 1 + :caption: FAQ + FAQ .. diff --git a/docs/onnx/index.rst b/docs/onnx/index.rst new file mode 100644 index 0000000..76fe8d5 --- /dev/null +++ b/docs/onnx/index.rst @@ -0,0 +1,20 @@ +Quark with ONNX! +=================== + +.. toctree:: + :maxdepth: 1 + + Getting Started + User Guide + Examples + APIs + Advanced Features + +.. + ------------ + + ##################################### + License + ##################################### + + Quark is licensed under MIT License. Refer to the LICENSE file for the full license text and copyright notice. diff --git a/docs/onnx/onnx_adv_features.rst b/docs/onnx/onnx_adv_features.rst new file mode 100644 index 0000000..a4e1177 --- /dev/null +++ b/docs/onnx/onnx_adv_features.rst @@ -0,0 +1,20 @@ +Advanced Features +================== + +This page introduces some key features of Quark. Please refer to the +`user guide <./user_guide.html>`__ for more details on the other features +of Quark. + + +Quark for ONNX +-------------- + +- `AdaRound and AdaQuant <./tutorial_adaround_adaquant.html>`__ +- `Mixed Precision <./tutorial_mix_precision.html>`__ + +..
raw:: html + + diff --git a/docs/onnx/onnx_apis.rst b/docs/onnx/onnx_apis.rst new file mode 100644 index 0000000..dd7448d --- /dev/null +++ b/docs/onnx/onnx_apis.rst @@ -0,0 +1,24 @@ +Quark APIs for ONNX +=================== + +**User facing APIs:** + +.. toctree:: + :maxdepth: 2 + + Quantization <../autoapi/quark/onnx/quantization/api/index.rst> + Optimization <../autoapi/quark/onnx/optimize/index.rst> + Calibration <../autoapi/quark/onnx/calibrate/index.rst> + ONNX Quantizer <../autoapi/quark/onnx/onnx_quantizer/index.rst> + QDQ Quantizer <../autoapi/quark/onnx/qdq_quantizer/index.rst> + Configuration <../autoapi/quark/onnx/quantization/config/config/index.rst> + Quantization Utilities <../autoapi/quark/onnx/quant_utils/index.rst> + +.. + ------------ + + ##################################### + License + ##################################### + + Quark is licensed under MIT License. Refer to the LICENSE file for the full license text and copyright notice. \ No newline at end of file diff --git a/docs/example.rst b/docs/onnx/onnx_examples.rst similarity index 51% rename from docs/example.rst rename to docs/onnx/onnx_examples.rst index d031bc1..40576b7 100644 --- a/docs/example.rst +++ b/docs/onnx/onnx_examples.rst @@ -1,18 +1,7 @@ Examples ======== -Quark for Pytorch ------------------ - -* `Language Model Quantization & Export <./quark_example_torch_llm_gen.html>`__ -* `Diffusion Model Quantization & Export <./quark_example_torch_diffusers_gen.html>`__ -* `Vision Model Quantization using Quark FX Graph Mode <./quark_example_torch_vision_gen.html>`__ -* `Extension for Pytorch-light (AMD internal project) <./quark_example_torch_pytorch_light_gen.html>`__ -* `Extension for Brevitas <./quark_example_torch_brevitas_gen.html>`__ - - -Quark for ONNX --------------- +Examples to run Quark for ONNX. 
* `Image Classification Quantization <./quark_example_onnx_image_classification_gen.html>`__ * `Fast Finetune AdaRound <./quark_examples_onnx_adaround_gen.html>`__ diff --git a/docs/quark_example_onnx_cle_gen.rst b/docs/onnx/quark_example_onnx_cle_gen.rst similarity index 100% rename from docs/quark_example_onnx_cle_gen.rst rename to docs/onnx/quark_example_onnx_cle_gen.rst diff --git a/docs/quark_example_onnx_image_classification_gen.rst b/docs/onnx/quark_example_onnx_image_classification_gen.rst similarity index 100% rename from docs/quark_example_onnx_image_classification_gen.rst rename to docs/onnx/quark_example_onnx_image_classification_gen.rst diff --git a/docs/quark_examples_onnx_adaround_gen.rst b/docs/onnx/quark_examples_onnx_adaround_gen.rst similarity index 100% rename from docs/quark_examples_onnx_adaround_gen.rst rename to docs/onnx/quark_examples_onnx_adaround_gen.rst diff --git a/docs/quark_onnx_example_mixed_precision_gen.rst b/docs/onnx/quark_onnx_example_mixed_precision_gen.rst similarity index 100% rename from docs/quark_onnx_example_mixed_precision_gen.rst rename to docs/onnx/quark_onnx_example_mixed_precision_gen.rst diff --git a/docs/quark_onnx_image_classification_example_gen.rst b/docs/onnx/quark_onnx_image_classification_example_gen.rst similarity index 100% rename from docs/quark_onnx_image_classification_example_gen.rst rename to docs/onnx/quark_onnx_image_classification_example_gen.rst diff --git a/docs/onnx/user_guide.rst b/docs/onnx/user_guide.rst new file mode 100644 index 0000000..0bc6f17 --- /dev/null +++ b/docs/onnx/user_guide.rst @@ -0,0 +1,27 @@ +Quark for ONNX +============== + +There are several steps to quantize a floating-point model with +``Quark for ONNX``: + +1. Load the original float model +2. Set quantization configuration +3. Define datareader +4. Use the Quark API to perform in-place replacement of the model's modules with quantized modules.
+ +More details: + +* `Configuring Quark for ONNX <./user_guide_config_description.html>`__ +* `Adding Calibration Datasets <./user_guide_datareader.html>`__ +* `Feature Description <./user_guide_feature_description.html>`__ +* `Supported Datatype and OpType <./user_guide_supported_optype_datatype.html>`__ +* `Accuracy Improvement <./user_guide_accuracy_improvement.html>`__ +* `Optional Utilities <./user_guide_optional_utilities.html>`__ +* `Tools <./user_guide_tools.html>`__ + +.. raw:: html + + diff --git a/docs/onnx_overview.rst b/docs/onnx_overview.rst new file mode 100644 index 0000000..6b45d5f --- /dev/null +++ b/docs/onnx_overview.rst @@ -0,0 +1,19 @@ +ONNX +==== + +.. toctree:: + :maxdepth: 1 + + Getting Started + User Guide + Examples + APIs + Advanced Features + + +.. raw:: html + + \ No newline at end of file diff --git a/docs/pytorch/index.rst b/docs/pytorch/index.rst new file mode 100644 index 0000000..edd3056 --- /dev/null +++ b/docs/pytorch/index.rst @@ -0,0 +1,23 @@ +Quark with PyTorch! +=================== + +**Quark** is a deep learning model quantization toolkit for quantizing models from PyTorch, ONNX and other frameworks. +It provides easy-to-use APIs for quantization and more advanced features than the native frameworks, with support for multiple hardware backends. + +.. toctree:: + :maxdepth: 1 + + Getting Started + User Guide + Examples + APIs + Advanced Features + +.. + ------------ + + ##################################### + License + ##################################### + + Quark is licensed under MIT License. Refer to the LICENSE file for the full license text and copyright notice.
diff --git a/docs/highlight_features.rst b/docs/pytorch/pytorch_adv_features.rst similarity index 53% rename from docs/highlight_features.rst rename to docs/pytorch/pytorch_adv_features.rst index 0a03e06..6d12364 100644 --- a/docs/highlight_features.rst +++ b/docs/pytorch/pytorch_adv_features.rst @@ -1,4 +1,4 @@ -Highlight Features +Advanced Features ================== This page introduces some key features of Quark. Please refere to the @@ -8,14 +8,10 @@ of Quark. Quark for PyTorch ----------------- -- `Bridge from Quark to llama.cpp <./pytorch/tutorial_gguf.html>`__ -- `Using MX (Microscaling) with Quark <./pytorch/tutorial_mx.html>`__ +- `Bridge from Quark to llama.cpp <./tutorial_gguf.html>`__ +- `Using MX (Microscaling) with Quark <./tutorial_mx.html>`__ -Quark for ONNX --------------- -- `AdaRound and AdaQuant <./onnx/tutorial_adaround_adaquant.html>`__ -- `Mixed Precision <./onnx/tutorial_mix_precision.html>`__ .. raw:: html diff --git a/docs/quark_example_onnx_adaquant_gen.rst b/docs/pytorch/quark_example_onnx_adaquant_gen.rst similarity index 100% rename from docs/quark_example_onnx_adaquant_gen.rst rename to docs/pytorch/quark_example_onnx_adaquant_gen.rst diff --git a/docs/quark_example_torch_brevitas_gen.rst b/docs/pytorch/quark_example_torch_brevitas_gen.rst similarity index 100% rename from docs/quark_example_torch_brevitas_gen.rst rename to docs/pytorch/quark_example_torch_brevitas_gen.rst diff --git a/docs/quark_example_torch_diffusers_gen.rst b/docs/pytorch/quark_example_torch_diffusers_gen.rst similarity index 100% rename from docs/quark_example_torch_diffusers_gen.rst rename to docs/pytorch/quark_example_torch_diffusers_gen.rst diff --git a/docs/quark_example_torch_llm_gen.rst b/docs/pytorch/quark_example_torch_llm_gen.rst similarity index 100% rename from docs/quark_example_torch_llm_gen.rst rename to docs/pytorch/quark_example_torch_llm_gen.rst diff --git a/docs/quark_example_torch_pytorch_light_gen.rst 
b/docs/pytorch/quark_example_torch_pytorch_light_gen.rst similarity index 100% rename from docs/quark_example_torch_pytorch_light_gen.rst rename to docs/pytorch/quark_example_torch_pytorch_light_gen.rst diff --git a/docs/quark_example_torch_vision_gen.rst b/docs/pytorch/quark_example_torch_vision_gen.rst similarity index 100% rename from docs/quark_example_torch_vision_gen.rst rename to docs/pytorch/quark_example_torch_vision_gen.rst diff --git a/docs/quark_torch_brevitas_example_gen.rst b/docs/pytorch/quark_torch_brevitas_example_gen.rst similarity index 100% rename from docs/quark_torch_brevitas_example_gen.rst rename to docs/pytorch/quark_torch_brevitas_example_gen.rst diff --git a/docs/quark_torch_diffusers_example_gen.rst b/docs/pytorch/quark_torch_diffusers_example_gen.rst similarity index 98% rename from docs/quark_torch_diffusers_example_gen.rst rename to docs/pytorch/quark_torch_diffusers_example_gen.rst index 78d8854..7229fc2 100644 --- a/docs/quark_torch_diffusers_example_gen.rst +++ b/docs/pytorch/quark_torch_diffusers_example_gen.rst @@ -40,6 +40,7 @@ Run with SDXL Without Quantization ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - Run original SDXL: + -------------------------------------- .. code:: @@ -51,6 +52,7 @@ Calibration and Export SafeTensor ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - Run Calibration: + -------------------------------------- .. code:: @@ -61,6 +63,7 @@ Load SafeTensor and Test ~~~~~~~~~~~~~~~~~~~~~~~~ - Load and Test: + -------------------------------------- .. code:: @@ -68,9 +71,10 @@ Load SafeTensor and Test python quantize_sdxl.py --input_scheme {'per-tensor'} --weight_scheme {'per-tensor', 'per-channel'} --test_data_tsv_file_path {your calibration dataset file path} --load --test Load SafeTensor and Run with a prompt -~~~~~~~~~~~~~~~~~~~~~~~~ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - Load and Run: + -------------------------------------- .. 
code:: diff --git a/docs/quark_torch_llm_example_gen.rst b/docs/pytorch/quark_torch_llm_example_gen.rst similarity index 100% rename from docs/quark_torch_llm_example_gen.rst rename to docs/pytorch/quark_torch_llm_example_gen.rst diff --git a/docs/quark_torch_pytorch_light_example_gen.rst b/docs/pytorch/quark_torch_pytorch_light_example_gen.rst similarity index 100% rename from docs/quark_torch_pytorch_light_example_gen.rst rename to docs/pytorch/quark_torch_pytorch_light_example_gen.rst diff --git a/docs/quark_torch_vision_example_gen.rst b/docs/pytorch/quark_torch_vision_example_gen.rst similarity index 100% rename from docs/quark_torch_vision_example_gen.rst rename to docs/pytorch/quark_torch_vision_example_gen.rst diff --git a/docs/pytorch/user_guide.rst b/docs/pytorch/user_guide.rst new file mode 100644 index 0000000..c1b5b0e --- /dev/null +++ b/docs/pytorch/user_guide.rst @@ -0,0 +1,26 @@ +Quark for PyTorch +================= + +There are several steps to quantize a floating-point model with +``Quark for PyTorch``: + +1. Load the original float model +2. Set quantization configuration +3. Define dataloader +4. Use the Quark API to perform in-place replacement of the model's modules with quantized modules. +5. (Optional) Export the quantized model to another format, such as ONNX + +More details: + +* `Configuring Quark for PyTorch <./user_guide_config_description.html>`__ +* `Adding Calibration Datasets <./user_guide_dataloader.html>`__ +* `Exporting for ONNX & Json-Safetensors & GGUF <./user_guide_exporting.html>`__ +* `Feature Description <./user_guide_feature_description.html>`__ + + +.. raw:: html + + diff --git a/docs/pytorch_overview.rst b/docs/pytorch_overview.rst new file mode 100644 index 0000000..4600194 --- /dev/null +++ b/docs/pytorch_overview.rst @@ -0,0 +1,18 @@ +PyTorch +======= + +.. toctree:: + :maxdepth: 1 + + Getting Started + User Guide + Examples + APIs + Advanced Features + +..
raw:: html + + diff --git a/docs/tutorial.rst b/docs/tutorial.rst deleted file mode 100644 index 8026cd5..0000000 --- a/docs/tutorial.rst +++ /dev/null @@ -1,15 +0,0 @@ -Tutorial -======== - -Quark for PyTorch ------------------ - -- `Bridge from Quark to llama.cpp <./pytorch/tutorial_gguf.html>`__ -- `Using MX (Microscaling) with Quark <./pytorch/tutorial_mx.html>`__ - -.. raw:: html - - diff --git a/docs/user_guide.rst b/docs/user_guide.rst deleted file mode 100644 index 37ccfce..0000000 --- a/docs/user_guide.rst +++ /dev/null @@ -1,49 +0,0 @@ -User Guide -========== - -Quark for PyTorch ------------------ - -There are several steps to quantize a floating-point model with -``Quark for PyTorch``: - -1. Load original float model -2. Set quantization configuration -3. Define dataloader -4. Use the Quark API to perform in-place replacement of the model's modules with quantized module. -5. (Optional) Export quantized model to other format such as ONNX - -More details: - -* `Configuring Quark for PyTorch <./pytorch/user_guide_config_description.html>`__ -* `Adding Calibration Datasets <./pytorch/user_guide_dataloader.html>`__ -* `Exporting for ONNX & Json-Safetensors & GGUF <./pytorch/user_guide_exporting.html>`__ -* `Feature Description <./pytorch/user_guide_feature_description.html>`__ - -Quark for ONNX --------------- - -There are several steps to quantize a floating-point model with -``Quark for ONNX``: - -1. Load original float model -2. Set quantization configuration -3. Define datareader -4. Use the Quark API to perform in-place replacement of the model's modules with quantized module. 
- -More details: - -* `Configuring Quark for ONNX <./onnx/user_guide_config_description.html>`__ -* `Adding Calibration Datasets <./onnx/user_guide_datareader.html>`__ -* `Feature Description <./onnx/user_guide_feature_description.html>`__ -* `Supported Datatype and OpType <./onnx/user_guide_supported_optype_datatype.html>`__ -* `Accuracy Improvement <./onnx/user_guide_accuracy_improvement.html>`__ -* `Optional Utilities <./onnx/user_guide_optional_utilities.html>`__ -* `Tools <./onnx/user_guide_tools.html>`__ - -.. raw:: html - - diff --git a/docs/whats_new.rst b/docs/whats_new.rst deleted file mode 100644 index 60f8cb2..0000000 --- a/docs/whats_new.rst +++ /dev/null @@ -1,73 +0,0 @@ -What's New -========== - -New Features (Version 0.2.0) ----------------------------- - -- **Quark for PyTorch** - - - **PyTorch Quantizer Enhancements**: - - - Post Training Quantization (PTQ) and Quantization-Aware Training (QAT) are now supported in FX graph mode. - - Introduced quantization support of the following modules: torch.nn.Conv2d. - - - **Data Types**: - - - `OCP Microscaling (MX) is supported. Valid element data types include INT8, FP8_E4M3, FP4, FP6_E3M2, and FP6_E2M3. <./pytorch/tutorial_mx.html>`__ - - - **Export Capabilities**: - - - `Quantized models can now be exported in GGUF format. The exported GGUF model is runnable with llama.cpp. Only Llama2 is supported for now. <./pytorch/tutorial_gguf.html>`__ - - Introduced Quark's native Json-Safetensors export format, which is identical to AutoFP8 and AutoAWQ when used for FP8 and AWQ quantization. - - - **Model Support**: - - - Added support for SDXL model quantization in eager mode, including fp8 per-channel and per-tensor quantization. - - Added support for PTQ and QAT of CNN models in graph mode, including architectures like ResNet. 
- - - **Integration with other toolkits**: - - - Provided the integrated example with APL(AMD Pytorch-light,internal project name), supporting the invocation of APL's INT-K, BFP16, and BRECQ. - - Introduced the experimental Quark extension interface, enabling seamless integration of Brevitas for Stable Diffusion and Imagenet classification model quantization. - -- **Quark for ONNX** - - - **ONNX Quantizer Enhancements**: - - - Multiple optimization and refinement strategies for different deployment backends. - - Supported automatic mixing precision to balance accuracy and performance. - - - **Quantization Strategy**: - - - Supported symmetric and asymmetric quantization. - - Supported float scale, INT16 scale and power-of-two scale. - - Supported static quantization and weight-only quantization. - - - **Quantization Granularity**: - - - Supported for per-tensor and per-channel granularity. - - - **Data Types**: - - - Multiple data types are supported, including INT32/UINT32, Float16, Bfloat16, INT16/UINT16, INT8/UINT8 and BFP. - - - **Calibration Methods**: - - - MinMax, Entropy and Percentile for float scale. - - MinMax for INT16 scale. - - NonOverflow and MinMSE for power-of-two scale. - - - **Custom operations**: - - - "BFPFixNeuron" which supports block floating-point data type. - - "VitisQuantizeLinear" and "VitisDequantizeLinear" which support INT32/UINT32, Float16, Bfloat16, INT16/UINT16 quantization. - - "VitisInstanceNormalization" and "VitisLSTM" which have customized Bfloat16 kernels. - - All custom operations only support running on CPU. - - - **Advanced Quantization Algorithms**: - - - Supported CLE, BiasCorrection, AdaQuant, AdaRound and SmoothQuant. - - - **Operating System Support**: - - - Linux and Windows.