This repository has been archived by the owner on Oct 25, 2024. It is now read-only.

Commit 755ba0e

update Readme (#57)

VincyZhang authored Apr 4, 2023
1 parent c07cc42 commit 755ba0e
Showing 6 changed files with 31 additions and 18 deletions.
23 changes: 18 additions & 5 deletions README.md
@@ -1,14 +1,27 @@
# Intel® Extension for Transformers: Accelerating Transformer-based Models on Intel Platforms
Intel® Extension for Transformers is an innovative toolkit to accelerate Transformer-based models on Intel platforms, particularly effective on the 4th Gen Intel Xeon Scalable processor (codenamed [Sapphire Rapids](https://www.intel.com/content/www/us/en/products/docs/processors/xeon-accelerated/4th-gen-xeon-scalable-processors.html)). The toolkit provides the following key features and examples:
<div align="center">

Intel® Extension for Transformers
===========================
<h3> An innovative toolkit to accelerate Transformer-based models on Intel platforms</h3>

[Architecture](./docs/architecture.md)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[NeuralChat](./examples/optimization/pytorch/huggingface/language-modeling/chatbot)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[Examples](./docs/examples.md)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[Documentations](https://intel.github.io/intel-extension-for-transformers/latest/docs/Welcome.html)
</div>

* Seamless user experience of model compression on Transformers-based models by extending [Hugging Face transformers](https://github.com/huggingface/transformers) APIs and leveraging [Intel® Neural Compressor](https://github.com/intel/neural-compressor)
---
<div align="left">

Intel® Extension for Transformers is an innovative toolkit to accelerate Transformer-based models on Intel platforms, particularly effective on the 4th Gen Intel Xeon Scalable processor (codenamed [Sapphire Rapids](https://www.intel.com/content/www/us/en/products/docs/processors/xeon-accelerated/4th-gen-xeon-scalable-processors.html)). The toolkit provides the following key features and examples:


* Seamless user experience of model compression on Transformer-based models by extending [Hugging Face transformers](https://github.com/huggingface/transformers) APIs and leveraging [Intel® Neural Compressor](https://github.com/intel/neural-compressor) (a minimal usage sketch follows this list)


* Advanced software optimizations and unique compression-aware runtime (released with NeurIPS 2022's papers [Fast DistilBERT on CPUs](https://arxiv.org/abs/2211.07715) and [QuaLA-MiniLM: a Quantized Length Adaptive MiniLM](https://arxiv.org/abs/2210.17114), and NeurIPS 2021's paper [Prune Once for All: Sparse Pre-Trained Language Models](https://arxiv.org/abs/2111.05754))


* Accelerated end-to-end Transformer-based applications such as [Stable Diffusion](./examples/optimization/pytorch/huggingface/textual_inversion), [GPT-J-6B](./examples/optimization/pytorch/huggingface/language-modeling/inference/README.md#GPT-J), [BLOOM-176B](./examples/optimization/pytorch/huggingface/language-modeling/inference/README.md#BLOOM-176B), [T5](https://github.com/intel/intel-extension-for-transformers/blob/main/examples/optimization/pytorch/huggingface/summarization/quantization), and [SetFit](./docs/tutorials/pytorch/text-classification/SetFit_model_compression_AGNews.ipynb) by leveraging Intel AI software such as [Intel® Extension for PyTorch](https://github.com/intel/intel-extension-for-pytorch)
* Optimized Transformer-based model packages such as [Stable Diffusion](https://github.com/intel/intel-extension-for-transformers/tree/main/examples/deployment/neural_engine/stable_diffusion), [GPT-J-6B](https://github.com/intel/intel-extension-for-transformers/tree/main/examples/deployment/neural_engine/gpt-j), [GPT-NEOX](https://github.com/intel/intel-extension-for-transformers/tree/main/examples/optimization/pytorch/huggingface/language-modeling/quantization/inc#2-validated-model-list), [BLOOM-176B](./examples/optimization/pytorch/huggingface/language-modeling/inference/README.md#BLOOM-176B), [T5](https://github.com/intel/intel-extension-for-transformers/tree/main/examples/optimization/pytorch/huggingface/summarization/quantization#2-validated-model-list), [Flan-T5](https://github.com/intel/intel-extension-for-transformers/tree/main/examples/optimization/pytorch/huggingface/summarization/quantization#2-validated-model-list) and end-to-end workflows such as [SetFit-based text classification](./docs/tutorials/pytorch/text-classification/SetFit_model_compression_AGNews.ipynb) and [document level sentiment analysis (DLSA)](https://github.com/intel/intel-extension-for-transformers/tree/main/examples/E2E-solution/DLSA)

* [NeuralChat](https://github.com/intel/intel-extension-for-transformers/tree/main/examples/optimization/pytorch/huggingface/language-modeling/chatbot), a custom chatbot trained on Intel CPUs through parameter-efficient fine-tuning ([PEFT](https://github.com/huggingface/peft)) on domain knowledge
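
For the first bullet, here is the minimal usage sketch referenced above. The `NLPTrainer` and `QuantizationConfig` names are assumptions about this project's optimization API at the time of this commit; treat the snippet as illustrative, not authoritative:

```python
# Hypothetical sketch of post-training quantization through the extension's
# Hugging Face-style trainer (class and argument names are assumptions).
from transformers import AutoModelForSequenceClassification
from intel_extension_for_transformers.optimization import (
    QuantizationConfig,
    metrics,
    objectives,
)
from intel_extension_for_transformers.optimization.trainer import NLPTrainer

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english"
)
trainer = NLPTrainer(model=model)  # plus the usual dataset/tokenizer arguments

q_config = QuantizationConfig(
    approach="PostTrainingStatic",  # static PTQ backed by Intel Neural Compressor
    metrics=[metrics.Metric(name="eval_accuracy", is_relative=True, criterion=0.01)],
    objectives=[objectives.performance],
)
quantized_model = trainer.quantize(quant_config=q_config)
```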


## Installation
@@ -64,9 +77,9 @@ output = model(**input).logits.argmax().item()
<tbody>
<tr>
<td colspan="2" align="center"><a href="https://github.com/intel/intel-extension-for-transformers/tree/main/docs">Model Compression</a></td>
<td colspan="2" align="center"><a href="https://github.com/intel/intel-extension-for-transformers/tree/main/examples/optimization/pytorch/huggingface/language-modeling/chatbot">NeuralChat</a></td>
<td colspan="2" align="center"><a href="https://github.com/intel/intel-extension-for-transformers/tree/main/intel_extension_for_transformers/backends/neural_engine/docs">Neural Engine</a></td>
<td colspan="2" align="center"><a href="intel_extension_for_transformers/backends/neural_engine/kernels/README.md">Kernel Libraries</a></td>
<td colspan="2" align="center"><a href="https://github.com/intel/intel-extension-for-transformers/tree/main/examples">Examples</a></td>
</tr>
<tr>
<th colspan="8" align="center">MODEL COMPRESSION</th>
2 changes: 1 addition & 1 deletion conda_meta/meta.yaml
@@ -1,4 +1,4 @@
{% set version = "1.0" %}
{% set version = "1.0.0" %}
{% set buildnumber = 0 %}
package:
name: intel_extension_for_transformers
2 changes: 1 addition & 1 deletion docs/architecture.md
@@ -1,4 +1,4 @@
# Architecture of Intel® Extension for Transformers

<img src="./imgs/arch.png" width=691 height=444 alt="arch">
<img src="./imgs/arch.png" width=600 height=250 alt="arch">
</br>
8 changes: 4 additions & 4 deletions examples/deployment/neural_engine/gpt-j/README.md
@@ -26,17 +26,17 @@
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libiomp5.so
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
```
## Performance
### Single-node inference
The fp32 model is in huggingface [EleutherAI/gpt-j-6B](https://huggingface.co/EleutherAI/gpt-j-6B), and the int8 model has been published at [Intel/gpt-j-6B-pytorch-int8-static](https://huggingface.co/Intel/gpt-j-6B-pytorch-int8-static).

#### Generate IR
The fp32 model is [EleutherAI/gpt-j-6B](https://huggingface.co/EleutherAI/gpt-j-6B), and the int8 model has been published at [Intel/gpt-j-6B-pytorch-int8-static](https://huggingface.co/Intel/gpt-j-6B-pytorch-int8-static).

### Generate IR
```bash
python gen_ir.py --model=EleutherAI/gpt-j-6B --dtype=bf16 --output_model='./ir' --pt_file='new.pt'  # dtype can be fp32 / int8 / bf16
```
- When the input dtype is fp32 or bf16, the pt file will be automatically saved if it does not exist.
- When the input dtype is int8, the pt file should exist.
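
After generation, the IR can be loaded back for a quick sanity check. A minimal sketch, assuming the Neural Engine's `compile` entry point (the module path and call are assumptions, not verified against this commit):

```python
# Hypothetical sketch: load a generated IR with the Neural Engine compile API.
from intel_extension_for_transformers.backends.neural_engine.compile import compile

graph = compile("./ir")  # directory produced by gen_ir.py above
# graph.inference([...]) would then run the model on prepared input tensors.
```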

#### Inference
### Inference
```bash
# supports a single socket or multiple sockets
OMP_NUM_THREADS=<physical cores num> numactl -m <node N> -C <cpu list> python run_gptj.py --max-new-tokens 32 --ir_path <path to ir>
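# Example with assumed values (hypothetical machine: 56 physical cores on NUMA node 0):
# OMP_NUM_THREADS=56 numactl -m 0 -C 0-55 python run_gptj.py --max-new-tokens 32 --ir_path ./ir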
```
12 changes: 6 additions & 6 deletions examples/deployment/neural_engine/stable_diffusion/README.md
@@ -46,15 +46,15 @@
```bash
export WEIGHT_SHARING=1
export INST_NUM=<inst num>
```
# End-to-End Workflow
## Prepare Models
## 1. Prepare Models

Stable Diffusion mainly includes three ONNX models: text_encoder, unet, and vae_decoder.

The pretrained models [CompVis/stable-diffusion-v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4) and [runwayml/stable-diffusion-v1-5](https://github.com/runwayml/stable-diffusion) provided by diffusers are the same under the default config.

Here we take CompVis/stable-diffusion-v1-4 as an example.
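
For orientation, these three sub-models map directly onto the components of a standard diffusers pipeline. A small sketch using the stock `diffusers` API (the model weights download on first run):

```python
# Inspect the three components that get exported to ONNX (standard diffusers API).
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
print(type(pipe.text_encoder).__name__)  # CLIP text encoder
print(type(pipe.unet).__name__)          # denoising UNet
print(type(pipe.vae).__name__)           # VAE; its decoder half is exported
```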

### Download Models
### 1.1 Download Models
Export FP32 ONNX models from the huggingface diffusers module, command as follows:

```bash
python prepare_model.py --input_model=CompVis/stable-diffusion-v1-4 --output_path=./model
```

@@ -66,7 +66,7 @@
Setting --bf16 exports both FP32 and BF16 models.

```bash
python prepare_model.py --input_model=CompVis/stable-diffusion-v1-4 --output_path=./model --bf16
```

### Compile Models
### 1.2 Compile Models
Export the three FP32 ONNX sub-models of Stable Diffusion to Neural Engine IRs.

@@ -105,7 +105,7 @@
```bash
python export_ir.py --onnx_model=./model/unet_bf16/model.onnx --pattern_config=unet_pattern.conf --output_path=./bf16_ir/unet/
python export_ir.py --onnx_model=./model/vae_decoder_bf16/bf16-model.onnx --pattern_config=vae_decoder_pattern.conf --output_path=./bf16_ir/vae_decoder/
```

## Performance
## 2. Performance

Python API command as follows:
@@ -116,7 +116,7 @@
```bash
GLOG_minloglevel=2 python run_executor.py --ir_path=./fp32_ir --mode=performance
GLOG_minloglevel=2 python run_executor.py --ir_path=./bf16_ir --mode=performance
```

## Accuracy
## 3. Accuracy
The Fréchet Inception Distance (FID) metric is used to evaluate accuracy. In this case, we check the FID score between the PyTorch image and the Neural Engine image.

Set --accuracy to check the FID score.

@@ -129,7 +129,7 @@
```bash
GLOG_minloglevel=2 python run_executor.py --ir_path=./fp32_ir --mode=accuracy
GLOG_minloglevel=2 python run_executor.py --ir_path=./bf16_ir --mode=accuracy
```
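
For reference, an FID score between two image sets can also be computed standalone. A sketch using `torchmetrics`, which is an assumption here (the repo's script computes the score internally):

```python
# Standalone FID check between reference and generated images (torchmetrics,
# not part of this repo's tooling); images are uint8 tensors of shape (N, 3, H, W).
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

fid = FrechetInceptionDistance(feature=64)
reference_images = torch.randint(0, 255, (8, 3, 512, 512), dtype=torch.uint8)
generated_images = torch.randint(0, 255, (8, 3, 512, 512), dtype=torch.uint8)
fid.update(reference_images, real=True)   # distribution of "real" images
fid.update(generated_images, real=False)  # distribution of generated images
print(fid.compute())
```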

## Text-to-image
## 4. Try Text to Image

Try using one sentence to create a picture!
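
For illustration, an equivalent text-to-image call with the stock diffusers API (not this repo's optimized Neural Engine path) is sketched below:

```python
# Text-to-image with the stock diffusers pipeline, purely for illustration.
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
image = pipe("a photo of an astronaut riding a horse on mars").images[0]
image.save("astronaut.png")
```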

2 changes: 1 addition & 1 deletion intel_extension_for_transformers/version.py
@@ -17,4 +17,4 @@

"""The neural engine version file."""

__version__ = "1.0"
__version__ = "1.0.0"
