[WIP] Export to tflite int8 #9

kyakuno · 2024-12-20T05:19:37Z

To speed up inference, this PR investigates building FULLY_INTEGER_QUANTIZATION models.

On Windows, the dependencies of ai-edge-torch cannot be installed, so WSL is required.

 python3 export_image_predictor.py --image_size 512 --framework tflite --accuracy int8

At this stage, the conversion fails with the following error.

   output = self._dispatch_impl(func, types, args, kwargs)
  File "/home/kyakuno/.local/lib/python3.10/site-packages/torch/_subclasses/fake_tensor.py", line 1757, in _dispatch_impl
    r = func(*args, **kwargs)
  File "/home/kyakuno/.local/lib/python3.10/site-packages/torch/_ops.py", line 667, in __call__
    return self_._op(*args, **kwargs)
  File "/home/kyakuno/.local/lib/python3.10/site-packages/torch/ao/quantization/fx/_decomposed.py", line 81, in quantize_per_tensor_meta
    assert input.dtype == torch.float32, f"Expecting input to have dtype torch.float32, but got dtype: {input.dtype}"
torch._dynamo.exc.TorchRuntimeError: Failed running call_function quantized_decomposed.quantize_per_tensor.default(*(FakeTensor(..., size=(), dtype=torch.int64), 0.007843137718737125, -128, -128, 127, torch.int8), **{}):
Expecting input to have dtype torch.float32, but got dtype: torch.int64

from user code:
   File "<eval_with_key>.12", line 1083, in forward
    quantize_per_tensor_default_261 = torch.ops.quantized_decomposed.quantize_per_tensor.default(_tensor_constant_9, 0.007843137718737125, -128, -128, 127, torch.int8);  _tensor_constant_9 = None

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information

kyakuno · 2024-12-20T07:11:07Z

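The _tensor_constant_9 in the error feeds into a Mul -> Div -> Pow chain.
Where position_encoding.py computes dim_t, the result of torch.arange ends up as a constant input to the graph, and quantization fails on it.
Precomputing dim_t allows the conversion to go through.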
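One way to precompute dim_t, as a minimal sketch. It assumes position_encoding.py uses the common sine position-embedding formula `temperature ** (2 * (i // 2) / num_pos_feats)`, which is consistent with the Mul -> Div -> Pow chain but is not confirmed in this PR. `precompute_dim_t` and its default arguments are hypothetical names. The values are computed in plain Python, so no arange op enters the traced graph.

```python
def precompute_dim_t(num_pos_feats: int, temperature: float = 10000.0) -> list[float]:
    """Return the per-feature divisors as plain Python floats.

    Assumed formula (not confirmed by the PR):
        dim_t[i] = temperature ** (2 * (i // 2) / num_pos_feats)
    """
    return [
        temperature ** (2 * (i // 2) / num_pos_feats)
        for i in range(num_pos_feats)
    ]


# Computed once, outside forward(). The module could then hold it as a
# float32 constant, for example (assumption, shown as a comment only):
#   self.register_buffer("dim_t", torch.tensor(DIM_T, dtype=torch.float32), persistent=False)
DIM_T = precompute_dim_t(num_pos_feats=4)
print(DIM_T)
```

Because the tensor is created as float32 up front, the quantizer no longer sees an int64 constant at that position.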

kyakuno · 2024-12-20T07:12:17Z

Output when truck.jpg is used as the calibration image. The accuracy of the graph appears to be sufficient.

int8

float

kyakuno · 2024-12-20T07:17:44Z

Tensor-wise quantization version
https://storage.googleapis.com/ailia-models-tflite/segment-anything-2.1/image_encoder_hiera_t_2.1_512.torch_tw_int8.tflite

kyakuno · 2024-12-21T02:51:14Z

Channel-wise quantization version
https://storage.googleapis.com/ailia-models-tflite/segment-anything-2.1/image_encoder_hiera_t_2.1_512.torch_cw_int8.tflite

kyakuno · 2024-12-21T02:52:14Z

The Image Encoder does not seem to have much variation in its value ranges; even calibrating with a single image produces a correct output image.

kyakuno · 2024-12-21T02:55:16Z

Inference sample
axinc-ai/ailia-models-tflite#95

kyakuno · 2024-12-21T05:08:25Z

The mask decoder fails with the following error.

ValueError: num_scales must be 1 for per-layer quantization, or 256 for per-axis quantization, but got 128.
Tensor 201 has invalid quantization parameters.
num_scales must be 1 for per-layer quantization, or 256 for per-axis quantization, but got 128.
Tensor 217 has invalid quantization parameters.
num_scales must be 1 for per-layer quantization, or 256 for per-axis quantization, but got 128.
Tensor 270 has invalid quantization parameters.

kyakuno · 2024-12-21T05:14:26Z

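Even with is_per_channel = False, Conv layers are still quantized per channel.
The flag appears to decide only whether the newer is_per_channel scheme is used for FC (fully connected) layers.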
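The difference between the two schemes, and why the converter checks num_scales, can be illustrated with a small sketch. It assumes symmetric int8 weight quantization and uses hypothetical helper names; it is not the ai-edge-torch implementation. Per-tensor quantization yields a single scale, per-channel yields one scale per output channel, so the number of scales has to be either 1 or the size of the quantized axis.

```python
def symmetric_scales(weights, per_channel):
    """weights: list of rows, one row per output channel."""
    def scale_of(values):
        max_abs = max(abs(v) for v in values)
        return max_abs / 127.0 if max_abs > 0 else 1.0

    if per_channel:
        return [scale_of(row) for row in weights]
    return [scale_of([v for row in weights for v in row])]


def quantize(weights, scales):
    """Quantize each row with its own scale, or with the single shared scale."""
    quantized = []
    for i, row in enumerate(weights):
        s = scales[i] if len(scales) > 1 else scales[0]
        quantized.append([max(-127, min(127, round(v / s))) for v in row])
    return quantized


def is_valid_num_scales(num_scales, axis_size):
    """Illustrative check: 1 scale (per-tensor) or one per channel (per-axis)."""
    return num_scales in (1, axis_size)


w = [[0.5, -1.27], [0.01, 0.02]]  # second channel has a much smaller range
print(quantize(w, symmetric_scales(w, per_channel=False)))
print(quantize(w, symmetric_scales(w, per_channel=True)))
print(is_valid_num_scales(128, 256))
```

With a shared scale the small-range channel collapses to a few integer levels, while per-channel scales let it use the full int8 range.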

kyakuno · 2024-12-21T05:49:02Z

Memory Attention Per Channel
https://storage.googleapis.com/ailia-models-tflite/segment-anything-2.1/memory_attention_hiera_t_2.1_512.torch_cw_int8.tflite
Mask Decoder Per Tensor
https://storage.googleapis.com/ailia-models-tflite/segment-anything-2.1/mask_decoder_hiera_t_2.1_512.torch_tw_int8.tflite

kyakuno · 2024-12-21T06:28:55Z

A feature for building a calibration dataset needs to be added.

kyakuno · 2024-12-21T08:03:08Z

The example below quantizes on the TensorFlow side instead of using torch.
google-ai-edge/ai-edge-torch#345
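For reference, full-integer post-training quantization on the TensorFlow side typically looks like the sketch below. It assumes the model has already been exported as a SavedModel and that the calibration samples are arrays matching the single model input. The function names are illustrative, and the actual script in this PR may differ. TensorFlow is imported lazily so the dataset helper can be used without it.

```python
def make_representative_dataset(samples):
    """Wrap calibration samples in the callable form the TFLite converter expects.

    Each yielded item is a list with one entry per model input.
    """
    def representative_dataset():
        for sample in samples:
            yield [sample]

    return representative_dataset


def quantize_saved_model(saved_model_dir, samples):
    """Convert a SavedModel to a full-integer int8 TFLite flatbuffer (bytes)."""
    import tensorflow as tf  # imported here so the helper above has no TF dependency

    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = make_representative_dataset(samples)
    # Require int8 builtin kernels; conversion fails if an op cannot be quantized.
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8
    return converter.convert()
```

The returned bytes would then be written to a .tflite file; the SavedModel directory and the sample list are placeholders to be supplied by the caller.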

kyakuno · 2024-12-21T08:15:15Z

When quantizing with tensorflow, layernorm is also converted to int8.
https://storage.googleapis.com/ailia-models-tflite/segment-anything-2.1/image_encoder_hiera_t_2.1_512.tf_int8.tflite

torch (layernorm stays float)

tensorflow (layernorm becomes int8)

kyakuno · 2024-12-22T13:06:08Z

Added calibration to all models. For now, calibration uses 100 images from the COCO dataset.
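Conceptually, calibration runs sample inputs through the model and records the value range of each tensor, from which the int8 scale and zero point are derived. A minimal sketch with a min/max observer follows. The class name and the asymmetric int8 scheme are assumptions for illustration, not the observer actually used by the quantizers in this PR.

```python
class MinMaxObserver:
    """Tracks the running min/max of a tensor across calibration batches."""

    def __init__(self):
        self.lo = float("inf")
        self.hi = float("-inf")

    def observe(self, values):
        self.lo = min(self.lo, min(values))
        self.hi = max(self.hi, max(values))

    def qparams(self, qmin=-128, qmax=127):
        # The range is widened to include 0 so that 0.0 is exactly representable.
        lo = min(self.lo, 0.0)
        hi = max(self.hi, 0.0)
        if hi == lo:
            return 1.0, 0
        scale = (hi - lo) / (qmax - qmin)
        zero_point = round(qmin - lo / scale)
        return scale, max(qmin, min(qmax, zero_point))


observer = MinMaxObserver()
for batch in ([-1.0, 0.5], [0.2, 3.0]):  # stand-ins for activations from two images
    observer.observe(batch)
print(observer.qparams())
```

More calibration images only change the result when they widen the observed range, which is why a tensor with stable ranges can be calibrated from very few samples.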

kyakuno · 2024-12-22T23:24:58Z

Calibration has been run. image_encoder, mask_decoder, and prompt_encoder use 100 images from the COCO dataset; the other models use 200 bedroom images.
https://storage.googleapis.com/ailia-models-tflite/segment-anything-2.1/image_encoder_hiera_t_2.1_512.int8.tflite
https://storage.googleapis.com/ailia-models-tflite/segment-anything-2.1/mask_decoder_hiera_t_2.1_512.int8.tflite
https://storage.googleapis.com/ailia-models-tflite/segment-anything-2.1/memory_attention_hiera_t_2.1_512.int8.tflite
https://storage.googleapis.com/ailia-models-tflite/segment-anything-2.1/obj_ptr_tpos_proj_hiera_t_2.1_512.int8.tflite
https://storage.googleapis.com/ailia-models-tflite/segment-anything-2.1/memory_encoder_hiera_t_2.1_512.int8.tflite
https://storage.googleapis.com/ailia-models-tflite/segment-anything-2.1/prompt_encoder_hiera_t_2.1_512.int8.tflite
https://storage.googleapis.com/ailia-models-tflite/segment-anything-2.1/mlp_hiera_t_2.1_512.int8.tflite

kyakuno · 2024-12-22T23:36:28Z

How to test

./download_tflite_models.sh
python3 export_image_predictor.py --framework tflite --accuracy int8 --mode import --image_size 512
python3 export_video_predictor.py --framework tflite --accuracy int8 --mode import --image_size 512

Output

output/*.png

kyakuno · 2024-12-27T08:13:12Z

Models quantized with torch instead of tensorflow, with layernorm executed in float. The calibration conditions are the same.
https://storage.googleapis.com/ailia-models-tflite/segment-anything-2.1/layernorm_float/image_encoder_hiera_t_2.1_512.int8.tflite
https://storage.googleapis.com/ailia-models-tflite/segment-anything-2.1/layernorm_float/mask_decoder_hiera_t_2.1_512.int8.tflite

kyakuno added 2 commits December 20, 2024 14:18

Export to tflite int8

93666c1

Fix int8 inference

3ab2f02

Channel wise quantization

f4d50c9

Export mask decoder

d83fcf8

kyakuno added 3 commits December 21, 2024 14:18

Fix data supply

1ad8e46

Export memory attention to int8

c0e95b8

Disable per channel

a23c412

Tensorflow quantization

ff185b2

kyakuno added 10 commits December 21, 2024 21:40

Export mask decoder

d72daa1

Export prompt encoder

466e660

Export memory attention and memory encoder

2bab821

Export memory encoder

c998c4c

Export projection

056f594

Implement calibration mode

445f141

Calibration for memory attention

45cd9b3

Int8 switch

37290b2

Implement calibration

5de0f24

Calibrate prompt encoder

7a765c6

kyakuno added 3 commits December 22, 2024 22:00

Temporally use float model for prompt encoder

ee2f66e

Use large dataset

c10b194

Added quantize script

1cc5769

Use large data for calibration

4d78e77

kyakuno added 4 commits December 24, 2024 17:46

Convert to numpy for ailia

951c6c6

Fix tflite int8 for image encoder of video predictor

2f46cf1

Fix tflite_int8 for prompt_encoder and mask_decoder for video predictor

4ad6b4f

Added torch quantize support

6341bca

kyakuno added 2 commits January 21, 2025 16:52

Added mixed precision support

a292627

Upload mixed precision model

16af7f1
