[Feature Request] Zenzai and CoreML #115

ensan-hcl · 2024-08-03T08:04:55Z

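Background

Zenzai currently uses llama.cpp as its inference runtime. llama.cpp has the advantage of being easy to port across platforms, but on Apple platforms it cannot use the NPU (Neural Engine), so it may be less efficient than CoreML.
We therefore want to implement Zenzai on top of CoreML and obtain inference performance that exceeds llama.cpp.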

Current status

A CoreML version of the zenz model, made by @Skyline-23, already exists.
https://github.com/Skyline-23/zenz-CoreML

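The implementation below runs inference with it, and so far it gets as far as obtaining logits. However, inference is about 1.4x slower than llama.cpp, and its behavior suggests that the NPU is not being used properly.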

https://github.com/ensan-hcl/swift-zenz-coreml

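What we want to do

Based on the above, we want to do the following and achieve performance that exceeds llama.cpp. If this works, it would show that, in principle, CoreML can deliver a speedup on Apple platforms.

Run zenz-v2 (an unmodified GPT-2) on CoreML and obtain logits for an input
Compress the model with quantization and palettization
Enable KV caching
Achieve faster inference than llama.cpp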
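
For reference, the KV-caching item can be illustrated with a toy sketch. This is plain Python with a single attention head, not CoreML code and not the zenz model; `CachedAttention`, `full_recompute`, and the weight matrices are hypothetical names made up for illustration. The idea it shows is that, when decoding token by token, only the newest token's key and value need to be computed, while those of earlier tokens are reused from a cache.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def project(x, matrix):
    """Multiply row vector x by a matrix given as a list of rows."""
    cols = len(matrix[0])
    return [sum(xi * row[j] for xi, row in zip(x, matrix)) for j in range(cols)]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query position."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


class CachedAttention:
    """Hypothetical single-head attention layer that keeps a KV cache."""

    def __init__(self, wq, wk, wv):
        self.wq, self.wk, self.wv = wq, wk, wv
        self.keys = []    # cached keys, one per token seen so far
        self.values = []  # cached values, one per token seen so far

    def step(self, x):
        # Only the new token is projected; earlier keys/values are reused.
        self.keys.append(project(x, self.wk))
        self.values.append(project(x, self.wv))
        return attend(project(x, self.wq), self.keys, self.values)


def full_recompute(xs, wq, wk, wv):
    """Baseline without a cache: re-project the whole prefix on every call."""
    keys = [project(x, wk) for x in xs]
    values = [project(x, wv) for x in xs]
    return attend(project(xs[-1], wq), keys, values)
```

Calling `step` once per token yields the same output for the latest position as `full_recompute` on the whole prefix, but performs one key/value projection per step instead of one per prefix token. How this maps onto a CoreML model (for example, how the cache is passed in and out) is outside the scope of this sketch.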

ensan-hcl added the enhancement New feature or request label Aug 3, 2024
