
[Feat]: gpu inference #63

Open
kevkid opened this issue Oct 22, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@kevkid

kevkid commented Oct 22, 2024

Description
Will we see GPU inference to speed up generation?

Use Case
In all use cases we want more speed.

kevkid added the enhancement label Oct 22, 2024
@a-ghorbani
Owner

Which device are you referring to? If iPhone, Metal is already supported.
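
For readers landing here: in upstream llama.cpp, GPU backends such as Metal are compiled in at build time, and how much of a model is offloaded to the GPU is controlled per model via the `n_gpu_layers` parameter. A minimal sketch, assuming the standard llama.cpp C API (the model path and layer count below are placeholders, API details vary across llama.cpp versions, and this app's actual integration may differ):

```c
// Minimal sketch (not this project's actual code): enabling GPU offload
// via the upstream llama.cpp C API.
#include "llama.h"
#include <stdio.h>

int main(void) {
    llama_backend_init();

    struct llama_model_params mparams = llama_model_default_params();
    // In builds compiled with Metal (Apple) support, layers offloaded here
    // run on the GPU; n_gpu_layers = 0 keeps everything on the CPU.
    mparams.n_gpu_layers = 99; // placeholder: "offload all layers"

    struct llama_model * model = llama_load_model_from_file("model.gguf", mparams);
    if (model == NULL) {
        fprintf(stderr, "failed to load model\n");
        return 1;
    }

    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```

On Android there is no Metal, so a GPU path would need a different backend; on CPU-only builds the `n_gpu_layers` setting has no effect.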

@kevkid
Author

kevkid commented Oct 22, 2024

Android. If I'm remembering correctly, I was able to compile llama.cpp on my device and it ran fairly quickly. But even the 3B model feels very slow.

@JasonOSX

JasonOSX commented Nov 2, 2024

Also interesting: I downloaded Qwen-2.5-3B and it worked great. Then I downloaded some more models, and all of a sudden all models became extremely slow, producing only 0.5 tokens per second. It goes back to 6 t/s after removing and reinstalling. Pixel 8 / Android 15.

@sotwi

sotwi commented Dec 4, 2024

Android GPU support would be very welcome.
