
[Feat]: gpu inference #63

Open
kevkid opened this issue Oct 22, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@kevkid

kevkid commented Oct 22, 2024

Description
Will we see GPU inference to speed up generation?

Use Case
In all use cases we want more speed.

kevkid added the enhancement label Oct 22, 2024
@a-ghorbani
Owner

Which device are you referring to? If iPhone, Metal is already supported.
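
For readers landing here: in upstream llama.cpp, GPU backends such as Metal are compiled in at build time, and how much of a model is offloaded to the GPU is controlled per model via the `n_gpu_layers` parameter. A minimal sketch, assuming the standard llama.cpp C API (the model path and layer count below are placeholders, API details vary across llama.cpp versions, and this app's actual integration may differ):

```c
// Minimal sketch (not this project's actual code): enabling GPU offload
// via the upstream llama.cpp C API.
#include "llama.h"
#include <stdio.h>

int main(void) {
    llama_backend_init();

    struct llama_model_params mparams = llama_model_default_params();
    // In builds compiled with Metal (Apple) support, layers offloaded here
    // run on the GPU; n_gpu_layers = 0 keeps everything on the CPU.
    mparams.n_gpu_layers = 99; // placeholder: "offload all layers"

    struct llama_model * model = llama_load_model_from_file("model.gguf", mparams);
    if (model == NULL) {
        fprintf(stderr, "failed to load model\n");
        return 1;
    }

    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```

On Android there is no Metal, so a GPU path would need a different backend; on CPU-only builds the `n_gpu_layers` setting has no effect.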

@kevkid
Author

kevkid commented Oct 22, 2024

Android. If I'm remembering correctly, I was able to compile llama.cpp on my device and it ran fairly quickly. But even the 3B model feels very slow.

@JasonOSX

JasonOSX commented Nov 2, 2024

Also interesting: I downloaded Qwen-2.5-3B and it worked great. Then I downloaded some more models, and all of a sudden all models became extremely slow, producing only 0.5 tokens per second. It goes back to 6 t/s after removing and reinstalling. Pixel 8 / Android 15.

@sotwi

sotwi commented Dec 4, 2024

Android GPU support would be very welcome.
