Feature Request: Support AWS inferentia inf2 instances #8954
Comments
None of this stuff appears to be open source and the interface to Amazon's devices doesn't appear to be documented. You're supposed to use their proprietary compiler to generate executables that can use these devices and Amazon gives you things like PyTorch-compatible wrappers to utilize these from Python. The high-level C++ APIs for this only manage loading these executables and feeding them tensors. If you wanted to implement your own inference from scratch and run custom software like llama.cpp, you would need to either reverse engineer the device driver interface (and/or compiler, possibly using the Python code as a guide) or Amazon would need to release docs for it. This is not a "bring your own framework" setup, at least not yet. The fact that Amazon manages most of this stuff for you and they only have compatibility with their own software to worry about probably contributes to why they can offer those cheaper rates (aside from the hardware itself, of course).
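For illustration, a minimal sketch of that wrapper-based workflow, assuming an Inf2 instance with AWS's `torch-neuronx` package installed; the toy model and shapes are placeholders, not anything from llama.cpp:

```python
import torch
import torch_neuronx

# A stand-in model; in practice this would be a real PyTorch network.
model = torch.nn.Linear(4096, 4096).eval()
example = torch.randn(1, 4096)

# Amazon's proprietary compiler runs behind this call and emits a NEFF
# executable embedded in the returned TorchScript module; the device driver
# interface itself is never exposed to the caller.
neuron_model = torch_neuronx.trace(model, example)
print(neuron_model(example).shape)
```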
This was also discussed in #2109. Currently one can compile a NEFF program with AWS's Neuron compiler toolchain. A project like vLLM handles this by installing the compiler + torch Python packages on top of its usual dependencies (a rough sketch of that setup is below). What is needed is someone willing to turn this into an easy-to-consume build system + library for compiling and loading arbitrary kernels, and to maintain it going forward. CC: @ggerganov
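For reference, a hedged sketch of the kind of setup a vLLM-style integration relies on; the pip package names come from AWS's public Neuron repository, while the paths and shapes below are purely illustrative:

```python
# Assumed prerequisites (installed from AWS's Neuron pip repository, not part
# of llama.cpp):
#   pip install neuronx-cc torch-neuronx \
#       --extra-index-url https://pip.repos.neuron.amazonaws.com
import torch
import torch_neuronx  # registers the Neuron runtime ops before loading

# A module traced with torch_neuronx.trace() carries its compiled NEFF program
# inside the TorchScript artifact, so it can be saved once and reloaded later
# without invoking the compiler again.
compiled = torch.jit.load("llama_block.neuron.pt")  # placeholder path
tokens = torch.randn(1, 4096)                       # placeholder input
print(compiled(tokens).shape)
```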
The best way to get things going is to make a PoC of a working backend for this hardware.
This issue was closed because it has been inactive for 14 days since being marked as stale.
Prerequisites
Feature Description
AWS Inf2 instances, built around the Inferentia2 accelerator, are supposed to provide better inference performance at cheaper rates. It would be great to support these instances.
Motivation
#2109
ollama/ollama#6143
Possible Implementation
No response