Paligemma Workflows Block #399
Conversation
Nice work getting it working! 🔥
Looks like the tests aren't passing; not sure if that's also the case on main or not (didn't look into it yet).
"block_type": "model", | ||
} | ||
) | ||
type: Literal["Paligemma"] |
I think this should be PaliGemmaModel (all model blocks should end in Model, IMO).
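For illustration, a rough sketch of the suggested rename; a plain pydantic BaseModel stands in for WorkflowBlockManifest here so the snippet is self-contained, and the field layout mirrors the hunk above:

from typing import Literal

from pydantic import BaseModel, ConfigDict


class BlockManifest(BaseModel):  # stand-in for WorkflowBlockManifest
    model_config = ConfigDict(
        json_schema_extra={
            "block_type": "model",
        }
    )
    # Suggested: end the type literal in "Model", matching the other model blocks.
    type: Literal["PaliGemmaModel"]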
)

response = await self._model_manager.infer_from_request(
    paligemma_model_id, inference_request
Does this require a server to be running?
This looks more like what I'd expect: https://colab.research.google.com/drive/1_q09OjR2Ldl1FZnvfqwckvxrW_FIYclC?usp=sharing
No, I don't think it requires a server; this is how we load all the models in workflow blocks.
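As a rough sketch of the pattern being described (the class wrapper and argument handling below are simplified placeholders; only infer_from_request comes from the diff in this PR):

class PaliGemmaBlock:
    def __init__(self, model_manager):
        # The shared model manager owns weight loading and inference,
        # so the block never has to talk to a separate inference server.
        self._model_manager = model_manager

    async def run_locally(self, paligemma_model_id, inference_request):
        # Inference runs in-process via the model manager.
        response = await self._model_manager.infer_from_request(
            paligemma_model_id, inference_request
        )
        return response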
def get_manifest(cls) -> Type[WorkflowBlockManifest]:
    return BlockManifest

async def run_locally(
We may want this one to be set up both for local & async execution since it can't run without an NVIDIA GPU. You may want to be doing realtime video for your workflow (e.g. on a Jetson) but occasionally call out to a beefy server somewhere for an LLM response.
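A sketch of what supporting both paths could look like, loosely following the run_locally / run_remotely split used by other model blocks in this repo (method names and bodies here are assumptions, not the final API):

class PaliGemmaBlock:
    async def run_locally(self, image, prompt):
        # In-process execution: needs torch + an NVIDIA GPU on this machine.
        raise NotImplementedError("requires local GPU execution")

    async def run_remotely(self, image, prompt):
        # Remote execution: forward the request to a beefier inference server,
        # e.g. from a Jetson doing realtime video that only occasionally needs
        # a PaliGemma response.
        raise NotImplementedError("requires a reachable inference server")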
Just FYI - unit tests are failing as we are dragging the torch dependency into the test environment.
Should we add torch as a dep for the tests?
@yeldarby - we could, but the same would apply for people installing the package.
To fix the tests maybe we just have to remove the import here, since we are actually using the model manager to load the model? https://github.com/roboflow/inference/pull/399/files#diff-f8d393a3b2cc29b1852b9c9bcedd10a1f4682607a3fb57bb30dcc110954fb0abR26
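If removing the module-level import turns out to be the fix, one low-effort variant (sketch only, helper name is hypothetical) is deferring the import so that importing the block module during test collection never pulls in torch:

def _load_paligemma_class():
    # Deferred import: torch is only required when this function is actually
    # called, not when the block module is imported by the test suite.
    from inference.models.paligemma.paligemma import PaliGemma

    return PaliGemma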
    WorkflowBlock,
    WorkflowBlockManifest,
)
from inference.models.paligemma.paligemma import PaliGemma
exactly
I think before we merge this we should figure out whether:
A lot has changed in the EE since, so this probably cannot be merged as-is now.
We'll certainly need this ability for other blocks (even if we eventually host Paligemma). @EmilyGavrilenko this seems aligned with your work. @hansent, are there any other blocks which cannot run on the Hosted API currently?
Not in this repo yet, I think. The enterprise blocks repo that we want to add blocks from, however, contains some that can't because they are stateful, e.g. ByteTrackerBlock.
Don't we currently support CogVLM in the UI via the LMM block?
@EmilyGavrilenko you're right, the LMM block will error on hosted execution via this code path: https://github.com/roboflow/inference/blob/main/inference/core/workflows/core_steps/models/foundation/lmm.py#L373

Seems like a third class in terms of the LMM block: it can run on hosted and local, but depending on configuration it can only run local / we won't know until runtime (unless the manifest would somehow expose that certain config values make it non-executable on hosted). Another approach would be to split it up into separate blocks, one for GPT-4 and one for CogVLM?

Not sure it's worth handling / worrying about now? CogVLM hasn't really been something I've heard requested or come up much in examples we want to enable for users.
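For context, a hypothetical sketch of the "won't know until runtime" situation described above (helper name and error message are illustrative, not the actual lmm.py code):

def check_hosted_compatibility(lmm_type):
    # The manifest alone cannot express that certain config values are
    # local-only, so the incompatibility can only surface at run time.
    if lmm_type == "cog_vlm":
        raise ValueError(
            "CogVLM cannot be executed against the hosted API; "
            "run this block in a self-hosted deployment instead."
        )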
stale |
Description
This adds a workflow block to use the new Paligemma model
Type of change
How has this change been tested? Please provide a testcase or example of how you tested the change.
Locally using a notebook