
SAM and the bioengine / bioimageio-colab #3

Open
constantinpape opened this issue May 16, 2024 · 6 comments

Comments

@constantinpape (Contributor)

Running SAM in the Modelzoo Universe

We have started some efforts to integrate SAM with the bioengine / imjoy / bioimageio-colab.
Here I want to summarize the overall goals, the current approaches, and the open steps and questions for achieving them.

Goals

I think there are two main goals for this integration:

  1. Implementing a test-run functionality for SAM models on the modelzoo website, where users can upload an example image and test how well a given model works on it (in interactive segmentation).
  2. Implementing annotation tools based on SAM that can be used within other tools built with imjoy, e.g. for collaborative annotation.

For now I am mostly interested in implementing goal 1, but I think goal 2 is much more interesting mid/long-term.
Ultimately it would be nice to have a set of functionality that can be used to build apps for both of these approaches.

Current Approaches

We have two prototypes for SAM integration:

  • An imjoy app that uses a model on the hypha triton server here. (Note: that currently doesn't run, presumably because the model is not available).
    • This is the approach to implement goal 1.
  • A script that starts a server to serve a SAM model and an app that connects to it and enables point-based segmentation. (The prototype works; I have only tested running server and client on the same machine.) A rough sketch of the server side is given after this list.
    • This is the approach to implement goal 2. (Though it would ultimately be best to share as much functionality between the two approaches as possible.)
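
A rough sketch of how the server side of the second prototype could look, assuming a hypha service registered via imjoy_rpc and the segment_anything SamPredictor; the service id, callback name, model type and checkpoint path are illustrative, not taken from the actual prototype:

import numpy as np
from imjoy_rpc.hypha import connect_to_server
from segment_anything import sam_model_registry, SamPredictor


async def start_sam_service(server_url):
    # Load a SAM model and wrap it in a predictor (model type and checkpoint
    # path are placeholders).
    sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
    predictor = SamPredictor(sam)

    def segment(image, point_coords, point_labels):
        # set_image expects an RGB uint8 array of shape (H, W, 3) and computes
        # the (expensive) image embedding; predict then runs the cheap
        # point-prompted mask decoding.
        predictor.set_image(np.asarray(image, dtype=np.uint8))
        masks, scores, _ = predictor.predict(
            point_coords=np.asarray(point_coords, dtype=np.float32),
            point_labels=np.asarray(point_labels, dtype=np.int32),
            multimask_output=False,
        )
        return masks[0]

    # Expose the segmentation function as a hypha service that a client app
    # (e.g. an imjoy plugin) can discover and call.
    server = await connect_to_server({"server_url": server_url})
    await server.register_service({
        "name": "SAM point segmentation",
        "id": "sam-point-segmentation",
        "config": {"visibility": "public"},
        "segment": segment,
    })

The client app would then look up this service on the same hypha server and call segment with the prompt points collected in the UI.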

Next steps / Questions

  • To implement the test functionality we need a SAM model available in hypha. @oeway could you upload this model so I could test it? It contains the image encoder as torchscript and the prompt encoder and mask decoder as onnx; a sketch of such an export is given after this list.
  • What is the best way to set up a library so that user-interface functionality can be re-used on the JavaScript side?
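
A minimal sketch of how such an export could look, assuming torch.jit.trace on the segment_anything image encoder; the model type (vit_b), checkpoint path and output layout are illustrative:

import os

import torch
from segment_anything import sam_model_registry

# Model type and checkpoint path are placeholders.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
sam.eval()

# Trace only the image encoder; SAM's encoder takes a fixed 1x3x1024x1024 input.
example_input = torch.randn(1, 3, 1024, 1024)
with torch.no_grad():
    encoder = torch.jit.trace(sam.image_encoder, example_input)

# Save in the <model-name>/<version>/model.pt layout used by triton's pytorch
# backend. The prompt encoder and mask decoder are exported to ONNX separately
# (not shown here).
os.makedirs("sam-vit_b-encoder/1", exist_ok=True)
torch.jit.save(encoder, "sam-vit_b-encoder/1/model.pt")
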
@oeway (Collaborator) commented May 17, 2024

Will look more into this! I agree with the proposed steps, and I would be very happy to support this.

I have already uploaded the two models you provided to the model repository, and they should be synchronized automatically to the bioengine instances.

For the client side, we can change the API to use the bioengine too. I will try to get back to this, or maybe @nilsmechtel can help here as well.

@constantinpape (Contributor, Author)

I have already uploaded the two models you provided to the model repository, and they should be synchronized automatically to the bioengine instances.

Thanks! I tried to access them, but this failed:

from imjoy_rpc.hypha import connect_to_server

SERVER_URL = "https://hypha.bioimage.io"


async def run():
    # Connect to the hypha server and get the triton client service.
    server = await connect_to_server(
        {"name": "test client", "server_url": SERVER_URL, "method_timeout": 100}
    )
    triton = await server.get_service("triton-client")
    # Request the model configuration of the uploaded encoder.
    config = await triton.get_config(model_name="sam-vit_t-encoder")


if __name__ == "__main__":
    import asyncio
    asyncio.run(run())

This fails with:

    response = await client.get_model_config(model_name)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.12/site-packages/pyotritonclient/http.py", line 524, in get_model_config
    _raise_if_error(response)
  File "/opt/conda/lib/python3.12/site-packages/pyotritonclient/http.py", line 73, in _raise_if_error
    raise error
pyotritonclient.utils.InferenceServerException: Request for unknown model: 'sam-vit_t-encoder' is not found

Can you share a small snippet for how to correctly access the model?
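
For concreteness, a sketch of the kind of call that would come next, assuming the triton-client service exposes an execute method alongside get_config (this may not match the actual API):

import numpy as np

# Inside the async run() from the snippet above, after get_config succeeds:
# run the encoder on a dummy image of the expected shape. The `execute` call
# and its arguments are assumptions, not the confirmed API.
image = np.random.rand(1, 3, 1024, 1024).astype("float32")
result = await triton.execute(
    inputs=[image],
    model_name="sam-vit_t-encoder",
)
print(result)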

For the client side, we can change the API to use the bioengine too. I will try to get back to this, or maybe @nilsmechtel can help here as well.

That would be great! We can also set up a zoom meeting at some point to coordinate.

@oeway (Collaborator) commented May 17, 2024

Thanks for trying. It appears that the model is failing to load:

triton-1  | I0517 00:19:33.953968 1 libtorch.cc:1430] TRITONBACKEND_ModelInitialize: sam-vit_t-encoder (version 1)
triton-1  | W0517 00:19:33.954549 1 libtorch.cc:264] skipping model configuration auto-complete for 'sam-vit_t-encoder': not supported for pytorch backend
triton-1  | I0517 00:19:33.954834 1 libtorch.cc:293] Optimized execution is enabled for model instance 'sam-vit_t-encoder'
triton-1  | I0517 00:19:33.954851 1 libtorch.cc:311] Inference Mode is enabled for model instance 'sam-vit_t-encoder'
triton-1  | I0517 00:19:33.954859 1 libtorch.cc:406] NvFuser is not specified for model instance 'sam-vit_t-encoder'

I will need to investigate this further.

@oeway (Collaborator) commented May 20, 2024

I just tried it again, and the encoder is causing triton to crash:

triton-1  | I0520 21:07:56.769857 1 model_repository_manager.cc:1231] successfully loaded 'sam-vit_t-decoder' version 1
triton-1  | I0520 21:07:56.770177 1 backend_model_instance.cc:105] Creating instance sam-vit_t-encoder on GPU 0 (8.6) using artifact 'model.pt'
triton-1  | terminate called after throwing an instance of 'c10::Error'
triton-1  |   what():  isTuple() INTERNAL ASSERT FAILED at "/opt/pytorch/pytorch/aten/src/ATen/core/ivalue_inl.h":1916, please report a bug to PyTorch. Expected Tuple but got String
triton-1  | Exception raised from toTupleRef at /opt/pytorch/pytorch/aten/src/ATen/core/ivalue_inl.h:1916 (most recent call first):
triton-1  | frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6c (0x7fb638c361dc in /opt/tritonserver/backends/pytorch/libc10.so)
triton-1  | frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xfa (0x7fb638c13cd4 in /opt/tritonserver/backends/pytorch/libc10.so)
triton-1  | frame #2: c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x53 (0x7fb638c33ef3 in /opt/tritonserver/backends/pytorch/libc10.so)
triton-1  | frame #3: <unknown function> + 0x370d6da (0x7fb66776a6da in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
triton-1  | frame #4: <unknown function> + 0x370d849 (0x7fb66776a849 in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
triton-1  | frame #5: torch::jit::SourceRange::highlight(std::ostream&) const + 0x48 (0x7fb665148e48 in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
triton-1  | frame #6: torch::jit::ErrorReport::what() const + 0x2c3 (0x7fb66512ee93 in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
triton-1  | frame #7: <unknown function> + 0x111f9 (0x7fb66dd2d1f9 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)
triton-1  | frame #8: <unknown function> + 0x1f3c2 (0x7fb66dd3b3c2 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)
triton-1  | frame #9: <unknown function> + 0x1f8e2 (0x7fb66dd3b8e2 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)
triton-1  | frame #10: TRITONBACKEND_ModelInstanceInitialize + 0x3f6 (0x7fb66dd3bd26 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)
triton-1  | frame #11: <unknown function> + 0x1094ea (0x7fb66f97b4ea in /opt/tritonserver/bin/../lib/libtritonserver.so)
triton-1  | frame #12: <unknown function> + 0x10afd1 (0x7fb66f97cfd1 in /opt/tritonserver/bin/../lib/libtritonserver.so)
triton-1  | frame #13: <unknown function> + 0x1007f1 (0x7fb66f9727f1 in /opt/tritonserver/bin/../lib/libtritonserver.so)
triton-1  | frame #14: <unknown function> + 0x1ae2ba (0x7fb66fa202ba in /opt/tritonserver/bin/../lib/libtritonserver.so)
triton-1  | frame #15: <unknown function> + 0x1bbcf1 (0x7fb66fa2dcf1 in /opt/tritonserver/bin/../lib/libtritonserver.so)
triton-1  | frame #16: <unknown function> + 0xd6de4 (0x7fb66f4c2de4 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
triton-1  | frame #17: <unknown function> + 0x8609 (0x7fb6706c9609 in /usr/lib/x86_64-linux-gnu/libpthread.so.0)
triton-1  | frame #18: clone + 0x43 (0x7fb66f1ad163 in /usr/lib/x86_64-linux-gnu/libc.so.6)
triton-1  | 
triton-1 exited with code 0

@constantinpape (Contributor, Author)

Thanks for checking again @oeway. I will check it out locally later.

@constantinpape (Contributor, Author)

Hi @oeway,
I tried it locally, but can't reproduce the error. This code works for me with the encoder:

import torch

# Load the torchscript export of the encoder and run it on a random input
# with the expected shape (1, 3, 1024, 1024).
model = torch.jit.load("test-export/sam-vit_t-encoder/1/model.pt")

input_data = torch.randn(1, 3, 1024, 1024)

model.eval()  # Set to evaluation mode
print("Run prediction ...")
with torch.no_grad():
    output = model(input_data)

print(output.shape)
But I went ahead and uploaded another test model, using a vit_b encoder here. Could you see if that one works in hypha / triton?
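
One possible explanation for the difference: the assertion in the triton log is raised while libtorch formats a TorchScript error report, which can point to a mismatch between the PyTorch version used for the export and the libtorch bundled with the triton image. A minimal check, just a sketch (paths as in the snippet above, the re-saved filename is hypothetical):

import torch

# Compare this against the torch / libtorch version inside the triton image.
print(torch.__version__)

# Loading and re-saving the archive in an environment that matches the triton
# image's torch version is one way to rule out a serialization mismatch.
model = torch.jit.load("test-export/sam-vit_t-encoder/1/model.pt")
torch.jit.save(model, "test-export/sam-vit_t-encoder/1/model_resaved.pt")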
