-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tasks related to machine learning are not functioning properly. #10343
Comments
@mertalev this is the second report I've seen of ML hwaccel on cuda failing with a segfault. Any chance we have a regression here? |
We recently updated to ONNX Runtime 1.18.0. It could be a regression there. |
Can you confirm if doing a search (with text in the context field) also causes a segmentation fault? Does face detection cause one? Trying to narrow down whether it's a model-specific behavior. |
The context search and face detection features are working fine, but this is under the condition of solely using CPU for machine learning tasks. I believe it has nothing to do with the model; I've tried models like XLM-Roberta-Large-Vit-B-32, XLM-Roberta-Large-Vit-B-16Plus, and ViT-B-32__openai. If a machine learning task hasn't been successfully executed since creating an immich Docker instance, then I can't even use the context search feature at all. Here is the log when I used the XLM-Roberta-Large-Vit-B-16Plus model for context search:
ViT-B-32__openai:
Facial detection, buffalo_l model:
antelopev2:
|
I suspect it's an issue with the podman-compose here. This problem was supposedly fixed in commit 79865c2, but the latest version of podman-compose hasn't incorporated this fix yet (as can be seen from the merge time of the PR and the release time of podman-compose v1.1.0). |
The bug
When attempting to speed up machine learning tasks using CUDA, the 'immich-machine-learning' reports an error as follows: 'Worker (pid:5) was sent code 139!'
The 'immich-server' is indicating errors like this: "ERROR [Microservices:JobService] Unable to run job handler (smartSearch/smart-search): Error: Machine learning request to "http://immich-machine-learning:3003" failed with SocketError: other side closed."
The OS that Immich Server is running on
Arch
Version of Immich Server
v1.106.4
Version of Immich Mobile App
N/A
Platform with the issue
Your docker-compose.yml content
Your .env content
Reproduction steps
Relevant log output
Additional information
The prerequisites for using CUDA to accelerate machine learning tasks are satisfied. However, there is an issue when hardware acceleration is not being used (using the default configuration in docker-compose.yml) as no such problem arises.
The text was updated successfully, but these errors were encountered: