Tasks related to machine learning are not functioning properly. #10343

Open
1 of 3 tasks
Tql-ws1 opened this issue Jun 15, 2024 · 5 comments

Comments

Tql-ws1 commented Jun 15, 2024

The bug

When attempting to speed up machine learning tasks using CUDA, the 'immich-machine-learning' container reports the following error: 'Worker (pid:5) was sent code 139!'

The 'immich-server' container logs errors like this: "ERROR [Microservices:JobService] Unable to run job handler (smartSearch/smart-search): Error: Machine learning request to "http://immich-machine-learning:3003" failed with SocketError: other side closed."
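
For reference, a code of 139 is usually read as 128 + 11, i.e. the process was killed by signal 11 (SIGSEGV). The sketch below decodes a code under that 128 + N convention. That the worker manager reports the code in this form is an assumption, and `decode_exit_code` is a hypothetical helper, not part of Immich.

```python
import signal


def decode_exit_code(code: int) -> str:
    """Map a 128+N style exit code to a signal name (assumed convention)."""
    if code > 128:
        try:
            # Codes above 128 conventionally mean "terminated by signal code-128".
            return signal.Signals(code - 128).name
        except ValueError:
            return f"unknown signal {code - 128}"
    return f"exit status {code}"


print(decode_exit_code(139))  # prints "SIGSEGV" on Linux
```

If that reading is right, the worker crashed with a segmentation fault while loading the model, and the "other side closed" errors on the server are just the dropped connection.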

The OS that Immich Server is running on

Arch

Version of Immich Server

v1.106.4

Version of Immich Mobile App

N/A

Platform with the issue

  • Server
  • Web
  • Mobile

Your docker-compose.yml content

name: immich

services:
  immich-server:
    container_name: immich_server
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    extends:
      file: hwaccel.transcoding.yml
      service: nvenc # set to one of [nvenc, quicksync, rkmpp, vaapi, vaapi-wsl] for accelerated transcoding
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
      - /etc/localtime:/etc/localtime:ro
    env_file:
      - .env
    ports:
      - 15002:3001
    depends_on:
      - redis
      - database
    restart: always

  immich-machine-learning:
    container_name: immich_machine_learning
    # For hardware acceleration, add one of -[armnn, cuda, openvino] to the image tag.
    # Example tag: ${IMMICH_VERSION:-release}-cuda
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}-cuda
    extends: # uncomment this section for hardware acceleration - see https://immich.app/docs/features/ml-hardware-acceleration
      file: hwaccel.ml.yml
      service: cuda # set to one of [armnn, cuda, openvino, openvino-wsl] for accelerated inference - use the `-wsl` version for WSL2 where applicable
    volumes:
      - model-cache:/cache
    env_file:
      - .env
    restart: always

  redis:
    container_name: immich_redis
    image: docker.io/redis:6.2-alpine@sha256:d6c2911ac51b289db208767581a5d154544f2b2fe4914ea5056443f62dc6e900
    healthcheck:
      test: redis-cli ping || exit 1
    restart: always

  database:
    container_name: immich_postgres
    image: docker.io/tensorchord/pgvecto-rs:pg14-v0.2.0@sha256:90724186f0a3517cf6914295b5ab410db9ce23190a2d9d0b9dd6463e3fa298f0
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_USER: ${DB_USERNAME}
      POSTGRES_DB: ${DB_DATABASE_NAME}
      POSTGRES_INITDB_ARGS: '--data-checksums'
    volumes:
      - ${DB_DATA_LOCATION}:/var/lib/postgresql/data
    healthcheck:
      test: pg_isready --dbname='${DB_DATABASE_NAME}' || exit 1; Chksum="$$(psql --dbname='${DB_DATABASE_NAME}' --username='${DB_USERNAME}' --tuples-only --no-align --command='SELECT COALESCE(SUM(checksum_failures), 0) FROM pg_stat_database')"; echo "checksum failure count is $$Chksum"; [ "$$Chksum" = '0' ] || exit 1
      interval: 5m
      start_interval: 30s
      start_period: 5m
    command: ["postgres", "-c" ,"shared_preload_libraries=vectors.so", "-c", 'search_path="$$user", public, vectors', "-c", "logging_collector=on", "-c", "max_wal_size=2GB", "-c", "shared_buffers=512MB", "-c", "wal_compression=on"]
    restart: always

volumes:
  model-cache:

Your .env content

# You can find documentation for all the supported env variables at https://immich.app/docs/install/environment-variables

# The location where your uploaded files are stored
UPLOAD_LOCATION=./library
# The location where your database files are stored
DB_DATA_LOCATION=./postgres

# To set a timezone, uncomment the next line and change Etc/UTC to a TZ identifier from this list: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones#List
TZ=***/***

# The Immich version to use. You can pin this to a specific version like "v1.71.0"
IMMICH_VERSION=release

# Connection secret for postgres. You should change it to a random password
DB_PASSWORD=******

# The values below this line do not need to be changed
###################################################################################
DB_USERNAME=postgres
DB_DATABASE_NAME=immich

Reproduction steps

1. $ podman-compose -f docker-compose.yml.gpu up
2. Access the immich WebUI, navigate to the "Administration Settings" and select the "Jobs" option. Then run either the "Smart Search" or "Face Detection" task.
3. Inspect the container's running logs on the terminal.

Relevant log output

[immich-server]           | [Nest] 12  - 06/15/2024, 5:48:15 PM     LOG [Api:NestApplication] Nest application successfully started
[immich-server]           | [Nest] 12  - 06/15/2024, 5:48:15 PM     LOG [Api:Bootstrap] Immich Server is listening on http://[::1]:3001 [v1.106.4] [PRODUCTION]
[immich-server]           | [Nest] 12  - 06/15/2024, 5:48:28 PM     LOG [Api:EventRepository] Websocket Connect:    _U400IFfpBR3GxgxAAAB
[immich-machine-learning] | [06/15/24 09:52:16] INFO     Setting 'XLM-Roberta-Large-Vit-B-16Plus' execution
[immich-machine-learning] |                              providers to ['CUDAExecutionProvider',
[immich-machine-learning] |                              'CPUExecutionProvider'], in descending order of
[immich-machine-learning] |                              preference
[immich-machine-learning] | [06/15/24 09:52:16] INFO     Loading visual model
[immich-machine-learning] |                              'XLM-Roberta-Large-Vit-B-16Plus' to memory
[immich-machine-learning] | [06/15/24 09:52:22] ERROR    Worker (pid:5) was sent code 139!
[immich-server]           | [Nest] 2  - 06/15/2024, 5:52:22 PM   ERROR [Microservices:JobService] Unable to run job handler (smartSearch/smart-search): Error: Machine learning request to "http://immich-machine-learning:3003" failed with SocketError: other side closed
[immich-server]           | [Nest] 2  - 06/15/2024, 5:52:22 PM   ERROR [Microservices:JobService] Error: Machine learning request to "http://immich-machine-learning:3003" failed with SocketError: other side closed
[immich-server]           |     at /usr/src/app/dist/repositories/machine-learning.repository.js:19:19
[immich-server]           |     at async MachineLearningRepository.predict (/usr/src/app/dist/repositories/machine-learning.repository.js:18:21)
[immich-server]           |     at async MachineLearningRepository.encodeImage (/usr/src/app/dist/repositories/machine-learning.repository.js:42:26)
[immich-server]           |     at async SmartInfoService.handleEncodeClip (/usr/src/app/dist/services/smart-info.service.js:86:27)
[immich-server]           |     at async /usr/src/app/dist/services/job.service.js:148:36
[immich-server]           |     at async Worker.processJob (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:394:28)
[immich-server]           |     at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:581:24)
[immich-server]           | [Nest] 2  - 06/15/2024, 5:52:22 PM   ERROR [Microservices:JobService] Object:
[immich-server]           | {
[immich-server]           |   "id": "f77bc1f0-bdb9-4040-801a-37c7719e1423"
[immich-server]           | }
[immich-server]           |
[immich-server]           | [Nest] 2  - 06/15/2024, 5:52:22 PM   ERROR [Microservices:JobService] Unable to run job handler (smartSearch/smart-search): Error: Machine learning request to "http://immich-machine-learning:3003" failed with SocketError: other side closed
[immich-server]           | [Nest] 2  - 06/15/2024, 5:52:22 PM   ERROR [Microservices:JobService] Error: Machine learning request to "http://immich-machine-learning:3003" failed with SocketError: other side closed
[immich-server]           |     at /usr/src/app/dist/repositories/machine-learning.repository.js:19:19
[immich-server]           |     at async MachineLearningRepository.predict (/usr/src/app/dist/repositories/machine-learning.repository.js:18:21)
[immich-server]           |     at async MachineLearningRepository.encodeImage (/usr/src/app/dist/repositories/machine-learning.repository.js:42:26)
[immich-server]           |     at async SmartInfoService.handleEncodeClip (/usr/src/app/dist/services/smart-info.service.js:86:27)
[immich-server]           |     at async /usr/src/app/dist/services/job.service.js:148:36
[immich-server]           |     at async Worker.processJob (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:394:28)
[immich-server]           |     at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:581:24)
[immich-server]           | [Nest] 2  - 06/15/2024, 5:52:22 PM   ERROR [Microservices:JobService] Object:
[immich-server]           | {
[immich-server]           |   "id": "542539b8-9407-490a-b36e-124c7dfadcea"
[immich-server]           | }
[immich-server]           |
[immich-machine-learning] | [06/15/24 09:52:22] INFO     Booting worker with pid: 38
[immich-machine-learning] | [06/15/24 09:52:26] INFO     Started server process [38]
[immich-machine-learning] | [06/15/24 09:52:26] INFO     Waiting for application startup.
[immich-machine-learning] | [06/15/24 09:52:26] INFO     Created in-memory cache with unloading after 300s
[immich-machine-learning] |                              of inactivity.
[immich-machine-learning] | [06/15/24 09:52:26] INFO     Initialized request thread pool with 12 threads.
[immich-machine-learning] | [06/15/24 09:52:26] INFO     Application startup complete.
[immich-machine-learning] | [06/15/24 09:52:27] INFO     Setting 'XLM-Roberta-Large-Vit-B-16Plus' execution
[immich-machine-learning] |                              providers to ['CUDAExecutionProvider',
[immich-machine-learning] |                              'CPUExecutionProvider'], in descending order of
[immich-machine-learning] |                              preference
[immich-machine-learning] | [06/15/24 09:52:27] INFO     Loading visual model
[immich-machine-learning] |                              'XLM-Roberta-Large-Vit-B-16Plus' to memory
[immich-server]           | [Nest] 2  - 06/15/2024, 5:52:32 PM   ERROR [Microservices:JobService] Unable to run job handler (smartSearch/smart-search): Error: Machine learning request to "http://immich-machine-learning:3003" failed with SocketError: other side closed
[immich-server]           | [Nest] 2  - 06/15/2024, 5:52:32 PM   ERROR [Microservices:JobService] Error: Machine learning request to "http://immich-machine-learning:3003" failed with SocketError: other side closed
[immich-server]           |     at /usr/src/app/dist/repositories/machine-learning.repository.js:19:19
[immich-server]           |     at async MachineLearningRepository.predict (/usr/src/app/dist/repositories/machine-learning.repository.js:18:21)
[immich-server]           |     at async MachineLearningRepository.encodeImage (/usr/src/app/dist/repositories/machine-learning.repository.js:42:26)
[immich-server]           |     at async SmartInfoService.handleEncodeClip (/usr/src/app/dist/services/smart-info.service.js:86:27)
[immich-server]           |     at async /usr/src/app/dist/services/job.service.js:148:36
[immich-server]           |     at async Worker.processJob (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:394:28)
[immich-server]           |     at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:581:24)
[immich-server]           | [Nest] 2  - 06/15/2024, 5:52:32 PM   ERROR [Microservices:JobService] Object:
[immich-server]           | {
[immich-server]           |   "id": "731870bb-f8a0-4b89-8a46-8255f4fd0c43"
[immich-server]           | }
[immich-server]           |
[immich-machine-learning] | [06/15/24 09:52:32] INFO     Booting worker with pid: 69
[immich-machine-learning] | [06/15/24 09:52:36] INFO     Started server process [69]
[immich-machine-learning] | [06/15/24 09:52:36] INFO     Waiting for application startup.
[immich-machine-learning] | [06/15/24 09:52:36] INFO     Created in-memory cache with unloading after 300s
[immich-machine-learning] |                              of inactivity.
[immich-machine-learning] | [06/15/24 09:52:36] INFO     Initialized request thread pool with 12 threads.
[immich-machine-learning] | [06/15/24 09:52:36] INFO     Application startup complete.

Additional information

The prerequisites for using CUDA to accelerate machine learning tasks are satisfied. The problem only occurs with hardware acceleration enabled; when it is not used (the default configuration in docker-compose.yml), no such problem arises.

❯ nvidia-container-cli info
NVRM version:   550.90.07
CUDA version:   12.4
@bo0tzz
Member

bo0tzz commented Jun 15, 2024

@mertalev this is the second report I've seen of ML hwaccel on cuda failing with a segfault. Any chance we have a regression here?

@mertalev
Contributor

We recently updated to ONNX Runtime 1.18.0. It could be a regression there.
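
One way to check which ONNX Runtime build is actually in use is to run something like the following inside the machine-learning container. This is only a sketch: it assumes Python and the `onnxruntime` package are importable there, and `describe_onnxruntime` is a hypothetical helper name.

```python
def describe_onnxruntime():
    """Return the installed ONNX Runtime version and providers, or None if absent."""
    try:
        import onnxruntime as ort
    except ImportError:
        return None
    return {
        "version": ort.__version__,
        "providers": ort.get_available_providers(),
    }


info = describe_onnxruntime()
if info is None:
    print("onnxruntime is not installed in this environment")
else:
    print("ONNX Runtime version:", info["version"])
    print("Available providers:", info["providers"])
    print("CUDA provider present:", "CUDAExecutionProvider" in info["providers"])
```

Comparing the reported version between a working and a failing image could help confirm or rule out a regression in the runtime.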

@mertalev
Contributor

Can you confirm if doing a search (with text in the context field) also causes a segmentation fault? Does face detection cause one? Trying to narrow down whether it's a model-specific behavior.

@Tql-ws1
Author

Tql-ws1 commented Jun 17, 2024

> Can you confirm if doing a search (with text in the context field) also causes a segmentation fault? Does face detection cause one? Trying to narrow down whether it's a model-specific behavior.

Context search and face detection work fine, but only when machine learning tasks run solely on the CPU. I don't believe the problem is model-specific: I've tried XLM-Roberta-Large-Vit-B-32, XLM-Roberta-Large-Vit-B-16Plus, and ViT-B-32__openai. If no machine learning task has run successfully since the Immich Docker instance was created, I can't use the context search feature at all.

Here is the log when I used the XLM-Roberta-Large-Vit-B-16Plus model for context search:

[immich-server]           | [Nest] 12  - 06/17/2024, 4:19:03 PM     LOG [Api:NestApplication] Nest application successfully started
[immich-server]           | [Nest] 12  - 06/17/2024, 4:19:03 PM     LOG [Api:Bootstrap] Immich Server is listening on http://[::1]:3001 [v1.106.4] [PRODUCTION]
[immich-server]           | [Nest] 12  - 06/17/2024, 4:19:04 PM     LOG [Api:EventRepository] Websocket Connect:    WmHRRT1FjXEXem1aAAAB
[immich-machine-learning] | [06/17/24 08:19:04] INFO     Started server process [5]
[immich-machine-learning] | [06/17/24 08:19:04] INFO     Waiting for application startup.
[immich-machine-learning] | [06/17/24 08:19:04] INFO     Created in-memory cache with unloading after 300s
[immich-machine-learning] |                              of inactivity.
[immich-machine-learning] | [06/17/24 08:19:04] INFO     Initialized request thread pool with 12 threads.
[immich-machine-learning] | [06/17/24 08:19:04] INFO     Application startup complete.
[immich-server]           | [Nest] 12  - 06/17/2024, 4:19:58 PM     LOG [Api:EventRepository] Websocket Disconnect: WmHRRT1FjXEXem1aAAAB
[immich-server]           | [Nest] 12  - 06/17/2024, 4:19:59 PM     LOG [Api:EventRepository] Websocket Connect:    OmwWXmo9PSh-SNSBAAAD
[immich-machine-learning] | [06/17/24 08:27:46] INFO     Setting 'XLM-Roberta-Large-Vit-B-16Plus' execution
[immich-machine-learning] |                              providers to ['CUDAExecutionProvider',
[immich-machine-learning] |                              'CPUExecutionProvider'], in descending order of
[immich-machine-learning] |                              preference
[immich-machine-learning] | [06/17/24 08:27:46] INFO     Loading textual model
[immich-machine-learning] |                              'XLM-Roberta-Large-Vit-B-16Plus' to memory
[immich-server]           | [Nest] 12  - 06/17/2024, 4:27:52 PM   ERROR [Api:Error: Machine learning request to "http://immich-machine-learning:3003" failed with SocketError: other side closed
[immich-server]           |     at /usr/src/app/dist/repositories/machine-learning.repository.js:19:19
[immich-server]           |     at async MachineLearningRepository.predict (/usr/src/app/dist/repositories/machine-learning.repository.js:18:21)
[immich-server]           |     at async MachineLearningRepository.encodeText (/usr/src/app/dist/repositories/machine-learning.repository.js:47:26)
[immich-server]           |     at async SearchService.searchSmart (/usr/src/app/dist/services/search.service.js:96:27)~xnsl29o7] Failed to search smart
[immich-server]           | [Nest] 12  - 06/17/2024, 4:27:52 PM   ERROR [Api:Error: Machine learning request to "http://immich-machine-learning:3003" failed with SocketError: other side closed
[immich-server]           |     at /usr/src/app/dist/repositories/machine-learning.repository.js:19:19
[immich-server]           |     at async MachineLearningRepository.predict (/usr/src/app/dist/repositories/machine-learning.repository.js:18:21)
[immich-server]           |     at async MachineLearningRepository.encodeText (/usr/src/app/dist/repositories/machine-learning.repository.js:47:26)
[immich-server]           |     at async SearchService.searchSmart (/usr/src/app/dist/services/search.service.js:96:27)~xnsl29o7] Error: Machine learning request to "http://immich-machine-learning:3003" failed with SocketError: other side closed
[immich-machine-learning] | [06/17/24 08:27:52] ERROR    Worker (pid:5) was sent code 139!
[immich-machine-learning] | [06/17/24 08:27:52] INFO     Booting worker with pid: 38
[immich-machine-learning] | [06/17/24 08:27:57] INFO     Started server process [38]
[immich-machine-learning] | [06/17/24 08:27:57] INFO     Waiting for application startup.
[immich-machine-learning] | [06/17/24 08:27:57] INFO     Created in-memory cache with unloading after 300s
[immich-machine-learning] |                              of inactivity.
[immich-machine-learning] | [06/17/24 08:27:57] INFO     Initialized request thread pool with 12 threads.
[immich-machine-learning] | [06/17/24 08:27:57] INFO     Application startup complete.

ViT-B-32__openai:

[immich-server]           | [Nest] 12  - 06/17/2024, 4:31:47 PM     LOG [Api:SystemConfigService~1z3bk5aw] LogLevel=log (set via system config)
[immich-server]           | [Nest] 12  - 06/17/2024, 4:31:47 PM     LOG [Api:SystemConfigService~1z3bk5aw] LogLevel=log (set via system config)
[immich-server]           | [Nest] 2  - 06/17/2024, 4:31:47 PM     LOG [Microservices:SystemConfigService] LogLevel=log (set via system config)
[immich-server]           | [Nest] 2  - 06/17/2024, 4:31:47 PM     LOG [Microservices:MapRepository] Initializing metadata repository
[immich-server]           | [Nest] 2  - 06/17/2024, 4:31:47 PM     LOG [Microservices:MetadataService] Initialized local reverse geocoder
[immich-server]           | [Nest] 12  - 06/17/2024, 4:31:47 PM     LOG [Api:SearchRepository~1z3bk5aw] Dimension size of model ViT-B-32__openai is 512, but database expects 640.
[immich-server]           | [Nest] 12  - 06/17/2024, 4:31:47 PM     LOG [Api:SearchRepository~1z3bk5aw] Updating database CLIP dimension size to 512.
[immich-server]           | [Nest] 12  - 06/17/2024, 4:31:49 PM     LOG [Api:SearchRepository~1z3bk5aw] Successfully updated database CLIP dimension size from 640 to 512.
[immich-machine-learning] | [06/17/24 08:31:53] INFO     Setting 'ViT-B-32__openai' execution providers to
[immich-machine-learning] |                              ['CUDAExecutionProvider', 'CPUExecutionProvider'],
[immich-machine-learning] |                              in descending order of preference
[immich-machine-learning] | [06/17/24 08:31:53] INFO     Loading textual model 'ViT-B-32__openai' to memory
[immich-server]           | [Nest] 12  - 06/17/2024, 4:32:03 PM   ERROR [Api:Error: Machine learning request to "http://immich-machine-learning:3003" failed with SocketError: other side closed
[immich-server]           |     at /usr/src/app/dist/repositories/machine-learning.repository.js:19:19
[immich-server]           |     at async MachineLearningRepository.predict (/usr/src/app/dist/repositories/machine-learning.repository.js:18:21)
[immich-server]           |     at async MachineLearningRepository.encodeText (/usr/src/app/dist/repositories/machine-learning.repository.js:47:26)
[immich-server]           |     at async SearchService.searchSmart (/usr/src/app/dist/services/search.service.js:96:27)~1o4h80vn] Failed to search smart
[immich-server]           | [Nest] 12  - 06/17/2024, 4:32:03 PM   ERROR [Api:Error: Machine learning request to "http://immich-machine-learning:3003" failed with SocketError: other side closed
[immich-server]           |     at /usr/src/app/dist/repositories/machine-learning.repository.js:19:19
[immich-server]           |     at async MachineLearningRepository.predict (/usr/src/app/dist/repositories/machine-learning.repository.js:18:21)
[immich-server]           |     at async MachineLearningRepository.encodeText (/usr/src/app/dist/repositories/machine-learning.repository.js:47:26)
[immich-server]           |     at async SearchService.searchSmart (/usr/src/app/dist/services/search.service.js:96:27)~1o4h80vn] Error: Machine learning request to "http://immich-machine-learning:3003" failed with SocketError: other side closed
[immich-machine-learning] | [06/17/24 08:32:03] ERROR    Worker (pid:38) was sent code 139!
[immich-machine-learning] | [06/17/24 08:32:03] INFO     Booting worker with pid: 69
[immich-machine-learning] | [06/17/24 08:32:06] INFO     Started server process [69]
[immich-machine-learning] | [06/17/24 08:32:06] INFO     Waiting for application startup.
[immich-machine-learning] | [06/17/24 08:32:06] INFO     Created in-memory cache with unloading after 300s
[immich-machine-learning] |                              of inactivity.
[immich-machine-learning] | [06/17/24 08:32:06] INFO     Initialized request thread pool with 12 threads.
[immich-machine-learning] | [06/17/24 08:32:06] INFO     Application startup complete.

Facial detection, buffalo_l model:

[immich-machine-learning] | [06/17/24 08:32:03] ERROR    Worker (pid:38) was sent code 139!
[immich-machine-learning] | [06/17/24 08:32:03] INFO     Booting worker with pid: 69
[immich-machine-learning] | [06/17/24 08:32:06] INFO     Started server process [69]
[immich-machine-learning] | [06/17/24 08:32:06] INFO     Waiting for application startup.
[immich-machine-learning] | [06/17/24 08:32:06] INFO     Created in-memory cache with unloading after 300s
[immich-machine-learning] |                              of inactivity.
[immich-machine-learning] | [06/17/24 08:32:06] INFO     Initialized request thread pool with 12 threads.
[immich-machine-learning] | [06/17/24 08:32:06] INFO     Application startup complete.
[immich-machine-learning] | [06/17/24 08:33:35] INFO     Setting 'buffalo_l' execution providers to
[immich-machine-learning] |                              ['CUDAExecutionProvider', 'CPUExecutionProvider'],
[immich-machine-learning] |                              in descending order of preference
[immich-machine-learning] | [06/17/24 08:33:35] INFO     Loading detection model 'buffalo_l' to memory
[immich-machine-learning] | [06/17/24 08:33:39] ERROR    Worker (pid:69) was sent code 139!
[immich-server]           | [Nest] 2  - 06/17/2024, 4:33:39 PM   ERROR [Microservices:JobService] Unable to run job handler (faceDetection/face-detection): Error:
[immich-server]           | [Nest] 2  - 06/17/2024, 4:33:39 PM   ERROR [Microservices:JobService] Error: Machine learning request to "http://immich-machine-learni
[immich-server]           |     at /usr/src/app/dist/repositories/machine-learning.repository.js:19:19
[immich-server]           |     at async MachineLearningRepository.predict (/usr/src/app/dist/repositories/machine-learning.repository.js:18:21)
[immich-server]           |     at async MachineLearningRepository.detectFaces (/usr/src/app/dist/repositories/machine-learning.repository.js:33:26)
[immich-server]           |     at async PersonService.handleDetectFaces (/usr/src/app/dist/services/person.service.js:274:52)
[immich-server]           |     at async /usr/src/app/dist/services/job.service.js:148:36
[immich-server]           |     at async Worker.processJob (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:394:28)
[immich-server]           |     at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:581:24)
[immich-server]           | [Nest] 2  - 06/17/2024, 4:33:39 PM   ERROR [Microservices:JobService] Object:
[immich-server]           | {
[immich-server]           |   "id": "4aed4c9a-5fe8-4c23-af7a-bf6daedb5a08"
[immich-server]           | }
[immich-server]           |
[immich-server]           | [Nest] 2  - 06/17/2024, 4:33:39 PM   ERROR [Microservices:JobService] Unable to run job handler (faceDetection/face-detection): Error:
[immich-server]           | [Nest] 2  - 06/17/2024, 4:33:39 PM   ERROR [Microservices:JobService] Error: Machine learning request to "http://immich-machine-learni
[immich-server]           |     at /usr/src/app/dist/repositories/machine-learning.repository.js:19:19
[immich-server]           |     at async MachineLearningRepository.predict (/usr/src/app/dist/repositories/machine-learning.repository.js:18:21)
[immich-server]           |     at async MachineLearningRepository.detectFaces (/usr/src/app/dist/repositories/machine-learning.repository.js:33:26)
[immich-server]           |     at async PersonService.handleDetectFaces (/usr/src/app/dist/services/person.service.js:274:52)
[immich-server]           |     at async /usr/src/app/dist/services/job.service.js:148:36
[immich-server]           |     at async Worker.processJob (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:394:28)
[immich-server]           |     at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:581:24)
[immich-server]           | [Nest] 2  - 06/17/2024, 4:33:39 PM   ERROR [Microservices:JobService] Object:
[immich-server]           | {
[immich-server]           |   "id": "4757da8c-eb74-4bcd-bfca-bceb34dc4f31"
[immich-server]           | }
[immich-server]           |
[immich-machine-learning] | [06/17/24 08:33:39] INFO     Booting worker with pid: 102
[immich-machine-learning] | [06/17/24 08:33:43] INFO     Started server process [102]
[immich-machine-learning] | [06/17/24 08:33:43] INFO     Waiting for application startup.
[immich-machine-learning] | [06/17/24 08:33:43] INFO     Created in-memory cache with unloading after 300s
[immich-machine-learning] |                              of inactivity.
[immich-machine-learning] | [06/17/24 08:33:43] INFO     Initialized request thread pool with 12 threads.
[immich-machine-learning] | [06/17/24 08:33:43] INFO     Application startup complete.
[immich-machine-learning] | [06/17/24 08:33:43] INFO     Setting 'buffalo_l' execution providers to
[immich-machine-learning] |                              ['CUDAExecutionProvider', 'CPUExecutionProvider'],
[immich-machine-learning] |                              in descending order of preference
[immich-machine-learning] | [06/17/24 08:33:43] INFO     Loading detection model 'buffalo_l' to memory
[immich-server]           | [Nest] 2  - 06/17/2024, 4:33:47 PM   ERROR [Microservices:JobService] Unable to run job handler (faceDetection/face-detection): Error:
[immich-machine-learning] | [06/17/24 08:33:47] ERROR    Worker (pid:102) was sent code 139!
[immich-server]           | [Nest] 2  - 06/17/2024, 4:33:47 PM   ERROR [Microservices:JobService] Error: Machine learning request to "http://immich-machine-learni
[immich-server]           |     at /usr/src/app/dist/repositories/machine-learning.repository.js:19:19
[immich-server]           |     at async MachineLearningRepository.predict (/usr/src/app/dist/repositories/machine-learning.repository.js:18:21)
[immich-server]           |     at async MachineLearningRepository.detectFaces (/usr/src/app/dist/repositories/machine-learning.repository.js:33:26)
[immich-server]           |     at async PersonService.handleDetectFaces (/usr/src/app/dist/services/person.service.js:274:52)
[immich-server]           |     at async /usr/src/app/dist/services/job.service.js:148:36
[immich-server]           |     at async Worker.processJob (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:394:28)
[immich-server]           |     at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:581:24)
[immich-server]           | [Nest] 2  - 06/17/2024, 4:33:47 PM   ERROR [Microservices:JobService] Object:
[immich-server]           | {
[immich-server]           |   "id": "1cef40dc-88e1-4bd6-9fcd-35800a59fb04"
[immich-server]           | }
[immich-server]           |
[immich-server]           | [Nest] 2  - 06/17/2024, 4:33:47 PM   ERROR [Microservices:JobService] Unable to run job handler (faceDetection/face-detection): Error:
[immich-server]           | [Nest] 2  - 06/17/2024, 4:33:47 PM   ERROR [Microservices:JobService] Error: Machine learning request to "http://immich-machine-learni
[immich-server]           |     at /usr/src/app/dist/repositories/machine-learning.repository.js:19:19
[immich-server]           |     at async MachineLearningRepository.predict (/usr/src/app/dist/repositories/machine-learning.repository.js:18:21)
[immich-server]           |     at async MachineLearningRepository.detectFaces (/usr/src/app/dist/repositories/machine-learning.repository.js:33:26)
[immich-server]           |     at async PersonService.handleDetectFaces (/usr/src/app/dist/services/person.service.js:274:52)
[immich-server]           |     at async /usr/src/app/dist/services/job.service.js:148:36
[immich-server]           |     at async Worker.processJob (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:394:28)
[immich-server]           |     at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:581:24)
[immich-server]           | [Nest] 2  - 06/17/2024, 4:33:47 PM   ERROR [Microservices:JobService] Object:
[immich-server]           | {
[immich-server]           |   "id": "ea4214f3-c3a0-476a-8dbf-fd5b4ed84b67"
[immich-server]           | }
[immich-server]           |
[immich-server]           | [Nest] 2  - 06/17/2024, 4:33:47 PM   ERROR [Microservices:JobService] Unable to run job handler (faceDetection/face-detection): Error: Machine learning request to "http://immich-machine-learning:3003" failed with SocketError: other side closed
[immich-server]           | [Nest] 2  - 06/17/2024, 4:33:47 PM   ERROR [Microservices:JobService] Error: Machine learning request to "http://immich-machine-learning:3003" failed with SocketError: other side closed
[immich-server]           |     at /usr/src/app/dist/repositories/machine-learning.repository.js:19:19
[immich-server]           |     at async MachineLearningRepository.predict (/usr/src/app/dist/repositories/machine-learning.repository.js:18:21)
[immich-server]           |     at async MachineLearningRepository.detectFaces (/usr/src/app/dist/repositories/machine-learning.repository.js:33:26)
[immich-server]           |     at async PersonService.handleDetectFaces (/usr/src/app/dist/services/person.service.js:274:52)
[immich-server]           |     at async /usr/src/app/dist/services/job.service.js:148:36
[immich-server]           |     at async Worker.processJob (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:394:28)
[immich-server]           |     at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:581:24)
[immich-server]           | [Nest] 2  - 06/17/2024, 4:33:47 PM   ERROR [Microservices:JobService] Object:
[immich-server]           | {
[immich-server]           |   "id": "61c97d5a-f556-439e-a27e-64b5749d4d2f"
[immich-server]           | }
[immich-server]           |
[immich-machine-learning] | [06/17/24 08:33:47] INFO     Booting worker with pid: 137
[immich-machine-learning] | [06/17/24 08:33:51] INFO     Started server process [137]
[immich-machine-learning] | [06/17/24 08:33:51] INFO     Waiting for application startup.
[immich-machine-learning] | [06/17/24 08:33:51] INFO     Created in-memory cache with unloading after 300s
[immich-machine-learning] |                              of inactivity.
[immich-machine-learning] | [06/17/24 08:33:51] INFO     Initialized request thread pool with 12 threads.
[immich-machine-learning] | [06/17/24 08:33:51] INFO     Application startup complete.

antelopev2:

[immich-machine-learning] | [06/17/24 08:33:51] INFO     Created in-memory cache with unloading after 300s
[immich-machine-learning] |                              of inactivity.
[immich-machine-learning] | [06/17/24 08:33:51] INFO     Initialized request thread pool with 12 threads.
[immich-machine-learning] | [06/17/24 08:33:51] INFO     Application startup complete.
[immich-server]           | [Nest] 12  - 06/17/2024, 4:36:05 PM     LOG [Api:SystemConfigService~sjhnnx0h] LogLevel=log (set via system config)
[immich-server]           | [Nest] 12  - 06/17/2024, 4:36:05 PM     LOG [Api:SystemConfigService~sjhnnx0h] LogLevel=log (set via system config)
[immich-server]           | [Nest] 2  - 06/17/2024, 4:36:05 PM     LOG [Microservices:SystemConfigService] LogLevel=log (set via system config)
[immich-server]           | [Nest] 2  - 06/17/2024, 4:36:05 PM     LOG [Microservices:MapRepository] Initializing metadata repository
[immich-server]           | [Nest] 2  - 06/17/2024, 4:36:05 PM     LOG [Microservices:MetadataService] Initialized local reverse geocoder
[immich-machine-learning] | [06/17/24 08:36:47] INFO     Setting 'antelopev2' execution providers to
[immich-machine-learning] |                              ['CUDAExecutionProvider', 'CPUExecutionProvider'],
[immich-machine-learning] |                              in descending order of preference
[immich-machine-learning] | [06/17/24 08:36:47] INFO     Loading detection model 'antelopev2' to memory
[immich-server]           | [Nest] 2  - 06/17/2024, 4:36:51 PM   ERROR [Microservices:JobService] Unable to run job handler (faceDetection/face-detection): Error:
[immich-server]           | [Nest] 2  - 06/17/2024, 4:36:51 PM   ERROR [Microservices:JobService] Error: Machine learning request to "http://immich-machine-learni
[immich-server]           |     at /usr/src/app/dist/repositories/machine-learning.repository.js:19:19
[immich-server]           |     at async MachineLearningRepository.predict (/usr/src/app/dist/repositories/machine-learning.repository.js:18:21)
[immich-server]           |     at async MachineLearningRepository.detectFaces (/usr/src/app/dist/repositories/machine-learning.repository.js:33:26)
[immich-server]           |     at async PersonService.handleDetectFaces (/usr/src/app/dist/services/person.service.js:274:52)
[immich-server]           |     at async /usr/src/app/dist/services/job.service.js:148:36
[immich-server]           |     at async Worker.processJob (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:394:28)
[immich-server]           |     at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:581:24)
[immich-server]           | [Nest] 2  - 06/17/2024, 4:36:51 PM   ERROR [Microservices:JobService] Object:
[immich-server]           | {
[immich-server]           |   "id": "4aed4c9a-5fe8-4c23-af7a-bf6daedb5a08"
[immich-server]           | }
[immich-server]           |
[immich-machine-learning] | [06/17/24 08:36:51] ERROR    Worker (pid:137) was sent code 139!
[immich-server]           | [Nest] 2  - 06/17/2024, 4:36:51 PM   ERROR [Microservices:JobService] Unable to run job handler (faceDetection/face-detection): Error:
[immich-server]           | [Nest] 2  - 06/17/2024, 4:36:51 PM   ERROR [Microservices:JobService] Error: Machine learning request to "http://immich-machine-learni
[immich-server]           |     at /usr/src/app/dist/repositories/machine-learning.repository.js:19:19
[immich-server]           |     at async MachineLearningRepository.predict (/usr/src/app/dist/repositories/machine-learning.repository.js:18:21)
[immich-server]           |     at async MachineLearningRepository.detectFaces (/usr/src/app/dist/repositories/machine-learning.repository.js:33:26)
[immich-server]           |     at async PersonService.handleDetectFaces (/usr/src/app/dist/services/person.service.js:274:52)
[immich-server]           |     at async /usr/src/app/dist/services/job.service.js:148:36
[immich-server]           |     at async Worker.processJob (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:394:28)
[immich-server]           |     at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:581:24)
[immich-server]           | [Nest] 2  - 06/17/2024, 4:36:51 PM   ERROR [Microservices:JobService] Object:
[immich-server]           | {
[immich-server]           |   "id": "4757da8c-eb74-4bcd-bfca-bceb34dc4f31"
[immich-server]           | }
[immich-server]           |
[immich-machine-learning] | [06/17/24 08:36:51] INFO     Booting worker with pid: 170
[immich-machine-learning] | [06/17/24 08:36:55] INFO     Started server process [170]
[immich-machine-learning] | [06/17/24 08:36:55] INFO     Waiting for application startup.
[immich-machine-learning] | [06/17/24 08:36:55] INFO     Created in-memory cache with unloading after 300s
[immich-machine-learning] |                              of inactivity.
[immich-machine-learning] | [06/17/24 08:36:55] INFO     Initialized request thread pool with 12 threads.
[immich-machine-learning] | [06/17/24 08:36:55] INFO     Application startup complete.
[immich-machine-learning] | [06/17/24 08:36:55] INFO     Setting 'antelopev2' execution providers to
[immich-machine-learning] |                              ['CUDAExecutionProvider', 'CPUExecutionProvider'],
[immich-machine-learning] |                              in descending order of preference
[immich-machine-learning] | [06/17/24 08:36:55] INFO     Loading detection model 'antelopev2' to memory
[immich-server]           | [Nest] 2  - 06/17/2024, 4:36:59 PM   ERROR [Microservices:JobService] Unable to run job handler (faceDetection/face-detection): Error: Machine learning request to "http://immich-machine-learning:3003" failed with SocketError: other side closed
[immich-server]           | [Nest] 2  - 06/17/2024, 4:36:59 PM   ERROR [Microservices:JobService] Error: Machine learning request to "http://immich-machine-learning:3003" failed with SocketError: other side closed
[immich-server]           |     at /usr/src/app/dist/repositories/machine-learning.repository.js:19:19
[immich-server]           |     at async MachineLearningRepository.predict (/usr/src/app/dist/repositories/machine-learning.repository.js:18:21)
[immich-server]           |     at async MachineLearningRepository.detectFaces (/usr/src/app/dist/repositories/machine-learning.repository.js:33:26)
[immich-server]           |     at async PersonService.handleDetectFaces (/usr/src/app/dist/services/person.service.js:274:52)
[immich-server]           |     at async /usr/src/app/dist/services/job.service.js:148:36
[immich-server]           |     at async Worker.processJob (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:394:28)
[immich-server]           |     at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:581:24)
[immich-server]           | [Nest] 2  - 06/17/2024, 4:36:59 PM   ERROR [Microservices:JobService] Object:
[immich-server]           | {
[immich-server]           |   "id": "1cef40dc-88e1-4bd6-9fcd-35800a59fb04"
[immich-server]           | }
[immich-server]           |
[immich-machine-learning] | [06/17/24 08:36:59] ERROR    Worker (pid:170) was sent code 139!
[immich-machine-learning] | [06/17/24 08:36:59] INFO     Booting worker with pid: 201
[immich-machine-learning] | [06/17/24 08:37:03] INFO     Started server process [201]
[immich-machine-learning] | [06/17/24 08:37:03] INFO     Waiting for application startup.
[immich-machine-learning] | [06/17/24 08:37:03] INFO     Created in-memory cache with unloading after 300s
[immich-machine-learning] |                              of inactivity.
[immich-machine-learning] | [06/17/24 08:37:03] INFO     Initialized request thread pool with 12 threads.
[immich-machine-learning] | [06/17/24 08:37:03] INFO     Application startup complete.
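
For reference, the `code 139` in the worker errors above can be decoded with a few lines of Python. This is a minimal sketch, not part of Immich: `decode_status` is a hypothetical helper, and it assumes the number is either a shell-style exit code (128 + signal number) or a raw wait status with the core-dump bit set. Both readings leave the signal number in the low 7 bits.

```python
import signal


def decode_status(code: int) -> str:
    """Return the name of the signal encoded in the low 7 bits of a status value.

    Assumption: `code` is a shell-style exit code (128 + signum) or a raw
    wait status whose high bit is the core-dump flag.
    """
    signum = code & 0x7F  # 139 & 0x7F == 11
    return signal.Signals(signum).name


print(decode_status(139))  # prints "SIGSEGV"
```

Under that assumption the worker is dying from a segmentation fault shortly after `Loading detection model 'antelopev2' to memory`, which is consistent with the server seeing `SocketError: other side closed`. It does not by itself say what triggers the crash.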


Tql-ws1 commented Jun 24, 2024

I suspect the problem lies with podman-compose. It was supposedly fixed in commit 79865c2, but the latest podman-compose release does not include that fix yet (compare the merge time of the PR with the release time of podman-compose v1.1.0).
If that's indeed the case, I'm sorry for taking up your valuable time.
