
Dragonfly preheat not speeding up subsequent image load; errors reported #3674

Open
amholler opened this issue Nov 27, 2024 · 8 comments

amholler commented Nov 27, 2024

Bug report:

On a GKE regional cluster comprising 3 e2-standard-16 nodes, I successfully installed Dragonfly via:
helm install --create-namespace --namespace dragonfly-system dragonfly dragonfly/dragonfly --version 1.2.24 -f values.upd.yaml

where values.upd.yaml contained:
manager:
  metrics:
    enable: true
  config:
    verbose: true
    pprofPort: 18066
  resources:
    requests:
      cpu: "1"
      memory: "2Gi"
    limits:
      cpu: "1"
      memory: "2Gi"

scheduler:
  metrics:
    enable: true
  config:
    verbose: true
    pprofPort: 18066
  resources:
    requests:
      cpu: "1"
      memory: "2Gi"
    limits:
      cpu: "1"
      memory: "2Gi"

seedClient:
  metrics:
    enable: true
  config:
    verbose: true
  resources:
    requests:
      cpu: "2"
      memory: "12Gi"
    limits:
      cpu: "2"
      memory: "12Gi"

client:
  metrics:
    enable: true
  config:
    verbose: true
  dfinit:
    enable: true
    config:
      download:
        rateLimit: 10GiB
        concurrentPieceCount: 16
      upload:
        rateLimit: 10GiB
  resources:
    requests:
      cpu: "2"
      memory: "12Gi"
    limits:
      cpu: "2"
      memory: "12Gi"
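As a rough sanity check on these values, the sketch below totals the CPU and memory requested across the cluster. It is a hedged estimate, not Dragonfly tooling. The manager and scheduler replica counts (3 each) are assumptions about the chart defaults. The client is assumed to run as a DaemonSet with one pod per node. The seed-client count of 3 matches the `dragonfly-seed-client-0..2` hosts in the job output. `Component`, `parse_memory` and `total_requests` are illustrative names.

```python
from dataclasses import dataclass

# Multipliers for Kubernetes binary memory suffixes such as "2Gi" and "12Gi".
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_memory(quantity: str) -> int:
    """Convert a binary memory quantity (e.g. "12Gi") to bytes."""
    for suffix, factor in _BINARY.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)  # no suffix: already a byte count


@dataclass
class Component:
    name: str
    replicas: int  # pods of this component across the cluster
    cpu: float     # cores requested per pod (requests == limits in values.upd.yaml)
    memory: str    # memory requested per pod


def total_requests(components):
    """Sum CPU cores and memory bytes requested by all pods."""
    cpu = sum(c.replicas * c.cpu for c in components)
    mem = sum(c.replicas * parse_memory(c.memory) for c in components)
    return cpu, mem


NODES, NODE_VCPUS = 3, 16  # 3 x e2-standard-16 (raw vCPUs; allocatable is lower)

components = [
    Component("manager", 3, 1, "2Gi"),       # replica count assumed
    Component("scheduler", 3, 1, "2Gi"),     # replica count assumed
    Component("seed-client", 3, 2, "12Gi"),  # dragonfly-seed-client-0..2
    Component("client", NODES, 2, "12Gi"),   # assumed DaemonSet: one pod per node
]

cpu, mem = total_requests(components)
print(f"requested {cpu:g} cores ({cpu / (NODES * NODE_VCPUS):.1%} of raw vCPUs), "
      f"{mem / 2**30:g} GiB")
```

Adjust the replica counts to whatever `kubectl get pods -n dragonfly-system` actually reports before drawing conclusions.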

I created a token and then ran the following all_peers preheat job, which completed successfully:

curl --location --request POST 'http://127.0.0.1:8080/oapi/v1/jobs' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer Njg4NDRhZGYtYmY0ZS00NmIwLTk5MWQtZjIwM2U1MjBjNGM1' \
--data-raw '{
    "type": "preheat",
    "args": {
        "type": "image",
        "url": "https://index.docker.io/v2/rayproject/ray-ml/manifests/2.33.0.914af0-py311",
        "scope": "all_peers"
    }
}'
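For scripting the same steps, here is a sketch using only Python's standard library. It builds the same preheat request as the curl call and tallies per-host results from a job record like the JSON reply that follows. `DRAGONFLY_TOKEN`, `preheat_request`, `job_status_request` and `summarize` are hypothetical names. The `GET /oapi/v1/jobs/<id>` polling endpoint is an assumption to verify against the Dragonfly manager API docs. The field names used in `summarize` are taken from the reply shown in this issue.

```python
import json
import os
import urllib.request
from collections import Counter

MANAGER = "http://127.0.0.1:8080"  # manager reached locally, as in the curl call
# Hypothetical environment variable holding the personal access token.
TOKEN = os.environ.get("DRAGONFLY_TOKEN", "<personal-access-token>")
HEADERS = {"Content-Type": "application/json", "Authorization": f"Bearer {TOKEN}"}


def preheat_request(manifest_url: str, scope: str = "all_peers") -> urllib.request.Request:
    """Build POST /oapi/v1/jobs with the same body as the curl command."""
    body = {"type": "preheat", "args": {"type": "image", "url": manifest_url, "scope": scope}}
    return urllib.request.Request(
        f"{MANAGER}/oapi/v1/jobs",
        data=json.dumps(body).encode(),
        headers=HEADERS,
        method="POST",
    )


def job_status_request(job_id: int) -> urllib.request.Request:
    """Build GET /oapi/v1/jobs/<id> (assumed polling endpoint)."""
    return urllib.request.Request(f"{MANAGER}/oapi/v1/jobs/{job_id}", headers=HEADERS)


def summarize(job: dict):
    """Return (job state, blobs preheated per host, failed tasks) from a job record."""
    per_host, failures = Counter(), []
    for state in job["result"]["job_states"]:
        for res in state["results"]:
            for task in res["success_tasks"] or []:
                per_host[task["hostname"]] += 1
            failures.extend(res["failure_tasks"] or [])
    return job["state"], per_host, failures
```

For example, `summarize(json.load(urllib.request.urlopen(job_status_request(1))))` would tally the job with `"id":1`. With `scope` set to `all_peers`, every host (seed clients and node clients alike) would be expected to show the same count; a host with fewer entries would point at a layer it never received.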

{"id":1,"created_at":"2024-11-27T18:33:17Z","updated_at":"2024-11-27T18:36:05Z","is_del":0,"task_id":"group_e2a72159-58dc-4a90-acd1-8b7aa1ca5e71","bio":"","type":"preheat","state":"SUCCESS","args":{"concurrent_count":50,"filtered_query_params":"X-Amz-Algorithm\u0026X-Amz-Credential\u0026X-Amz-Date\u0026X-Amz-Expires\u0026X-Amz-SignedHeaders\u0026X-Amz-Signature\u0026X-Amz-Security-Token\u0026X-Amz-User-Agent\u0026X-Goog-Algorithm\u0026X-Goog-Credential\u0026X-Goog-Date\u0026X-Goog-Expires\u0026X-Goog-SignedHeaders\u0026X-Goog-Signature\u0026OSSAccessKeyId\u0026Expires\u0026Signature\u0026SecurityToken\u0026AccessKeyId\u0026Signature\u0026Expires\u0026X-Obs-Date\u0026X-Obs-Security-Token\u0026q-sign-algorithm\u0026q-ak\u0026q-sign-time\u0026q-key-time\u0026q-header-list\u0026q-url-param-list\u0026q-signature\u0026x-cos-security-token\u0026ns","headers":null,"password":"","platform":"","scope":"all_peers","tag":"","timeout":1800000000000,"type":"image","url":"https://index.docker.io/v2/rayproject/ray-ml/manifests/2.33.0.914af0-py311","username":""},"result":{"created_at":"2024-11-27T18:33:17.843257441Z","group_uuid":"group_e2a72159-58dc-4a90-acd1-8b7aa1ca5e71","job_states":[{"created_at":"2024-11-27T18:33:17.843257441Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:61feeac8814edc3f406d9a29a2ee0c3ed71ea2f3c60d3bace48950c1ff588742"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:61feeac8814edc3f406d9a29a2ee0c3ed71ea2f3c60d3bace48950c1ff588742"},{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:61feeac8814edc3f406d9a29a2ee0c3ed71ea2f3c60d3bace48950c1ff588742"},{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"h
ttps://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:61feeac8814edc3f406d9a29a2ee0c3ed71ea2f3c60d3bace48950c1ff588742"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:61feeac8814edc3f406d9a29a2ee0c3ed71ea2f3c60d3bace48950c1ff588742"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:61feeac8814edc3f406d9a29a2ee0c3ed71ea2f3c60d3bace48950c1ff588742"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_483d138e-68a2-43f4-b9e5-e7676758723a","ttl":0},{"created_at":"2024-11-27T18:33:17.844129378Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:43f89b94cd7df92a2f7e565b8fb1b7f502eff2cd225508cbd7ea2d36a9a3a601"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:43f89b94cd7df92a2f7e565b8fb1b7f502eff2cd225508cbd7ea2d36a9a3a601"},{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:43f89b94cd7df92a2f7e565b8fb1b7f502eff2cd225508cbd7ea2d36a9a3a601"},{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:43f89b94cd7df92a2f7e565b8fb1b7f502eff2cd225508cbd7ea2d36a9a3a601"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:43f89b94cd7df92a2f7e565b8fb1b7f502eff2cd225508cbd7ea2d36a9a3a601"},{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:43f89b94cd7df92a2f7e565b8fb1b7f502eff2cd225508cbd7ea2d36a9a3a601"}]}],"state":"SUCCESS","task_name":"pre
heat","task_uuid":"task_1ba2f78b-e6d3-4db5-96cb-e2693a3fd2b7","ttl":0},{"created_at":"2024-11-27T18:33:17.844666533Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:5e3b7ee7738140e8f4608c3945b6e1ed4f9fb75db53a04e19ba0a6661e7cc4fe"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:5e3b7ee7738140e8f4608c3945b6e1ed4f9fb75db53a04e19ba0a6661e7cc4fe"},{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:5e3b7ee7738140e8f4608c3945b6e1ed4f9fb75db53a04e19ba0a6661e7cc4fe"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:5e3b7ee7738140e8f4608c3945b6e1ed4f9fb75db53a04e19ba0a6661e7cc4fe"},{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:5e3b7ee7738140e8f4608c3945b6e1ed4f9fb75db53a04e19ba0a6661e7cc4fe"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:5e3b7ee7738140e8f4608c3945b6e1ed4f9fb75db53a04e19ba0a6661e7cc4fe"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_0f9bb343-632f-4f07-9ab4-af22eafab36b","ttl":0},{"created_at":"2024-11-27T18:33:17.845247255Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:5bd037f007fdda13ae5a5f43a199d6677db1f9059c2980c84726e3a43fab169a"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256
:5bd037f007fdda13ae5a5f43a199d6677db1f9059c2980c84726e3a43fab169a"},{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:5bd037f007fdda13ae5a5f43a199d6677db1f9059c2980c84726e3a43fab169a"},{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:5bd037f007fdda13ae5a5f43a199d6677db1f9059c2980c84726e3a43fab169a"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:5bd037f007fdda13ae5a5f43a199d6677db1f9059c2980c84726e3a43fab169a"},{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:5bd037f007fdda13ae5a5f43a199d6677db1f9059c2980c84726e3a43fab169a"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_ff963648-0402-4a18-9ce6-41cc26b7d610","ttl":0},{"created_at":"2024-11-27T18:33:17.845737521Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:4cda774ad2ecef28c9a1cd97594f7199071c83769f91c5d109eb1cb6770ecdff"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:4cda774ad2ecef28c9a1cd97594f7199071c83769f91c5d109eb1cb6770ecdff"},{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:4cda774ad2ecef28c9a1cd97594f7199071c83769f91c5d109eb1cb6770ecdff"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:4cda774ad2ecef28c9a1cd97594f7199071c83769f91c5d109eb1cb6770ecdff"},{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha
256:4cda774ad2ecef28c9a1cd97594f7199071c83769f91c5d109eb1cb6770ecdff"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:4cda774ad2ecef28c9a1cd97594f7199071c83769f91c5d109eb1cb6770ecdff"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_a876cdad-ece2-4768-af3a-221c1006c71b","ttl":0},{"created_at":"2024-11-27T18:33:17.846171112Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:775f22adee620daec0db645bad7027db4c1ecf22520412e1b2466fc73d54d19b"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:775f22adee620daec0db645bad7027db4c1ecf22520412e1b2466fc73d54d19b"},{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:775f22adee620daec0db645bad7027db4c1ecf22520412e1b2466fc73d54d19b"},{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:775f22adee620daec0db645bad7027db4c1ecf22520412e1b2466fc73d54d19b"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:775f22adee620daec0db645bad7027db4c1ecf22520412e1b2466fc73d54d19b"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:775f22adee620daec0db645bad7027db4c1ecf22520412e1b2466fc73d54d19b"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_8562a75f-8727-47a0-aa79-6c39fd783612","ttl":0},{"created_at":"2024-11-27T18:33:17.846592178Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"dragonfly-seed-client-2
","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:263fc748118f7937f811e3e9c9355318db07dd2dd1dccc370dadaa7d0b5ed692"},{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:263fc748118f7937f811e3e9c9355318db07dd2dd1dccc370dadaa7d0b5ed692"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:263fc748118f7937f811e3e9c9355318db07dd2dd1dccc370dadaa7d0b5ed692"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:263fc748118f7937f811e3e9c9355318db07dd2dd1dccc370dadaa7d0b5ed692"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:263fc748118f7937f811e3e9c9355318db07dd2dd1dccc370dadaa7d0b5ed692"},{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:263fc748118f7937f811e3e9c9355318db07dd2dd1dccc370dadaa7d0b5ed692"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_c29ca16d-bfbf-4b5b-b0c3-2634dab32cc2","ttl":0},{"created_at":"2024-11-27T18:33:17.84709593Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:16c36d0187d03bd0de84d870ded86c45fabd78f4bfdb2ed90177e5fc4dd33d11"},{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:16c36d0187d03bd0de84d870ded86c45fabd78f4bfdb2ed90177e5fc4dd33d11"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:16c36d0187d03bd0de84d870ded86c45fabd78f4bfdb2ed90177e5fc4dd33d11"},{"hostname":"gke-anne-regional-def
ault-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:16c36d0187d03bd0de84d870ded86c45fabd78f4bfdb2ed90177e5fc4dd33d11"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:16c36d0187d03bd0de84d870ded86c45fabd78f4bfdb2ed90177e5fc4dd33d11"},{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:16c36d0187d03bd0de84d870ded86c45fabd78f4bfdb2ed90177e5fc4dd33d11"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_c2660534-1bd9-4d68-9272-add419776cf4","ttl":0},{"created_at":"2024-11-27T18:33:17.848092386Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:e7a56570655c990ecc804c77873efc83f9a6c31064e3e8a5dc02430213f2d74c"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:e7a56570655c990ecc804c77873efc83f9a6c31064e3e8a5dc02430213f2d74c"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:e7a56570655c990ecc804c77873efc83f9a6c31064e3e8a5dc02430213f2d74c"},{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:e7a56570655c990ecc804c77873efc83f9a6c31064e3e8a5dc02430213f2d74c"},{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:e7a56570655c990ecc804c77873efc83f9a6c31064e3e8a5dc02430213f2d74c"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:e7a56570655c990ecc804c778
73efc83f9a6c31064e3e8a5dc02430213f2d74c"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_e3b209e8-7ac6-498a-9395-d6a83106e2b2","ttl":0},{"created_at":"2024-11-27T18:33:17.848490128Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:507fc9045cbad45c1c4ca554a6453fe0a1c9ae74667db0612fec7475256d5c23"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:507fc9045cbad45c1c4ca554a6453fe0a1c9ae74667db0612fec7475256d5c23"},{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:507fc9045cbad45c1c4ca554a6453fe0a1c9ae74667db0612fec7475256d5c23"},{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:507fc9045cbad45c1c4ca554a6453fe0a1c9ae74667db0612fec7475256d5c23"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:507fc9045cbad45c1c4ca554a6453fe0a1c9ae74667db0612fec7475256d5c23"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:507fc9045cbad45c1c4ca554a6453fe0a1c9ae74667db0612fec7475256d5c23"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_561899d6-9538-4e6c-968a-de630004fd57","ttl":0},{"created_at":"2024-11-27T18:33:17.8489235Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:23b7d8e07c16707ff4ec3ca558a8099c454953c840156c318a60a6b4273846a0"},{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10
.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:23b7d8e07c16707ff4ec3ca558a8099c454953c840156c318a60a6b4273846a0"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:23b7d8e07c16707ff4ec3ca558a8099c454953c840156c318a60a6b4273846a0"},{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:23b7d8e07c16707ff4ec3ca558a8099c454953c840156c318a60a6b4273846a0"},{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:23b7d8e07c16707ff4ec3ca558a8099c454953c840156c318a60a6b4273846a0"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:23b7d8e07c16707ff4ec3ca558a8099c454953c840156c318a60a6b4273846a0"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_93c5d79e-ebfd-4548-88bb-91514e92da50","ttl":0},{"created_at":"2024-11-27T18:33:17.849376889Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:922ac8fcb88926d95550e82f83c14a4f3f3eaab635e7acf43ee0c59dea0c14d7"},{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:922ac8fcb88926d95550e82f83c14a4f3f3eaab635e7acf43ee0c59dea0c14d7"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:922ac8fcb88926d95550e82f83c14a4f3f3eaab635e7acf43ee0c59dea0c14d7"},{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:922ac8fcb88926d95550e82f83c14a4f3f3eaab635e7acf43ee0c59dea0c14d7"},{"hostname":"dragonfly-seed-client-1","i
p":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:922ac8fcb88926d95550e82f83c14a4f3f3eaab635e7acf43ee0c59dea0c14d7"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:922ac8fcb88926d95550e82f83c14a4f3f3eaab635e7acf43ee0c59dea0c14d7"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_5faf2d1d-5fb8-4b64-92eb-c18606741d05","ttl":0},{"created_at":"2024-11-27T18:33:17.849681518Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:68075f2beca1cfd3f243ec110000716dff39d895f4d5e0d3faba7ace430f9633"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:68075f2beca1cfd3f243ec110000716dff39d895f4d5e0d3faba7ace430f9633"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:68075f2beca1cfd3f243ec110000716dff39d895f4d5e0d3faba7ace430f9633"},{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:68075f2beca1cfd3f243ec110000716dff39d895f4d5e0d3faba7ace430f9633"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:68075f2beca1cfd3f243ec110000716dff39d895f4d5e0d3faba7ace430f9633"},{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:68075f2beca1cfd3f243ec110000716dff39d895f4d5e0d3faba7ace430f9633"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_c6bd7eb3-b51a-4fbc-8766-3875600184a6","ttl":0},{"created_at":"2024-11-27T18:33:17.850123858Z","error":"","results":[{"failure_tasks":null,"
scheduler_cluster_id":1,"success_tasks":[{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:8509afffa2f4447cf6eb4060aba992e855f61c1fdbaef360bef367a3deea5afd"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:8509afffa2f4447cf6eb4060aba992e855f61c1fdbaef360bef367a3deea5afd"},{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:8509afffa2f4447cf6eb4060aba992e855f61c1fdbaef360bef367a3deea5afd"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:8509afffa2f4447cf6eb4060aba992e855f61c1fdbaef360bef367a3deea5afd"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:8509afffa2f4447cf6eb4060aba992e855f61c1fdbaef360bef367a3deea5afd"},{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:8509afffa2f4447cf6eb4060aba992e855f61c1fdbaef360bef367a3deea5afd"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_fb3c48ba-1a7b-4d6a-8f14-e837fde2f29f","ttl":0},{"created_at":"2024-11-27T18:33:17.850482208Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:b6a3e93ea08fa0a42ceaeda8786839ed06d17429e072e5cba725b3d7e0116b19"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:b6a3e93ea08fa0a42ceaeda8786839ed06d17429e072e5cba725b3d7e0116b19"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/raypro
ject/ray-ml/blobs/sha256:b6a3e93ea08fa0a42ceaeda8786839ed06d17429e072e5cba725b3d7e0116b19"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:b6a3e93ea08fa0a42ceaeda8786839ed06d17429e072e5cba725b3d7e0116b19"},{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:b6a3e93ea08fa0a42ceaeda8786839ed06d17429e072e5cba725b3d7e0116b19"},{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:b6a3e93ea08fa0a42ceaeda8786839ed06d17429e072e5cba725b3d7e0116b19"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_2d2ec513-f5fb-40f1-a775-f60faf17fab4","ttl":0},{"created_at":"2024-11-27T18:33:17.850939108Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"},{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"},{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https
://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_e515c281-4d21-40d1-b731-2307a1527cc8","ttl":0},{"created_at":"2024-11-27T18:33:17.851427527Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:0fdc0c77854ade5095e513b49a62f25fcf894ad08e8c93d6ab2418c02d293b2c"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:0fdc0c77854ade5095e513b49a62f25fcf894ad08e8c93d6ab2418c02d293b2c"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:0fdc0c77854ade5095e513b49a62f25fcf894ad08e8c93d6ab2418c02d293b2c"},{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:0fdc0c77854ade5095e513b49a62f25fcf894ad08e8c93d6ab2418c02d293b2c"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:0fdc0c77854ade5095e513b49a62f25fcf894ad08e8c93d6ab2418c02d293b2c"},{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:0fdc0c77854ade5095e513b49a62f25fcf894ad08e8c93d6ab2418c02d293b2c"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_db3a1cea-5ba6-4139-ac23-abda8d4b766c","ttl":0},{"created_at":"2024-11-27T18:33:17.851827952Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:a333090de9f0b4a401a269aebeef18871555056477cf4b607f89a5
6c4e097a8a"},{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:a333090de9f0b4a401a269aebeef18871555056477cf4b607f89a56c4e097a8a"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:a333090de9f0b4a401a269aebeef18871555056477cf4b607f89a56c4e097a8a"},{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:a333090de9f0b4a401a269aebeef18871555056477cf4b607f89a56c4e097a8a"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:a333090de9f0b4a401a269aebeef18871555056477cf4b607f89a56c4e097a8a"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:a333090de9f0b4a401a269aebeef18871555056477cf4b607f89a56c4e097a8a"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_74851ce1-517d-435f-b582-f42d108fdaf2","ttl":0},{"created_at":"2024-11-27T18:33:17.852345932Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:cfd03f593d3c72597de12049ad9da078ecb03b1cf82f5c547b37db13dce0193c"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:cfd03f593d3c72597de12049ad9da078ecb03b1cf82f5c547b37db13dce0193c"},{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:cfd03f593d3c72597de12049ad9da078ecb03b1cf82f5c547b37db13dce0193c"},{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:cfd03f593d3c72597de12049ad
9da078ecb03b1cf82f5c547b37db13dce0193c"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:cfd03f593d3c72597de12049ad9da078ecb03b1cf82f5c547b37db13dce0193c"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:cfd03f593d3c72597de12049ad9da078ecb03b1cf82f5c547b37db13dce0193c"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_c067d25e-57ad-4e4b-a676-a478500edc4e","ttl":0},{"created_at":"2024-11-27T18:33:17.852818036Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:6c0cfc9736597fd3fba4b60241efc7fc1410d3dd66625a30f5caaa0194d913da"},{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:6c0cfc9736597fd3fba4b60241efc7fc1410d3dd66625a30f5caaa0194d913da"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:6c0cfc9736597fd3fba4b60241efc7fc1410d3dd66625a30f5caaa0194d913da"},{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:6c0cfc9736597fd3fba4b60241efc7fc1410d3dd66625a30f5caaa0194d913da"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:6c0cfc9736597fd3fba4b60241efc7fc1410d3dd66625a30f5caaa0194d913da"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:6c0cfc9736597fd3fba4b60241efc7fc1410d3dd66625a30f5caaa0194d913da"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_1c0a5956-3dd7-4147-a0d6-fdbaa1703162","ttl":0},{"created_at
":"2024-11-27T18:33:17.853333972Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:a07abd82fba1ad80e123eafea1aff1dd4d7404d10eeebef491ae46d100f24508"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:a07abd82fba1ad80e123eafea1aff1dd4d7404d10eeebef491ae46d100f24508"},{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:a07abd82fba1ad80e123eafea1aff1dd4d7404d10eeebef491ae46d100f24508"},{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:a07abd82fba1ad80e123eafea1aff1dd4d7404d10eeebef491ae46d100f24508"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:a07abd82fba1ad80e123eafea1aff1dd4d7404d10eeebef491ae46d100f24508"},{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:a07abd82fba1ad80e123eafea1aff1dd4d7404d10eeebef491ae46d100f24508"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_4d3d6c33-8bed-426b-830a-aeb5f2384a04","ttl":0},{"created_at":"2024-11-27T18:33:17.854141265Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"},{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"},{"hostname":"dragonfly-seed-client-2","
ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_2b1bfa6b-7307-45da-9a9c-a816b51e4b3c","ttl":0},{"created_at":"2024-11-27T18:33:17.854647246Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:f2b356ed4af6ca3647340bb3b8d6c95e66e5d22d2f790b8d80dc251cbd4ee24d"},{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:f2b356ed4af6ca3647340bb3b8d6c95e66e5d22d2f790b8d80dc251cbd4ee24d"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:f2b356ed4af6ca3647340bb3b8d6c95e66e5d22d2f790b8d80dc251cbd4ee24d"},{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:f2b356ed4af6ca3647340bb3b8d6c95e66e5d22d2f790b8d80dc251cbd4ee24d"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:f2b356ed4af6ca3647340bb3b8d6c95e66e5d22d2f790b8d80dc25
1cbd4ee24d"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:f2b356ed4af6ca3647340bb3b8d6c95e66e5d22d2f790b8d80dc251cbd4ee24d"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_e7bb70dc-f677-4986-b516-6ad36284d4cb","ttl":0},{"created_at":"2024-11-27T18:33:17.855116976Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:7c10ed83ba3318d7ef10210ae469247c685a5be84f3b97fe12d21169d31d3dd2"},{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:7c10ed83ba3318d7ef10210ae469247c685a5be84f3b97fe12d21169d31d3dd2"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:7c10ed83ba3318d7ef10210ae469247c685a5be84f3b97fe12d21169d31d3dd2"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:7c10ed83ba3318d7ef10210ae469247c685a5be84f3b97fe12d21169d31d3dd2"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:7c10ed83ba3318d7ef10210ae469247c685a5be84f3b97fe12d21169d31d3dd2"},{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:7c10ed83ba3318d7ef10210ae469247c685a5be84f3b97fe12d21169d31d3dd2"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_67e37355-ff8a-4d9b-801f-e64c6bb44d99","ttl":0},{"created_at":"2024-11-27T18:33:17.855531247Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256
:8b55ca9b80a879137ae6d9937a84fc4c5bee3aaeefb7dd815e7fe17a98a9e351"},{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:8b55ca9b80a879137ae6d9937a84fc4c5bee3aaeefb7dd815e7fe17a98a9e351"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:8b55ca9b80a879137ae6d9937a84fc4c5bee3aaeefb7dd815e7fe17a98a9e351"},{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:8b55ca9b80a879137ae6d9937a84fc4c5bee3aaeefb7dd815e7fe17a98a9e351"},{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:8b55ca9b80a879137ae6d9937a84fc4c5bee3aaeefb7dd815e7fe17a98a9e351"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:8b55ca9b80a879137ae6d9937a84fc4c5bee3aaeefb7dd815e7fe17a98a9e351"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_fcb229fa-b20a-405f-9193-25785356c723","ttl":0},{"created_at":"2024-11-27T18:33:17.855951857Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:5d7bcf957c0928678aa351babc3e2eede543bf26e9bcec47907ee1598d87c8c7"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:5d7bcf957c0928678aa351babc3e2eede543bf26e9bcec47907ee1598d87c8c7"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:5d7bcf957c0928678aa351babc3e2eede543bf26e9bcec47907ee1598d87c8c7"},{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https
://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:5d7bcf957c0928678aa351babc3e2eede543bf26e9bcec47907ee1598d87c8c7"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:5d7bcf957c0928678aa351babc3e2eede543bf26e9bcec47907ee1598d87c8c7"},{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:5d7bcf957c0928678aa351babc3e2eede543bf26e9bcec47907ee1598d87c8c7"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_b5486c26-77ed-4f34-bb50-a993b52f2594","ttl":0},{"created_at":"2024-11-27T18:33:17.856466669Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:777e2226c898d04426bed857c87ed5f1e5a6e605125725d4c772bee365abef47"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:777e2226c898d04426bed857c87ed5f1e5a6e605125725d4c772bee365abef47"},{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:777e2226c898d04426bed857c87ed5f1e5a6e605125725d4c772bee365abef47"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:777e2226c898d04426bed857c87ed5f1e5a6e605125725d4c772bee365abef47"},{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:777e2226c898d04426bed857c87ed5f1e5a6e605125725d4c772bee365abef47"},{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:777e2226c898d04426bed857c87ed5f1e5a6e605125725d4c772bee365abef47"}]}],"state":"SUCCESS","task_name":"prehea
t","task_uuid":"task_503f418f-4cfc-4437-a434-c636e55041eb","ttl":0}],"state":"SUCCESS","updated_at":"2024-11-27T18:36:05.080487112Z"},"user_id":0,"user":{"id":0,"created_at":"0001-01-01T00:00:00Z","updated_at":"0001-01-01T00:00:00Z","is_del":0,"email":"","name":"","avatar":"","phone":"","state":"","location":"","bio":"","configs":null},"seed_peer_clusters":[],"scheduler_clusters":[{"id":1,"created_at":"2024-11-27T18:23:27Z","updated_at":"2024-11-27T18:23:27Z","is_del":0,"name":"cluster-1","bio":"","config":{"candidate_parent_limit":4,"filter_parent_limit":15,"job_rate_limit":10},"client_config":{"load_limit":200},"scopes":{},"is_default":true,"seed_peer_clusters":null,"schedulers":null,"peers":null,"jobs":null}]}
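
The job response above lists, for each preheated blob, the peers that reported success. As a sanity check, a sketch like the following could tally which hosts received each blob and flag any expected host that is missing. It assumes the per-blob entries (the objects carrying `results`, `state` and `task_name`) have already been pulled out into a list. The top-level key that holds that list is cut off in the pasted output, so the function takes the list directly. The expected host set is copied from the hostnames that appear in the response, and the function names are illustrative.

```python
from collections import defaultdict


def summarize_preheat(group_states, expected_hosts):
    """Return ({blob URL: hosts missing it}, [task_uuid of non-SUCCESS entries]).

    group_states: list of per-blob entries shaped like the job response above, e.g.
    {"state": "SUCCESS", "results": [{"success_tasks": [...], "failure_tasks": None}]}.
    """
    hosts_per_url = defaultdict(set)
    failed = []
    for entry in group_states:
        if entry.get("state") != "SUCCESS":
            failed.append(entry.get("task_uuid"))
        for result in entry.get("results") or []:
            # success_tasks holds one {"hostname", "ip", "url"} object per peer.
            for task in result.get("success_tasks") or []:
                hosts_per_url[task["url"]].add(task["hostname"])
    # An empty set means every expected host reported the blob as preheated.
    missing = {url: set(expected_hosts) - hosts for url, hosts in hosts_per_url.items()}
    return missing, failed


# Hostnames as they appear in the job response above.
EXPECTED_HOSTS = {
    "gke-anne-regional-default-pool-23f7047d-0c4w",
    "gke-anne-regional-default-pool-cbcddcd0-mcp6",
    "gke-anne-regional-default-pool-f7fcf7d7-68px",
    "dragonfly-seed-client-0",
    "dragonfly-seed-client-1",
    "dragonfly-seed-client-2",
}

if __name__ == "__main__":
    # Hypothetical single-blob entry in the same shape as the response.
    sample = [{
        "state": "SUCCESS",
        "task_uuid": "task_example",
        "results": [{
            "failure_tasks": None,
            "success_tasks": [
                {"hostname": h, "ip": "", "url": "https://example/blob"}
                for h in sorted(EXPECTED_HOSTS)
            ],
        }],
    }]
    missing, failed = summarize_preheat(sample, EXPECTED_HOSTS)
    print(missing, failed)  # {'https://example/blob': set()} []
```

A success entry only says that the peer finished the preheat task; it does not by itself show that a later pull was served from that peer's cache.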

I then created a deployment that uses the preheated image:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: rayproject
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rayproject
  template:
    metadata:
      labels:
        app: rayproject
        elotl-luna: "true"
    spec:
      containers:
        - name: rayproject
          image: rayproject/ray-ml:2.33.0.914af0-py311
          resources:
            requests:
              cpu: 1
              memory: 2Gi
            limits:
              cpu: 1
              memory: 2Gi
          command:
            - sleep
            - "infinity"

and the image pull in the deployment was just as slow as usual:

kubectl describe pod/rayproject-b75949b97-5xlws
...
  Normal  Pulled     23s   kubelet            Successfully pulled image "rayproject/ray-ml:2.33.0.914af0-py311" in 5m42.029s (5m42.029s including waiting). Image size: 11092293251 bytes.
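
For scale, the kubelet figures above work out to an effective pull rate of roughly 32 MB/s (about 31 MiB/s). A tiny helper that does the arithmetic, image size in bytes divided by pull time in seconds, with the helper name chosen for illustration:

```python
def pull_throughput_mb_s(image_bytes: int, seconds: float) -> float:
    """Effective pull rate in megabytes (10^6 bytes) per second."""
    return image_bytes / seconds / 1e6


# Figures from the kubelet event: 11092293251 bytes in 5m42.029s.
elapsed = 5 * 60 + 42.029
print(f"{pull_throughput_mb_s(11092293251, elapsed):.1f} MB/s")  # 32.4 MB/s
```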
  
I checked the dfdaemon logs on the 3 clients, and they contained error messages such as the following (I can upload the full logs):

2024-11-27T18:33:18.220138717+00:00 ERROR download_task:download:download_partial_with_scheduler: dragonfly-client/src/resource/task.rs:519: announce peer failed: TonicStatus(Status { code: NotFound, message: "host 10.128.15.206-gke-anne-regional-default-pool-23f7047d-0c4w not found", metadata: MetadataMap { headers: {"content-type": "application/grpc"} }, source: None }) host_id="10.128.15.206-gke-anne-regional-default-pool-23f7047d-0c4w" task_id="b253b75db00bc5c4b749493f6349d2d93d4778d6ccd375d96db50aa5251a328e" peer_id="10.128.15.206-gke-anne-regional-default-pool-23f7047d-0c4w-b33de249-93d7-412a-9d71-ebf08280345f"
2024-11-27T18:33:18.220258164+00:00 ERROR download_task:download: dragonfly-client/src/resource/task.rs:391: download with scheduler error: TonicStatus(Status { code: NotFound, message: "host 10.128.15.206-gke-anne-regional-default-pool-23f7047d-0c4w not found", metadata: MetadataMap { headers: {"content-type": "application/grpc"} }, source: None }) host_id="10.128.15.206-gke-anne-regional-default-pool-23f7047d-0c4w" task_id="b253b75db00bc5c4b749493f6349d2d93d4778d6ccd375d96db50aa5251a328e" peer_id="10.128.15.206-gke-anne-regional-default-pool-23f7047d-0c4w-b33de249-93d7-412a-9d71-ebf08280345f"
2024-11-27T18:33:18.246442266+00:00 ERROR download_task:download:download_partial_with_scheduler:download_partial_with_scheduler_from_remote_peer:run:collect_from_remote_peers: dragonfly-client/src/resource/piece_collector.rs:279: sync pieces failed: task 533 was cancelled host_id="10.128.15.206-gke-anne-regional-default-pool-23f7047d-0c4w" task_id="752e6c00b91aa95cf393a5fb243b84c0dc39d91497faf60cd050af51be61596f" peer_id="10.128.15.206-gke-anne-regional-default-pool-23f7047d-0c4w-a079ad57-33ce-405a-ac69-c4b0f8e6b502"
<etc>

The host reported as not found does not match any of my host names. I don't know whether these errors are related to the preheat not speeding up the image pull, but I assume they may be.

$ kubectl get nodes -o wide
NAME                                           STATUS   ROLES    AGE   VERSION               INTERNAL-IP     EXTERNAL-IP      OS-IMAGE                             KERNEL-VERSION   CONTAINER-RUNTIME
gke-anne-regional-default-pool-23f7047d-0c4w   Ready    <none>   59m   v1.30.5-gke.1014003   10.128.15.206   34.55.239.12     Container-Optimized OS from Google   6.1.100+         containerd://1.7.19
gke-anne-regional-default-pool-cbcddcd0-mcp6   Ready    <none>   59m   v1.30.5-gke.1014003   10.128.15.205   34.172.252.60    Container-Optimized OS from Google   6.1.100+         containerd://1.7.19
gke-anne-regional-default-pool-f7fcf7d7-68px   Ready    <none>   59m   v1.30.5-gke.1014003   10.128.15.207   35.192.141.105   Container-Optimized OS from Google   6.1.100+         containerd://1.7.19
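
The host and peer IDs in the dfdaemon logs appear to follow an `<ip>-<hostname>` pattern (for example `10.128.15.206-gke-anne-regional-default-pool-23f7047d-0c4w`). Assuming that pattern, which is inferred from the log lines rather than taken from Dragonfly documentation, a small sketch can split a reported host ID and check which node in the table above it refers to. The helper names are illustrative.

```python
# Node name -> INTERNAL-IP, copied from the `kubectl get nodes -o wide` output above.
NODES = {
    "gke-anne-regional-default-pool-23f7047d-0c4w": "10.128.15.206",
    "gke-anne-regional-default-pool-cbcddcd0-mcp6": "10.128.15.205",
    "gke-anne-regional-default-pool-f7fcf7d7-68px": "10.128.15.207",
}


def split_host_id(host_id: str) -> tuple[str, str]:
    """Split an ID assumed to look like '<ipv4>-<hostname>'.

    IPv4 addresses contain no '-', so everything before the first dash is the IP.
    """
    ip, hostname = host_id.split("-", 1)
    return ip, hostname


def matching_node(host_id: str):
    """Return the node whose name and internal IP both match the ID, else None."""
    ip, hostname = split_host_id(host_id)
    return hostname if NODES.get(hostname) == ip else None


if __name__ == "__main__":
    reported = "10.128.15.206-gke-anne-regional-default-pool-23f7047d-0c4w"
    print(matching_node(reported))  # gke-anne-regional-default-pool-23f7047d-0c4w
```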

Expected behavior:

Expect the image pull time to be greatly reduced after the preheat. Don't expect the client logs to be filled with errors.

How to reproduce it:

See the details in the section above.

Environment:

  • Dragonfly version: 1.2.24
  • OS: cos-113-18244-151-27
  • Kernel (e.g. uname -a): Linux gke-anne-regional-default-pool-23f7047d-0c4w 6.1.100+ #1 SMP PREEMPT_DYNAMIC Sat Aug 24 16:19:44 UTC 2024 x86_64 GNU/Linux
  • Others: GKE 1.30.5-gke.1014003
@amholler amholler added the bug label Nov 27, 2024
@gaius-qi
Member

@amholler Please provide the full log in dfdaemon.log, thanks.

@gaius-qi gaius-qi self-assigned this Nov 28, 2024
@amholler
Author

clientlogs.tar.gz

@amholler
Author

Hi @gaius-qi, I have uploaded a tarball that includes the dfdaemon.log files from each of the 3 client instances. Thanks!

@gaius-qi
Member

gaius-qi commented Dec 5, 2024

@amholler These log lines show the timing of a piece downloaded from a remote peer:

2024-11-27T18:33:23.710454826+00:00  INFO download_task:download:download_partial_with_scheduler:download_partial_with_scheduler_from_remote_peer:run:collect_from_remote_peers: dragonfly-client/src/resource/piece_collector.rs:212: received piece 7608d777a63129791161b983810864ec81ec5fb793e73f53088fcd4710f13c68-224 metadata from parent 10.24.2.6-dragonfly-seed-client-0-78426839-bf26-47bc-ae5e-bd7337b49b87-seed host_id="10.128.15.206-gke-anne-regional-default-pool-23f7047d-0c4w" task_id="7608d777a63129791161b983810864ec81ec5fb793e73f53088fcd4710f13c68" peer_id="10.128.15.206-gke-anne-regional-default-pool-23f7047d-0c4w-0acbd9dd-8ceb-4aa9-9495-b0ac4787c1e3"
2024-11-27T18:33:23.710551717+00:00  INFO download_task:download:download_partial_with_scheduler:download_partial_with_scheduler_from_remote_peer: dragonfly-client/src/resource/task.rs:969: start to download piece 7608d777a63129791161b983810864ec81ec5fb793e73f53088fcd4710f13c68-224 from remote peer "10.24.2.6-dragonfly-seed-client-0-78426839-bf26-47bc-ae5e-bd7337b49b87-seed" host_id="10.128.15.206-gke-anne-regional-default-pool-23f7047d-0c4w" task_id="7608d777a63129791161b983810864ec81ec5fb793e73f53088fcd4710f13c68" peer_id="10.128.15.206-gke-anne-regional-default-pool-23f7047d-0c4w-0acbd9dd-8ceb-4aa9-9495-b0ac4787c1e3"
2024-11-27T18:33:24.576013333+00:00  INFO download_task:download:download_partial_with_scheduler:download_partial_with_scheduler_from_remote_peer: dragonfly-client/src/resource/task.rs:1065: finished piece 7608d777a63129791161b983810864ec81ec5fb793e73f53088fcd4710f13c68-224 from remote peer Some("10.24.2.6-dragonfly-seed-client-0-78426839-bf26-47bc-ae5e-bd7337b49b87-seed") host_id="10.128.15.206-gke-anne-regional-default-pool-23f7047d-0c4w" task_id="7608d777a63129791161b983810864ec81ec5fb793e73f53088fcd4710f13c68" peer_id="10.128.15.206-gke-anne-regional-default-pool-23f7047d-0c4w-0acbd9dd-8ceb-4aa9-9495-b0ac4787c1e3"

These log lines show the timing of a piece downloaded from the source:

2024-11-27T18:33:18.220331735+00:00  INFO download_task:download:download_partial_from_source: dragonfly-client/src/resource/task.rs:1546: start to download piece b253b75db00bc5c4b749493f6349d2d93d4778d6ccd375d96db50aa5251a328e-0 from source host_id="10.128.15.206-gke-anne-regional-default-pool-23f7047d-0c4w" task_id="b253b75db00bc5c4b749493f6349d2d93d4778d6ccd375d96db50aa5251a328e" peer_id="10.128.15.206-gke-anne-regional-default-pool-23f7047d-0c4w-b33de249-93d7-412a-9d71-ebf08280345f"
2024-11-27T18:33:18.297044117+00:00  INFO download_task:download:download_partial_from_source: dragonfly-client/src/resource/task.rs:1601: finished piece b253b75db00bc5c4b749493f6349d2d93d4778d6ccd375d96db50aa5251a328e-0 from source host_id="10.128.15.206-gke-anne-regional-default-pool-23f7047d-0c4w" task_id="b253b75db00bc5c4b749493f6349d2d93d4778d6ccd375d96db50aa5251a328e" peer_id="10.128.15.206-gke-anne-regional-default-pool-23f7047d-0c4w-b33de249-93d7-412a-9d71-ebf08280345f"
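
Subtracting the start timestamp from the finish timestamp in each excerpt makes the comparison concrete: the piece fetched from the seed peer took roughly 0.87 s, while the piece fetched from the source took roughly 0.08 s. A minimal sketch of that arithmetic follows, using timestamps copied from the log lines above. Python's `datetime` has no nanosecond field, so the sketch truncates the nine fractional digits to microseconds before parsing; the helper names are illustrative.

```python
import re
from datetime import datetime

# Leading RFC 3339 timestamp of a dfdaemon log line, e.g. 2024-11-27T18:33:23.710551717+00:00
TS = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\.(\d+)([+-]\d{2}:\d{2})")


def parse_ts(line: str) -> datetime:
    """Parse the timestamp at the start of a log line, truncated to microseconds."""
    m = TS.match(line)
    if m is None:
        raise ValueError(f"no timestamp at start of: {line!r}")
    base, frac, tz = m.groups()
    micros = frac[:6].ljust(6, "0")  # datetime only stores microseconds
    return datetime.fromisoformat(f"{base}.{micros}{tz}")


def elapsed(start_line: str, finish_line: str) -> float:
    """Seconds between the 'start to download piece' and 'finished piece' lines."""
    return (parse_ts(finish_line) - parse_ts(start_line)).total_seconds()


if __name__ == "__main__":
    # Timestamps copied from the two excerpts above.
    remote = elapsed("2024-11-27T18:33:23.710551717+00:00", "2024-11-27T18:33:24.576013333+00:00")
    source = elapsed("2024-11-27T18:33:18.220331735+00:00", "2024-11-27T18:33:18.297044117+00:00")
    print(f"remote peer: {remote:.3f} s, source: {source:.3f} s")  # remote peer: 0.865 s, source: 0.077 s
```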

P2P downloads are only faster than back-to-source downloads under conditions such as:

  1. Downloading directly from the source (back-to-source) is slower than downloading between peers.
  2. The source bandwidth is saturated during a large-scale download, in which case P2P will be faster.

@amholler
Author

amholler commented Dec 5, 2024

Hi @gaius-qi, thanks for your analysis! But I'm missing something here: I had run a successful preheat to all peers, so why does any peer-to-peer transfer need to happen? I thought every peer would hold the full image once the preheat from the source completed, so the image pull after the preheat should be served directly from the local cache and hence be very fast.

@gaius-qi
Member

@amholler How do you preheat the image to all peers?

@amholler
Author

amholler commented Dec 10, 2024

Hi @gaius-qi, thanks for responding.

How do you preheat the image to all peers?

As I mentioned in the first comment in this ticket, I did it as follows:

I created a token and then ran the following all_peers preheat job, which completed successfully:

curl --location --request POST 'http://127.0.0.1:8080/oapi/v1/jobs' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer Njg4NDRhZGYtYmY0ZS00NmIwLTk5MWQtZjIwM2U1MjBjNGM1' \
--data-raw '{
    "type": "preheat",
    "args": {
        "type": "image",
        "url": "https://index.docker.io/v2/rayproject/ray-ml/manifests/2.33.0.914af0-py311",
        "scope": "all_peers"
    }
}'
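
For reference, the same request can be assembled programmatically. The sketch below uses only Python's standard library, derives the manifest URL from the image reference used in the Deployment, and builds (without sending) the POST shown above. The token value is a placeholder, the helper names are illustrative, and the `library/` handling for single-name images is an added assumption about Docker Hub naming that is not exercised in this thread.

```python
import json
import urllib.request


def manifest_url(image: str) -> str:
    """Turn 'repo/name:tag' into a Docker Hub manifest URL like the one used above.

    Only handles Docker Hub images with an explicit tag (no digests, no other registries).
    Single-name official images (e.g. 'ubuntu:22.04') live under 'library/' on Docker Hub.
    """
    name, tag = image.rsplit(":", 1)
    if "/" not in name:
        name = f"library/{name}"
    return f"https://index.docker.io/v2/{name}/manifests/{tag}"


def build_preheat_request(manager: str, token: str, image: str) -> urllib.request.Request:
    """Build (but do not send) the same POST /oapi/v1/jobs request as the curl above."""
    body = {
        "type": "preheat",
        "args": {"type": "image", "url": manifest_url(image), "scope": "all_peers"},
    }
    return urllib.request.Request(
        f"{manager}/oapi/v1/jobs",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_preheat_request(
        "http://127.0.0.1:8080", "YOUR_TOKEN", "rayproject/ray-ml:2.33.0.914af0-py311"
    )
    print(req.get_method(), req.full_url)  # POST http://127.0.0.1:8080/oapi/v1/jobs
    print(json.loads(req.data)["args"]["url"])
    # To actually submit it: urllib.request.urlopen(req), then read the JSON job response.
```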

@gaius-qi
Member

@amholler Please provide the Manager and Scheduler logs, plus the log from one of the peers.
