Description
https://open.fedml.ai/octopus/collaboratorGroup/runDetail?projectId=1591&groupId=2751&runId=5249

The client in run 5249 fails during model deserialization: the checkpoint was saved on a CUDA device, but torch.cuda.is_available() is False on this CPU-only Windows client, so torch.load raises a RuntimeError and the run is reported as FAILED. Relevant client log excerpt:
[FedML-Client @device-id-11226] [Wed, 15 Feb 2023 16:07:19] [ERROR] device = validate_cuda_device(location)
[FedML-Client @device-id-11226] [Wed, 15 Feb 2023 16:07:19] [ERROR] File "C:\Users\chaoy\anaconda3\envs\fedml\lib\site-packages\torch\serialization.py", line 166, in validate_cuda_device
[FedML-Client @device-id-11226] [Wed, 15 Feb 2023 16:07:19] [ERROR] raise RuntimeError('Attempting to deserialize object on a CUDA '
[FedML-Client @device-id-11226] [Wed, 15 Feb 2023 16:07:19] [ERROR] RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
[FedML-Client @device-id-11226] [Wed, 15 Feb 2023 16:07:19] [INFO] [__init__.py:312:log_training_failed_status] log training inner status FAILED
[FedML-Client @device-id-11226] [Wed, 15 Feb 2023 16:07:19] [INFO] [mlops_metrics.py:160:common_broadcast_client_training_status] report_client_training_status. message_json = {"edge_id": 11226, "run_id": "5249", "status": "FAILED"}
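The workaround the RuntimeError itself suggests is to pass map_location to torch.load so that storages saved on a GPU are remapped to whatever device is actually available. A minimal sketch, assuming a local checkpoint file ("model.pt" and MyModel are hypothetical placeholders; where exactly the FedML client runner calls torch.load is not visible in this log):

```python
import torch

# Pick the device at runtime instead of assuming CUDA is present.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# map_location remaps CUDA-saved storages onto the available device,
# which is what the RuntimeError above asks for on a CPU-only client.
state_dict = torch.load("model.pt", map_location=device)

# Hypothetical model class: load the remapped weights and move the model
# to the selected device before training or evaluation.
# model = MyModel()
# model.load_state_dict(state_dict)
# model.to(device)
```

If the deserialization happens inside FedML rather than in user code, the equivalent fix would be for the client runner to supply map_location (or to select the device based on the run config) before calling torch.load.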