This repository has been archived by the owner on Oct 4, 2024. It is now read-only.
🐛 Bug
When train_work finishes training and is about to stop, it fails with the following error: OSError: [Errno 5] An error occurred (413) when calling the PutObject operation: Request Entity Too Large.
To Reproduce
Steps to reproduce the behavior:
lightning run app app.py --cloud --name quick-start-3
Code sample
The app.py from this repo.
Error and logs
[root.train_work] Epoch 9: 100% 12/12 [00:00<00:00, 16.72it/s, v_num=0]
[root.train_work] Traceback (most recent call last):
[root.train_work] File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/s3fs/core.py", line 112, in _error_wrapper
[root.train_work] return await func(*args, **kwargs)
[root.train_work] File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/aiobotocore/client.py", line 358, in _make_api_call
[root.train_work] raise error_class(parsed_response, operation_name)
[root.train_work] botocore.exceptions.ClientError: An error occurred (413) when calling the PutObject operation: Request Entity Too Large
[root.train_work] The above exception was the direct cause of the following exception:
[root.train_work] Traceback (most recent call last):
[root.train_work] File "/home/zeus/miniconda3/envs/cloudspace/bin/lightning-cloud-launcher", line 8, in <module>
[root.train_work] sys.exit(main())
[root.train_work] File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
[root.train_work] return self.main(*args, **kwargs)
[root.train_work] File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/click/core.py", line 1055, in main
[root.train_work] rv = self.invoke(ctx)
[root.train_work] File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
[root.train_work] return _process_result(sub_ctx.command.invoke(sub_ctx))
[root.train_work] File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
[root.train_work] return _process_result(sub_ctx.command.invoke(sub_ctx))
[root.train_work] File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
[root.train_work] return ctx.invoke(self.callback, **ctx.params)
[root.train_work] File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/click/core.py", line 760, in invoke
[root.train_work] return __callback(*args, **kwargs)
[root.train_work] File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/lightning_launcher/cli/__main__.py", line 87, in run_work
[root.train_work] run_lightning_work(
[root.train_work] File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/lightning_launcher/utils.py", line 51, in wrapper
[root.train_work] res = func(*args, **kwargs)
[root.train_work] File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/lightning_launcher/utils.py", line 77, in wrapper
[root.train_work] res = func(*args, **kwargs)
[root.train_work] File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/lightning_launcher/launcher.py", line 181, in run_lightning_work
[root.train_work] WorkRunner(
[root.train_work] File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/lightning/app/utilities/proxies.py", line 437, in __call__
[root.train_work] raise e
[root.train_work] File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/lightning/app/utilities/proxies.py", line 418, in __call__
[root.train_work] self.run_once()
[root.train_work] File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/lightning/app/utilities/proxies.py", line 582, in run_once
[root.train_work] persist_artifacts(work=self.work)
[root.train_work] File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/lightning/app/utilities/proxies.py", line 722, in persist_artifacts
[root.train_work] _copy_files(artifact_path, destination_path)
[root.train_work] File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/lightning/app/storage/copier.py", line 152, in _copy_files
[root.train_work] fs.put(str(source_path), str(destination_path))
[root.train_work] File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/fsspec/asyn.py", line 113, in wrapper
[root.train_work] return sync(self.loop, func, *args, **kwargs)
[root.train_work] File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/fsspec/asyn.py", line 98, in sync
[root.train_work] raise return_result
[root.train_work] File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/fsspec/asyn.py", line 53, in _runner
[root.train_work] result[0] = await coro
[root.train_work] File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/fsspec/asyn.py", line 523, in _put
[root.train_work] return await _run_coros_in_chunks(
[root.train_work] File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/fsspec/asyn.py", line 269, in _run_coros_in_chunks
[root.train_work] await asyncio.gather(*chunk, return_exceptions=return_exceptions),
[root.train_work] File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/asyncio/tasks.py", line 455, in wait_for
[root.train_work] return await fut
[root.train_work] File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/s3fs/core.py", line 1073, in _put_file
[root.train_work] await self._call_s3(
[root.train_work] File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/s3fs/core.py", line 339, in _call_s3
[root.train_work] return await _error_wrapper(
[root.train_work] File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/s3fs/core.py", line 139, in _error_wrapper
[root.train_work] raise err
[root.train_work] OSError: [Errno 5] An error occurred (413) when calling the PutObject operation: Request Entity Too Large
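The traceback shows the failure happens in persist_artifacts, when _copy_files uploads a file with fs.put and the storage endpoint rejects the PutObject request as too large. One way to narrow this down is to list the largest files the work leaves behind. The sketch below assumes the artifacts live under the work's working directory and uses a hypothetical 50 MiB threshold; the actual size limit enforced by the endpoint is not known from this report. Checkpoint files (for example under lightning_logs/) are plausible candidates, but that is an assumption, not something the log confirms.

```python
import os
from typing import List, Tuple

# Hypothetical flagging threshold in bytes (50 MiB). This is an assumption:
# the real limit behind the 413 response is not stated in the report.
DEFAULT_THRESHOLD = 50 * 1024 * 1024


def find_large_files(root: str, threshold: int = DEFAULT_THRESHOLD) -> List[Tuple[str, int]]:
    """Return (path, size_in_bytes) for files under `root` whose size is at
    least `threshold`, sorted largest first."""
    large: List[Tuple[str, int]] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # Skip entries that vanished or cannot be stat'ed (e.g. broken symlinks).
                continue
            if size >= threshold:
                large.append((path, size))
    # Largest files first, since those are the likeliest to trigger the 413.
    large.sort(key=lambda item: item[1], reverse=True)
    return large
```

For example, calling find_large_files(".") from the work's directory right after training and printing each path with its size in MiB shows which files would be uploaded as single large objects.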
Environment
PyTorch Version (e.g., 1.0): 2.0.0
OS (e.g., Linux): Linux
How you installed PyTorch (conda, pip, source): pip
Build command you used (if compiling from source): -
Additional context
Found while running #30