OSError: Request Entity Too Large #31

Open
awaelchli opened this issue Mar 21, 2023 · 0 comments
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@awaelchli
Contributor

🐛 Bug

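When the train_work is about to stop after finishing training, it fails with an OSError: [Errno 5] An error occurred (413) when calling the PutObject operation: Request Entity Too Large. According to the traceback, the error is raised while the work's artifacts are being persisted (persist_artifacts calls _copy_files, which calls fs.put).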

To Reproduce

Steps to reproduce the behavior:

lightning run app app.py --cloud --name quick-start-3

Code sample

The app.py from this repo.

Error and logs

[root.train_work] Epoch 9: 100% 12/12 [00:00<00:00, 16.72it/s, v_num=0]
[root.train_work] Traceback (most recent call last):
[root.train_work]   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/s3fs/core.py", line 112, in _error_wrapper
[root.train_work]     return await func(*args, **kwargs)
[root.train_work]   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/aiobotocore/client.py", line 358, in _make_api_call
[root.train_work]     raise error_class(parsed_response, operation_name)
[root.train_work] botocore.exceptions.ClientError: An error occurred (413) when calling the PutObject operation: Request Entity Too Large
[root.train_work] The above exception was the direct cause of the following exception:
[root.train_work] Traceback (most recent call last):
[root.train_work]   File "/home/zeus/miniconda3/envs/cloudspace/bin/lightning-cloud-launcher", line 8, in <module>
[root.train_work]     sys.exit(main())
[root.train_work]   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
[root.train_work]     return self.main(*args, **kwargs)
[root.train_work]   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/click/core.py", line 1055, in main
[root.train_work]     rv = self.invoke(ctx)
[root.train_work]   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
[root.train_work]     return _process_result(sub_ctx.command.invoke(sub_ctx))
[root.train_work]   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
[root.train_work]     return _process_result(sub_ctx.command.invoke(sub_ctx))
[root.train_work]   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
[root.train_work]     return ctx.invoke(self.callback, **ctx.params)
[root.train_work]   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/click/core.py", line 760, in invoke
[root.train_work]     return __callback(*args, **kwargs)
[root.train_work]   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/lightning_launcher/cli/__main__.py", line 87, in run_work
[root.train_work]     run_lightning_work(
[root.train_work]   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/lightning_launcher/utils.py", line 51, in wrapper
[root.train_work]     res = func(*args, **kwargs)
[root.train_work]   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/lightning_launcher/utils.py", line 77, in wrapper
[root.train_work]     res = func(*args, **kwargs)
[root.train_work]   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/lightning_launcher/launcher.py", line 181, in run_lightning_work
[root.train_work]     WorkRunner(
[root.train_work]   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/lightning/app/utilities/proxies.py", line 437, in __call__
[root.train_work]     raise e
[root.train_work]   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/lightning/app/utilities/proxies.py", line 418, in __call__
[root.train_work]     self.run_once()
[root.train_work]   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/lightning/app/utilities/proxies.py", line 582, in run_once
[root.train_work]     persist_artifacts(work=self.work)
[root.train_work]   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/lightning/app/utilities/proxies.py", line 722, in persist_artifacts
[root.train_work]     _copy_files(artifact_path, destination_path)
[root.train_work]   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/lightning/app/storage/copier.py", line 152, in _copy_files
[root.train_work]     fs.put(str(source_path), str(destination_path))
[root.train_work]   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/fsspec/asyn.py", line 113, in wrapper
[root.train_work]     return sync(self.loop, func, *args, **kwargs)
[root.train_work]   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/fsspec/asyn.py", line 98, in sync
[root.train_work]     raise return_result
[root.train_work]   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/fsspec/asyn.py", line 53, in _runner
[root.train_work]     result[0] = await coro
[root.train_work]   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/fsspec/asyn.py", line 523, in _put
[root.train_work]     return await _run_coros_in_chunks(
[root.train_work]   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/fsspec/asyn.py", line 269, in _run_coros_in_chunks
[root.train_work]     await asyncio.gather(*chunk, return_exceptions=return_exceptions),
[root.train_work]   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/asyncio/tasks.py", line 455, in wait_for
[root.train_work]     return await fut
[root.train_work]   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/s3fs/core.py", line 1073, in _put_file
[root.train_work]     await self._call_s3(
[root.train_work]   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/s3fs/core.py", line 339, in _call_s3
[root.train_work]     return await _error_wrapper(
[root.train_work]   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.8/site-packages/s3fs/core.py", line 139, in _error_wrapper
[root.train_work]     raise err
[root.train_work] OSError: [Errno 5] An error occurred (413) when calling the PutObject operation: Request Entity Too Large
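
Since the traceback ends in fs.put inside _copy_files, one way to narrow this down might be to check which artifact files are unusually large before they are uploaded. The following is only a diagnostic sketch, not a fix: the helper name `find_large_files`, the `lightning_logs` directory, and the 100 MiB threshold are assumptions for illustration, and the size limit the storage endpoint actually enforces is not known from this report.

```python
import os
from typing import List, Tuple


def find_large_files(root: str, limit_bytes: int) -> List[Tuple[str, int]]:
    """Return (path, size) for every file under `root` larger than `limit_bytes`.

    Hypothetical helper: results are sorted with the largest file first.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # Skip files that disappear or cannot be read while scanning.
                continue
            if size > limit_bytes:
                hits.append((path, size))
    return sorted(hits, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Assumed values: the artifact directory and the threshold are guesses,
    # adjust both to match the actual app.
    for path, size in find_large_files("lightning_logs", 100 * 1024**2):
        print(f"{size / 1024**2:.1f} MiB  {path}")
```

If a single checkpoint or log file stands out, it would be a likely candidate for the upload that triggers the 413 response.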

Environment

  • PyTorch Version (e.g., 1.0): 2.0.0
  • OS (e.g., Linux): Linux
  • How you installed PyTorch (conda, pip, source): pip
  • Build command you used (if compiling from source): -
  • Python version: 3.10
  • CUDA/cuDNN version: -
  • GPU models and configuration: -
  • Any other relevant information: Lightning 2.0

Additional context

Found while running #30

@awaelchli awaelchli added bug Something isn't working help wanted Extra attention is needed labels Mar 21, 2023