DataLoader error? #14

Closed
yyhosmium64 opened this issue Apr 8, 2025 · 4 comments

yyhosmium64 commented Apr 8, 2025

Hello,

Regardless of the dataset, training runs normally for a while and then fails at around iteration 3000 (somewhere between 2700 and 3300) with the following error:

Runtime exception: Caught AttributeError in DataLoader worker process

The worker process number reported in the error varies from run to run.
I’ve confirmed that the dataset is in the correct path and is not corrupted.

If you have any idea how to resolve this issue, I would really appreciate your help.
Thank you in advance!

My environment:
OS: Ubuntu 22.04 LTS
GPU: RTX 4090, CPU: i7-13700K, RAM: 64 GB
OptiX==7.7 (in this repo), torch==2.3.1, CUDA==11.8, nvidia-driver==535.183.01

###########################################################################################
`4:47:25 Runtime exception: Caught AttributeError in DataLoader worker process 6. console_utils.py:395
Original Traceback (most recent call last):
File "/home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
data = fetcher.fetch(index) # type: ignore[possibly-undefined]
^^^^^^^^^^^^^^^^^^^^
File "/home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
~~~~~~~~~~~~^^^^^
File "/home/user/EnvGS/easyvolcap/dataloaders/datasets/volumetric_video_dataset.py", line 1409, in __getitem__
output = self.get_ground_truth(index) # load images, camera parameters, etc (10ms)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/EnvGS/easyvolcap/dataloaders/datasets/volumetric_video_dataset.py", line 1255, in get_ground_truth
rgb, msk, wet, dpt, bkg, norm = self.get_image(output.view_index, output.latent_index, output) # H, W, 3
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/EnvGS/easyvolcap/dataloaders/datasets/volumetric_video_dataset.py", line 1000, in get_image
return self.get_image_from_bytes(view_index, latent_index)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/EnvGS/easyvolcap/dataloaders/datasets/volumetric_video_dataset.py", line 870, in get_image_from_bytes
rgb = torch.as_tensor(load_image_from_bytes(im_bytes, normalize=True)) # 4-5ms for 400 * 592 jpeg, sooo slow
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/EnvGS/easyvolcap/utils/data_utils.py", line 1514, in load_image_from_bytes
if image.ndim == 2:
^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'ndim'

╭────────────────────────────────────────────────────────────────────────────── Traceback (most recent call last) ──────────────────────────────────────────────────────────────────────────────╮
│ /home/user/EnvGS/easyvolcap/utils/console_utils.py:392 in inner │
│ │
│ ❱ 392 │ │ │ │ return func(*args, **kwargs) │
│ │
│ /home/user/EnvGS/easyvolcap/scripts/../../easyvolcap/scripts/main.py:308 in main │
│ │
│ ❱ 308 │ else: globals()[args.type]() # invoke this (call callable_from_cfg -> call_from_cfg) │
│ │
│ /home/user/EnvGS/easyvolcap/engine/registry.py:56 in inner │
│ │
│ ❱ 56 │ │ return call_from_cfg(func, cfg) │
│ │
│ /home/user/EnvGS/easyvolcap/engine/registry.py:47 in call_from_cfg │
│ │
│ ❱ 47 │ return func(**call_args) │
│ │
│ /home/user/EnvGS/easyvolcap/scripts/../../easyvolcap/scripts/main.py:302 in train │
│ │
│ ❱ 302 │ launcher(**kwargs, runner_function=runner.train, runner_object=runner) │
│ │
│ /home/user/EnvGS/easyvolcap/scripts/../../easyvolcap/scripts/main.py:52 in launcher │
│ │
│ ❱ 52 │ runner_function() │
│ │
│ /home/user/EnvGS/easyvolcap/runners/volumetric_video_runner.py:322 in train │
│ │
│ ❱ 322 │ │ │ next(train_generator, None) # avoid reconstruction of the dataloader │
│ │
│ /home/user/EnvGS/easyvolcap/runners/volumetric_video_runner.py:422 in train_generator │
│ │
│ ❱ 422 │ │ │ │ index, flying_batch = next(enumerater, (None, None)) │
│ │
│ /home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/torch/utils/data/dataloader.py:631 in __next__ │
│ │
│ ❱ 631 │ │ │ data = self._next_data() │
│ │
│ /home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/torch/utils/data/dataloader.py:1346 in _next_data │
│ │
│ ❱ 1346 │ │ │ │ return self._process_data(data) │
│ │
│ /home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/torch/utils/data/dataloader.py:1372 in _process_data │
│ │
│ ❱ 1372 │ │ │ data.reraise() │
│ │
│ /home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/torch/_utils.py:705 in reraise │
│ │
│ ❱ 705 │ │ raise exception │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
AttributeError: Caught AttributeError in DataLoader worker process 6.
Original Traceback (most recent call last):
File "/home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
data = fetcher.fetch(index) # type: ignore[possibly-undefined]
^^^^^^^^^^^^^^^^^^^^
File "/home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
~~~~~~~~~~~~^^^^^
File "/home/user/EnvGS/easyvolcap/dataloaders/datasets/volumetric_video_dataset.py", line 1409, in __getitem__
output = self.get_ground_truth(index) # load images, camera parameters, etc (10ms)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/EnvGS/easyvolcap/dataloaders/datasets/volumetric_video_dataset.py", line 1255, in get_ground_truth
rgb, msk, wet, dpt, bkg, norm = self.get_image(output.view_index, output.latent_index, output) # H, W, 3
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/EnvGS/easyvolcap/dataloaders/datasets/volumetric_video_dataset.py", line 1000, in get_image
return self.get_image_from_bytes(view_index, latent_index)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/EnvGS/easyvolcap/dataloaders/datasets/volumetric_video_dataset.py", line 870, in get_image_from_bytes
rgb = torch.as_tensor(load_image_from_bytes(im_bytes, normalize=True)) # 4-5ms for 400 * 592 jpeg, sooo slow
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/EnvGS/easyvolcap/utils/data_utils.py", line 1514, in load_image_from_bytes
if image.ndim == 2:
^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'ndim'

envgs/ref_real/envgs_toycar

eta      epoch  iter  num_pts  env_num_pts  ssim_loss  psnr       img_loss  norm_loss  gs_norm_loss  loss      data    batch   lr        max_mem
0:33:53  6      3240  671923   163840       0.438146   23.401836  0.047936  0.240869   0.105387      0.132602  0.0231  0.6184  0.000727  1659
0:33:57  6      3241  671923   163840       0.383249   24.081594  0.043934  0.228212   0.096644      0.117945  0.0180  0.6118  0.000727  1646
0:34:01  6      3242  671923   163840       0.351371   24.747236  0.040949  0.234179   0.091455      0.109033  0.0184  0.5458  0.000727  1643
0:34:05  6      3243  671923   163840       0.339653   24.915779  0.040101  0.241138   0.090496      0.106043  0.0175  0.6338  0.000727  1656
0:34:09  6      3244  671923   163840       0.324889   24.712225  0.039866  0.228212   0.080322      0.102365  0.0190  0.5852  0.000727  1648
*** Caught AttributeError in DataLoader worker process 6.
Original Traceback (most recent call last):
File "/home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
data = fetcher.fetch(index) # type: ignore[possibly-undefined]
^^^^^^^^^^^^^^^^^^^^
File "/home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
~~~~~~~~~~~~^^^^^
File "/home/user/EnvGS/easyvolcap/dataloaders/datasets/volumetric_video_dataset.py", line 1409, in __getitem__
output = self.get_ground_truth(index) # load images, camera parameters, etc (10ms)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/EnvGS/easyvolcap/dataloaders/datasets/volumetric_video_dataset.py", line 1255, in get_ground_truth
rgb, msk, wet, dpt, bkg, norm = self.get_image(output.view_index, output.latent_index, output) # H, W, 3
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/EnvGS/easyvolcap/dataloaders/datasets/volumetric_video_dataset.py", line 1000, in get_image
return self.get_image_from_bytes(view_index, latent_index)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/EnvGS/easyvolcap/dataloaders/datasets/volumetric_video_dataset.py", line 870, in get_image_from_bytes
rgb = torch.as_tensor(load_image_from_bytes(im_bytes, normalize=True)) # 4-5ms for 400 * 592 jpeg, sooo slow
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/EnvGS/easyvolcap/utils/data_utils.py", line 1514, in load_image_from_bytes
if image.ndim == 2:
^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'ndim'

╭────────────────────────────────────────────────────────────────────────────── Traceback (most recent call last) ──────────────────────────────────────────────────────────────────────────────╮
│ /home/user/EnvGS/easyvolcap/utils/console_utils.py:392 in inner │
│ │
│ 389 │ │ # This function catches errors and stops the execution for easier inspection │
│ 390 │ │ def inner(*args, **kwargs): │
│ 391 │ │ │ try: │
│ ❱ 392 │ │ │ │ return func(*args, **kwargs) │
│ 393 │ │ │ except Exception as e: │
│ 394 │ │ │ │ if isinstance(e, BdbQuit): return # so that nested catch_throw will respect each other │
│ 395 │ │ │ │ log(red(f'Runtime exception: {e}')) │
│ │
│ /home/user/EnvGS/easyvolcap/scripts/../../easyvolcap/scripts/main.py:308 in main │
│ │
│ 305 @catch_throw │
│ 306 def main(): │
│ 307 │ if cfg.mocking: log(f'{green("Modules imported.")} Mode: {yellow(args.type)}. No config loaded, pass config file using -c <PATH_TO_CONFIG>') # MARK: GLOBAL │
│ ❱ 308 │ else: globals()[args.type]() # invoke this (call callable_from_cfg -> call_from_cfg) │
│ 309 │
│ 310 │
│ 311 # Module name == '__main__', this is the outermost commandline entry point │
│ │
│ /home/user/EnvGS/easyvolcap/engine/registry.py:56 in inner │
│ │
│ 53 │ │ elif 'cfg' in kwargs: cfg: dict = kwargs['cfg'] │
│ 54 │ │ else: return func(*args, **kwargs) │
│ 55 │ │ cfg.update(kwargs) │
│ ❱ 56 │ │ return call_from_cfg(func, cfg) │
│ 57 │ return inner │
│ 58 │
│ 59 │
│ │
│ /home/user/EnvGS/easyvolcap/engine/registry.py:47 in call_from_cfg │
│ │
│ 44 │ │ │ │ else: │
│ 45 │ │ │ │ │ pass # in case of BASE_KEY, DELETE_KEY, APPEND_KEY, DEPRECATION_KEY │
│ 46 │ │ │ else: call_args[k] = v │
│ ❱ 47 │ return func(**call_args) │
│ 48 │
│ 49 │
│ 50 def callable_from_cfg(func: Callable): │
│ │
│ /home/user/EnvGS/easyvolcap/scripts/../../easyvolcap/scripts/main.py:302 in train │
│ │
│ 299 │ if dry_run: return runner # just construct everything, then return │
│ 300 │ │
│ 301 │ # The actual calling, with grace full exit │
│ ❱ 302 │ launcher(**kwargs, runner_function=runner.train, runner_object=runner) │
│ 303 │
│ 304 │
│ 305 @catch_throw │
│ │
│ /home/user/EnvGS/easyvolcap/scripts/../../easyvolcap/scripts/main.py:52 in launcher │
│ │
│ 49 │ # Give the user some time to save states │
│ 50 │ log('Launching runner for experiment:', magenta(exp_name)) │
│ 51 │ cfg.runner = runner_object # holds a global reference for hacky usage # MARK: GLOBAL │
│ ❱ 52 │ runner_function() │
│ 53 │ │
│ 54 │ profiler_stop() # already setup │
│ 55 │ torch.set_anomaly_enabled(prev_anomaly) │
│ │
│ /home/user/EnvGS/easyvolcap/runners/volumetric_video_runner.py:322 in train │
│ │
│ 319 │ │ for epoch in range(epoch, self.epochs): │
│ 320 │ │ │ │
│ 321 │ │ │ # Possible to make this a decorator? │
│ ❱ 322 │ │ │ next(train_generator, None) # avoid reconstruction of the dataloader │
│ 323 │ │ │ │
│ 324 │ │ │ # Leave some breathing room for other applications │
│ 325 │ │ │ if (epoch + 1) % self.empty_cache_ep == 0: │
│ │
│ /home/user/EnvGS/easyvolcap/runners/volumetric_video_runner.py:422 in train_generator │
│ │
│ 419 │ │ │ │
│ 420 │ │ │ # Get next data and start copying │
│ 421 │ │ │ with torch.cuda.stream(data_stream): │
│ ❱ 422 │ │ │ │ index, flying_batch = next(enumerater, (None, None)) │
│ 423 │ │ │ │ if flying_batch is None: │
│ 424 │ │ │ │ │ flying_batch = dotdict(meta=dotdict(iter=-1)) │
│ 425 │ │ │ │ else: │
│ │
│ /home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/torch/utils/data/dataloader.py:631 in __next__ │
│ │
│ 628 │ │ │ if self._sampler_iter is None: │
│ 629 │ │ │ │ # TODO(pytorch/pytorch#76750) │
│ 630 │ │ │ │ self._reset() # type: ignore[call-arg] │
│ ❱ 631 │ │ │ data = self._next_data() │
│ 632 │ │ │ self._num_yielded += 1 │
│ 633 │ │ │ if self._dataset_kind == _DatasetKind.Iterable and \ │
│ 634 │ │ │ │ │ self._IterableDataset_len_called is not None and \ │
│ │
│ /home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/torch/utils/data/dataloader.py:1346 in _next_data │
│ │
│ 1343 │ │ │ │ self._task_info[idx] += (data,) │
│ 1344 │ │ │ else: │
│ 1345 │ │ │ │ del self._task_info[idx] │
│ ❱ 1346 │ │ │ │ return self._process_data(data) │
│ 1347 │ │
│ 1348 │ def _try_put_index(self): │
│ 1349 │ │ assert self._tasks_outstanding < self._prefetch_factor * self._num_workers │
│ │
│ /home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/torch/utils/data/dataloader.py:1372 in _process_data │
│ │
│ 1369 │ │ self._rcvd_idx += 1 │
│ 1370 │ │ self._try_put_index() │
│ 1371 │ │ if isinstance(data, ExceptionWrapper): │
│ ❱ 1372 │ │ │ data.reraise() │
│ 1373 │ │ return data │
│ 1374 │ │
│ 1375 │ def _mark_worker_as_unavailable(self, worker_id, shutdown=False): │
│ │
│ /home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/torch/_utils.py:705 in reraise │
│ │
│ 702 │ │ │ # If the exception takes multiple arguments, don't try to │
│ 703 │ │ │ # instantiate since we don't know how to │
│ 704 │ │ │ raise RuntimeError(msg) from None │
│ ❱ 705 │ │ raise exception │
│ 706 │
│ 707 │
│ 708 def _get_available_device_type(): │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
AttributeError: Caught AttributeError in DataLoader worker process 6.
Original Traceback (most recent call last):
File "/home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
data = fetcher.fetch(index) # type: ignore[possibly-undefined]
^^^^^^^^^^^^^^^^^^^^
File "/home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
~~~~~~~~~~~~^^^^^
File "/home/user/EnvGS/easyvolcap/dataloaders/datasets/volumetric_video_dataset.py", line 1409, in __getitem__
output = self.get_ground_truth(index) # load images, camera parameters, etc (10ms)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/EnvGS/easyvolcap/dataloaders/datasets/volumetric_video_dataset.py", line 1255, in get_ground_truth
rgb, msk, wet, dpt, bkg, norm = self.get_image(output.view_index, output.latent_index, output) # H, W, 3
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/EnvGS/easyvolcap/dataloaders/datasets/volumetric_video_dataset.py", line 1000, in get_image
return self.get_image_from_bytes(view_index, latent_index)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/EnvGS/easyvolcap/dataloaders/datasets/volumetric_video_dataset.py", line 870, in get_image_from_bytes
rgb = torch.as_tensor(load_image_from_bytes(im_bytes, normalize=True)) # 4-5ms for 400 * 592 jpeg, sooo slow
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/EnvGS/easyvolcap/utils/data_utils.py", line 1514, in load_image_from_bytes
if image.ndim == 2:
^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'ndim'

During handling of the above exception, another exception occurred:

╭────────────────────────────────────────────────────────────────────────────── Traceback (most recent call last) ──────────────────────────────────────────────────────────────────────────────╮
│ /home/user/EnvGS/easyvolcap/scripts/../../easyvolcap/scripts/main.py:313 in <module> │
│ │
│ 310 │
│ 311 # Module name == '__main__', this is the outermost commandline entry point │
│ 312 if __name__ == '__main__': │
│ ❱ 313 │ main() │
│ 314 │
│ │
│ /home/user/EnvGS/easyvolcap/utils/console_utils.py:397 in inner │
│ │
│ 394 │ │ │ │ if isinstance(e, BdbQuit): return # so that nested catch_throw will respect each other │
│ 395 │ │ │ │ log(red(f'Runtime exception: {e}')) │
│ 396 │ │ │ │ stacktrace() │
│ ❱ 397 │ │ │ │ post_mortem() │
│ 398 │ │ │ │ if fatal: exit(1) # catched variable │
│ 399 │ │ return inner │
│ 400 │
│ │
│ /home/user/EnvGS/easyvolcap/utils/console_utils.py:244 in post_mortem │
│ │
│ 241 def post_mortem(*args, **kwargs): │
│ 242 │ stop_live() │
│ 243 │ stop_prog() │
│ ❱ 244 │ pdbr.post_mortem() # break on the last exception's stack for inpection │
│ 245 │
│ 246 │
│ 247 def line(obj): │
│ │
│ /home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/pdbr/main.py:37 in post_mortem │
│ │
│ 34 │ p.reset() │
│ 35 │ if value: │
│ 36 │ │ p.error(value) │
│ ❱ 37 │ p.interaction(None, traceback) │
│ 38 │
│ 39 │
│ 40 def pm(): │
│ │
│ /home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/IPython/core/debugger.py:492 in interaction │
│ │
│ 489 │ │ │ │ if isinstance(tb_or_exc, BaseException): │
│ 490 │ │ │ │ │ assert tb is not None, "main exception must have a traceback" │
│ 491 │ │ │ │ with self._hold_exceptions(_chained_exceptions): │
│ ❱ 492 │ │ │ │ │ OldPdb.interaction(self, frame, tb) │
│ 493 │ │ │ else: │
│ 494 │ │ │ │ OldPdb.interaction(self, frame, tb_or_exc) │
│ 495 │
│ │
│ /home/user/anaconda3/envs/envgs/lib/python3.11/pdb.py:418 in interaction │
│ │
│ 415 │ │ self.setup(frame, traceback) │
│ 416 │ │ # if we have more commands to process, do not show the stack entry │
│ 417 │ │ if not self.cmdqueue: │
│ ❱ 418 │ │ │ self.print_stack_entry(self.stack[self.curindex]) │
│ 419 │ │ self._cmdloop() │
│ 420 │ │ self.forget() │
│ 421 │
│ │
│ /home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/pdbr/_pdbr.py:449 in print_stack_entry │
│ │
│ 446 │ │ │ elif base == Pdb: │
│ 447 │ │ │ │ print_syntax(frame_lineno, prompt_prefix) │
│ 448 │ │ │ else: │
│ ❱ 449 │ │ │ │ print_syntax(frame_lineno, "", context) │
│ 450 │ │ │ │ │
│ 451 │ │ │ │ # vds: >> │
│ 452 │ │ │ │ frame, lineno = frame_lineno │
│ │
│ /home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/pdbr/_pdbr.py:437 in print_syntax │
│ │
│ 434 │ │ │ │ # Remove color format. │
│ 435 │ │ │ │ self._print( │
│ 436 │ │ │ │ │ Syntax( │
│ ❱ 437 │ │ │ │ │ │ ANSI_ESCAPE.sub("", self.format_stack_entry(*args)), │
│ 438 │ │ │ │ │ │ "python", │
│ 439 │ │ │ │ │ │ theme=self._theme or DEFAULT_THEME, │
│ 440 │ │ │ │ │ ), │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: Pdb.format_stack_entry() takes from 2 to 3 positional arguments but 4 were given

`
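
The last frame of the traceback shows `load_image_from_bytes` reading `image.ndim` on a value that is `None`, which suggests the decode step returned nothing for that buffer. Below is a minimal sketch of a guard that reports which sample failed instead of crashing one line later. The names `safe_decode`, `decode_fn`, and `index` are hypothetical and are not part of EasyVolcap.

```python
import hashlib


def safe_decode(buf, decode_fn, index=None):
    """Decode `buf` with `decode_fn`; raise a descriptive error if it yields None.

    `decode_fn` is any callable that returns None on failure (hypothetical hook).
    """
    image = decode_fn(buf)
    if image is None:
        raw = bytes(buf)
        # The size and a short digest help tell an empty or truncated buffer
        # apart from one whose contents differ between runs.
        digest = hashlib.sha256(raw).hexdigest()[:12]
        raise ValueError(
            f"decode failed for sample {index}: {len(raw)} bytes, sha256 prefix {digest}"
        )
    return image
```

With OpenCV, `decode_fn` could be something like `lambda b: cv2.imdecode(np.frombuffer(b, np.uint8), cv2.IMREAD_UNCHANGED)`, since `cv2.imdecode` returns `None` in Python when the buffer is too short or holds invalid data.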


tlxhlll commented Apr 9, 2025

Maybe this is because the dataset directory contains many files like .DS_Store, which macOS generates automatically.
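
One way to check this hypothesis is to list hidden and empty files under the dataset directory. This is only a sketch: `find_suspect_files` is a hypothetical helper, and treating every dot-prefixed or zero-byte file as suspect is an assumption rather than a rule the repository enforces.

```python
from pathlib import Path


def find_suspect_files(root):
    """Return hidden files (e.g. .DS_Store, ._*) and zero-byte files under `root`."""
    suspects = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # Dot-prefixed names cover .DS_Store and AppleDouble "._" companions.
        if path.name.startswith(".") or path.stat().st_size == 0:
            suspects.append(path)
    return suspects
```

Running `find_suspect_files("path/to/dataset")` on the dataset root (the path here is a placeholder) returns an empty list when nothing of this kind is present.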

@yyhosmium64 (Author)

> Maybe this is because the dataset directory contains many files like .DS_Store, which macOS generates automatically.

I think the issue might be related to how multiprocessing works in the DataLoader. Even though self.ims_bytes seems to store all the image bytes just fine, when I try decoding them one by one using cv2.imdecode, a few of them fail occasionally.

It looks like one of the subprocesses might be having trouble pulling image bytes from ims_bytes, which ends up causing either the error I mentioned above or sometimes even a segmentation fault.
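
The one-by-one decoding check described above can be sketched as follows, repeated a few times per buffer in a single process so that multiprocessing is out of the picture. `find_decode_failures` and `decode_fn` are hypothetical names; `decode_fn` stands for whatever decoder is in use and is assumed to return `None` on failure.

```python
def find_decode_failures(buffers, decode_fn, repeats=3):
    """Map buffer index -> number of failed decode attempts out of `repeats`."""
    failures = {}
    for index, buf in enumerate(buffers):
        # Decode the same bytes several times to see whether a failure repeats.
        failed = sum(1 for _ in range(repeats) if decode_fn(buf) is None)
        if failed:
            failures[index] = failed
    return failures
```

An index that fails on every attempt would point at the stored bytes themselves, whereas an index that fails only sometimes on identical bytes would point away from the data.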

I’m also considering the possibility that there might be something wrong with my hardware.
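
A rough way to probe the hardware suspicion is to record a digest of every cached buffer right after loading and compare it again later, for example just before a decode that fails. The helpers `snapshot_digests` and `changed_indices` are hypothetical, and this sketch assumes the buffers are bytes-like and that nothing in the program is supposed to modify them.

```python
import hashlib


def snapshot_digests(buffers):
    """Record a SHA-256 digest for every buffer, e.g. right after loading."""
    return [hashlib.sha256(bytes(buf)).hexdigest() for buf in buffers]


def changed_indices(buffers, snapshot):
    """Return indices whose current digest differs from the recorded one."""
    current = snapshot_digests(buffers)
    return [i for i, (old, new) in enumerate(zip(snapshot, current)) if old != new]
```

If buffers that nothing writes to show up in `changed_indices`, that would hint at memory or hardware trouble rather than the dataset, although a dedicated memory test would be a more reliable confirmation.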

If anyone's seen something similar or has any ideas, I'd really appreciate the help!

@xbillowy (Collaborator)

I've never

Hello,

Regardless of the dataset, training proceeds for a while but then fails around iter 3000 (2700~3300) with the following error:

Runtime exception: Caught AttributeError in DataLoader worker process

The process number associated with the error varies each time. I’ve confirmed that the dataset is placed in the correct path and is not corrupted.

If you have any idea how to resolve this issue, I would really appreciate your help. Thank you in advance!

My environment : OS : Ubuntu 22.04 LTS GPU : RTX4090, CPU : i7-13700K, RAM : 64GB OptiX==7.7 (in this repo.), torch==2.3.1, CUDA==11.8, nvidia-driver==535.183.01

########################################################################################### `4:47:25 Runtime exception: Caught AttributeError in DataLoader worker process 6. console_utils.py:395 Original Traceback (most recent call last): File "/home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop data = fetcher.fetch(index) # type: ignore ^^^^^^^^^^^^^^^^^^^^ File "/home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch data = [self.dataset for idx in possibly_batched_index] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 51, in data = [self.dataset for idx in possibly_batched_index] ~~~~~~~~~~~~^^^^^ File "/home/user/EnvGS/easyvolcap/dataloaders/datasets/volumetric_video_dataset.py", line 1409, in getitem output = self.get_ground_truth(index) # load images, camera parameters, etc (10ms) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/user/EnvGS/easyvolcap/dataloaders/datasets/volumetric_video_dataset.py", line 1255, in get_ground_truth rgb, msk, wet, dpt, bkg, norm = self.get_image(output.view_index, output.latent_index, output) # H, W, 3 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/user/EnvGS/easyvolcap/dataloaders/datasets/volumetric_video_dataset.py", line 1000, in get_image return self.get_image_from_bytes(view_index, latent_index) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/user/EnvGS/easyvolcap/dataloaders/datasets/volumetric_video_dataset.py", line 870, in get_image_from_bytes rgb = torch.as_tensor(load_image_from_bytes(im_bytes, normalize=True)) # 4-5ms for 400 * 592 jpeg, sooo slow ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/user/EnvGS/easyvolcap/utils/data_utils.py", line 1514, in load_image_from_bytes if image.ndim == 2: ^^^^^^^^^^ AttributeError: 'NoneType' 
object has no attribute 'ndim'

╭────────────────────────────────────────────────────────────────────────────── Traceback (most recent call last) ──────────────────────────────────────────────────────────────────────────────╮ │ /home/user/EnvGS/easyvolcap/utils/console_utils.py:392 in inner │ │ │ │ ❱ 392 │ │ │ │ return func(*args, **kwargs) │ │ │ │ /home/user/EnvGS/easyvolcap/scripts/../../easyvolcap/scripts/main.py:308 in main │ │ │ │ ❱ 308 │ else: globals()args.type # invoke this (call callable_from_cfg -> call_from_cfg) │ │ │ │ /home/user/EnvGS/easyvolcap/engine/registry.py:56 in inner │ │ │ │ ❱ 56 │ │ return call_from_cfg(func, cfg) │ │ │ │ /home/user/EnvGS/easyvolcap/engine/registry.py:47 in call_from_cfg │ │ │ │ ❱ 47 │ return func(**call_args) │ │ │ │ /home/user/EnvGS/easyvolcap/scripts/../../easyvolcap/scripts/main.py:302 in train │ │ │ │ ❱ 302 │ launcher(**kwargs, runner_function=runner.train, runner_object=runner) │ │ │ │ /home/user/EnvGS/easyvolcap/scripts/../../easyvolcap/scripts/main.py:52 in launcher │ │ │ │ ❱ 52 │ runner_function() │ │ │ │ /home/user/EnvGS/easyvolcap/runners/volumetric_video_runner.py:322 in train │ │ │ │ ❱ 322 │ │ │ next(train_generator, None) # avoid reconstruction of the dataloader │ │ │ │ /home/user/EnvGS/easyvolcap/runners/volumetric_video_runner.py:422 in train_generator │ │ │ │ ❱ 422 │ │ │ │ index, flying_batch = next(enumerater, (None, None)) │ │ │ │ /home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/torch/utils/data/dataloader.py:631 in next │ │ │ │ ❱ 631 │ │ │ data = self._next_data() │ │ │ │ /home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/torch/utils/data/dataloader.py:1346 in _next_data │ │ │ │ ❱ 1346 │ │ │ │ return self._process_data(data) │ │ │ │ /home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/torch/utils/data/dataloader.py:1372 in _process_data │ │ │ │ ❱ 1372 │ │ │ data.reraise() │ │ │ │ /home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/torch/_utils.py:705 in reraise │ │ │ │ ❱ 705 │ │ raise exception │ 
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ AttributeError: Caught AttributeError in DataLoader worker process 6. Original Traceback (most recent call last): File "/home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop data = fetcher.fetch(index) # type: ignore[possibly-undefined] ^^^^^^^^^^^^^^^^^^^^ File "/home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch data = [self.dataset[idx] for idx in possibly_batched_index] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 51, in data = [self.dataset[idx] for idx in possibly_batched_index] ~~~~~~~~~~~~^^^^^ File "/home/user/EnvGS/easyvolcap/dataloaders/datasets/volumetric_video_dataset.py", line 1409, in getitem output = self.get_ground_truth(index) # load images, camera parameters, etc (10ms) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/user/EnvGS/easyvolcap/dataloaders/datasets/volumetric_video_dataset.py", line 1255, in get_ground_truth rgb, msk, wet, dpt, bkg, norm = self.get_image(output.view_index, output.latent_index, output) # H, W, 3 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/user/EnvGS/easyvolcap/dataloaders/datasets/volumetric_video_dataset.py", line 1000, in get_image return self.get_image_from_bytes(view_index, latent_index) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/user/EnvGS/easyvolcap/dataloaders/datasets/volumetric_video_dataset.py", line 870, in get_image_from_bytes rgb = torch.as_tensor(load_image_from_bytes(im_bytes, normalize=True)) # 4-5ms for 400 * 592 jpeg, sooo slow ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File 
"/home/user/EnvGS/easyvolcap/utils/data_utils.py", line 1514, in load_image_from_bytes if image.ndim == 2: ^^^^^^^^^^ AttributeError: 'NoneType' object has no attribute 'ndim'

                                                       envgs/ref_real/envgs_toycar                                                            

0:33:53 6 3240 671923 163840 0.438146 23.401836 0.047936 0.240869 0.105387 0.132602 0.0231 0.6184 0.000727 1659 0:33:57 6 3241 671923 163840 0.383249 24.081594 0.043934 0.228212 0.096644 0.117945 0.0180 0.6118 0.000727 1646 0:34:01 6 3242 671923 163840 0.351371 24.747236 0.040949 0.234179 0.091455 0.109033 0.0184 0.5458 0.000727 1643 0:34:05 6 3243 671923 163840 0.339653 24.915779 0.040101 0.241138 0.090496 0.106043 0.0175 0.6338 0.000727 1656 0:34:09 6 3244 671923 163840 0.324889 24.712225 0.039866 0.228212 0.080322 0.102365 0.0190 0.5852 0.000727 1648 eta epoch iter num_pts env_num_pts ssim_loss psnr img_loss norm_loss gs_norm_loss loss data batch lr max_mem *** Caught AttributeError in DataLoader worker process 6. Original Traceback (most recent call last): File "/home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop data = fetcher.fetch(index) # type: ignore[possibly-undefined] ^^^^^^^^^^^^^^^^^^^^ File "/home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch data = [self.dataset[idx] for idx in possibly_batched_index] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 51, in data = [self.dataset[idx] for idx in possibly_batched_index] ~~~~~~~~~~~~^^^^^ File "/home/user/EnvGS/easyvolcap/dataloaders/datasets/volumetric_video_dataset.py", line 1409, in getitem output = self.get_ground_truth(index) # load images, camera parameters, etc (10ms) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/user/EnvGS/easyvolcap/dataloaders/datasets/volumetric_video_dataset.py", line 1255, in get_ground_truth rgb, msk, wet, dpt, bkg, norm = self.get_image(output.view_index, output.latent_index, output) # H, W, 3 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/user/EnvGS/easyvolcap/dataloaders/datasets/volumetric_video_dataset.py", 
line 1000, in get_image return self.get_image_from_bytes(view_index, latent_index) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/user/EnvGS/easyvolcap/dataloaders/datasets/volumetric_video_dataset.py", line 870, in get_image_from_bytes rgb = torch.as_tensor(load_image_from_bytes(im_bytes, normalize=True)) # 4-5ms for 400 * 592 jpeg, sooo slow ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/user/EnvGS/easyvolcap/utils/data_utils.py", line 1514, in load_image_from_bytes if image.ndim == 2: ^^^^^^^^^^ AttributeError: 'NoneType' object has no attribute 'ndim'

Traceback (most recent call last):

/home/user/EnvGS/easyvolcap/utils/console_utils.py:392 in inner
  389         # This function catches errors and stops the execution for easier inspection
  390         def inner(*args, **kwargs):
  391             try:
❱ 392                 return func(*args, **kwargs)
  393             except Exception as e:
  394                 if isinstance(e, BdbQuit): return # so that nested catch_throw will respect each other
  395                 log(red(f'Runtime exception: {e}'))

/home/user/EnvGS/easyvolcap/scripts/../../easyvolcap/scripts/main.py:308 in main
  305 @catch_throw
  306 def main():
  307     if cfg.mocking: log(f'{green("Modules imported.")} Mode: {yellow(args.type)}. No config loaded, pass config file using -c <PATH_TO_CONFIG>') # MARK: GLOBAL
❱ 308     else: globals()[args.type](cfg) # invoke this (call callable_from_cfg -> call_from_cfg)
  309
  310
  311 # Module name == '__main__', this is the outermost commandline entry point

/home/user/EnvGS/easyvolcap/engine/registry.py:56 in inner
  53         elif 'cfg' in kwargs: cfg: dict = kwargs['cfg']
  54         else: return func(*args, **kwargs)
  55         cfg.update(kwargs)
❱ 56         return call_from_cfg(func, cfg)
  57     return inner
  58
  59

/home/user/EnvGS/easyvolcap/engine/registry.py:47 in call_from_cfg
  44                 else:
  45                     pass # in case of BASE_KEY, DELETE_KEY, APPEND_KEY, DEPRECATION_KEY
  46             else: call_args[k] = v
❱ 47     return func(**call_args)
  48
  49
  50 def callable_from_cfg(func: Callable):

/home/user/EnvGS/easyvolcap/scripts/../../easyvolcap/scripts/main.py:302 in train
  299     if dry_run: return runner # just construct everything, then return
  300
  301     # The actual calling, with grace full exit
❱ 302     launcher(**kwargs, runner_function=runner.train, runner_object=runner)
  303
  304
  305 @catch_throw

/home/user/EnvGS/easyvolcap/scripts/../../easyvolcap/scripts/main.py:52 in launcher
  49     # Give the user some time to save states
  50     log('Launching runner for experiment:', magenta(exp_name))
  51     cfg.runner = runner_object # holds a global reference for hacky usage # MARK: GLOBAL
❱ 52     runner_function()
  53
  54     profiler_stop() # already setup
  55     torch.set_anomaly_enabled(prev_anomaly)

/home/user/EnvGS/easyvolcap/runners/volumetric_video_runner.py:322 in train
  319         for epoch in range(epoch, self.epochs):
  320
  321             # Possible to make this a decorator?
❱ 322             next(train_generator, None) # avoid reconstruction of the dataloader
  323
  324             # Leave some breathing room for other applications
  325             if (epoch + 1) % self.empty_cache_ep == 0:

/home/user/EnvGS/easyvolcap/runners/volumetric_video_runner.py:422 in train_generator
  419
  420             # Get next data and start copying
  421             with torch.cuda.stream(data_stream):
❱ 422                 index, flying_batch = next(enumerater, (None, None))
  423                 if flying_batch is None:
  424                     flying_batch = dotdict(meta=dotdict(iter=-1))
  425                 else:

/home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/torch/utils/data/dataloader.py:631 in __next__
  628             if self._sampler_iter is None:
  629                 # TODO(pytorch/pytorch#76750)
  630                 self._reset() # type: ignore[call-arg]
❱ 631             data = self._next_data()
  632             self._num_yielded += 1
  633             if self._dataset_kind == _DatasetKind.Iterable and \
  634                     self._IterableDataset_len_called is not None and \

/home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/torch/utils/data/dataloader.py:1346 in _next_data
  1343                 self._task_info[idx] += (data,)
  1344             else:
  1345                 del self._task_info[idx]
❱ 1346                 return self._process_data(data)
  1347
  1348     def _try_put_index(self):
  1349         assert self._tasks_outstanding < self._prefetch_factor * self._num_workers

/home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/torch/utils/data/dataloader.py:1372 in _process_data
  1369         self._rcvd_idx += 1
  1370         self._try_put_index()
  1371         if isinstance(data, ExceptionWrapper):
❱ 1372             data.reraise()
  1373         return data
  1374
  1375     def _mark_worker_as_unavailable(self, worker_id, shutdown=False):

/home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/torch/_utils.py:705 in reraise
  702             # If the exception takes multiple arguments, don't try to
  703             # instantiate since we don't know how to
  704             raise RuntimeError(msg) from None
❱ 705         raise exception
  706
  707
  708 def _get_available_device_type():

AttributeError: Caught AttributeError in DataLoader worker process 6.
Original Traceback (most recent call last):
  File "/home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index) # type: ignore[possibly-undefined]
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
            ~~~~~~~~~~~~^^^^^
  File "/home/user/EnvGS/easyvolcap/dataloaders/datasets/volumetric_video_dataset.py", line 1409, in __getitem__
    output = self.get_ground_truth(index) # load images, camera parameters, etc (10ms)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/EnvGS/easyvolcap/dataloaders/datasets/volumetric_video_dataset.py", line 1255, in get_ground_truth
    rgb, msk, wet, dpt, bkg, norm = self.get_image(output.view_index, output.latent_index, output) # H, W, 3
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/EnvGS/easyvolcap/dataloaders/datasets/volumetric_video_dataset.py", line 1000, in get_image
    return self.get_image_from_bytes(view_index, latent_index)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/EnvGS/easyvolcap/dataloaders/datasets/volumetric_video_dataset.py", line 870, in get_image_from_bytes
    rgb = torch.as_tensor(load_image_from_bytes(im_bytes, normalize=True)) # 4-5ms for 400 * 592 jpeg, sooo slow
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/EnvGS/easyvolcap/utils/data_utils.py", line 1514, in load_image_from_bytes
    if image.ndim == 2:
       ^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'ndim'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

/home/user/EnvGS/easyvolcap/scripts/../../easyvolcap/scripts/main.py:313 in <module>
  310
  311 # Module name == '__main__', this is the outermost commandline entry point
  312 if __name__ == '__main__':
❱ 313     main()
  314

/home/user/EnvGS/easyvolcap/utils/console_utils.py:397 in inner
  394                 if isinstance(e, BdbQuit): return # so that nested catch_throw will respect each other
  395                 log(red(f'Runtime exception: {e}'))
  396                 stacktrace()
❱ 397                 post_mortem()
  398                 if fatal: exit(1) # catched variable
  399         return inner
  400

/home/user/EnvGS/easyvolcap/utils/console_utils.py:244 in post_mortem
  241 def post_mortem(*args, **kwargs):
  242     stop_live()
  243     stop_prog()
❱ 244     pdbr.post_mortem() # break on the last exception's stack for inpection
  245
  246
  247 def line(obj):

/home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/pdbr/__main__.py:37 in post_mortem
  34     p.reset()
  35     if value:
  36         p.error(value)
❱ 37     p.interaction(None, traceback)
  38
  39
  40 def pm():

/home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/IPython/core/debugger.py:492 in interaction
  489                 if isinstance(tb_or_exc, BaseException):
  490                     assert tb is not None, "main exception must have a traceback"
  491                 with self._hold_exceptions(_chained_exceptions):
❱ 492                     OldPdb.interaction(self, frame, tb)
  493             else:
  494                 OldPdb.interaction(self, frame, tb_or_exc)
  495

/home/user/anaconda3/envs/envgs/lib/python3.11/pdb.py:418 in interaction
  415         self.setup(frame, traceback)
  416         # if we have more commands to process, do not show the stack entry
  417         if not self.cmdqueue:
❱ 418             self.print_stack_entry(self.stack[self.curindex])
  419         self._cmdloop()
  420         self.forget()
  421

/home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/pdbr/_pdbr.py:449 in print_stack_entry
  446             elif base == Pdb:
  447                 print_syntax(frame_lineno, prompt_prefix)
  448             else:
❱ 449                 print_syntax(frame_lineno, "", context)
  450
  451                 # vds: >>
  452                 frame, lineno = frame_lineno

/home/user/anaconda3/envs/envgs/lib/python3.11/site-packages/pdbr/_pdbr.py:437 in print_syntax
  434                 # Remove color format.
  435                 self._print(
  436                     Syntax(
❱ 437                         ANSI_ESCAPE.sub("", self.format_stack_entry(*args)),
  438                         "python",
  439                         theme=self._theme or DEFAULT_THEME,
  440                     ),

TypeError: Pdb.format_stack_entry() takes from 2 to 3 positional arguments but 4 were given

`

Hi, I've never encountered this error on any of the machines I've used.

You can try setting dataloader_cfg.dataset_cfg.disk_dataset=True and val_dataloader_cfg.dataset_cfg.disk_dataset=True, which loads images from disk for every batch instead of using the bytes cached in memory. Note that this will slow down training.
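For reference, the two dotted keys address nested config entries. Below is a minimal sketch of how such overrides expand into a nested structure, assuming the usual dotted-path convention. `apply_overrides` is a hypothetical helper written for illustration; it is not EasyVolcap's actual config parser, which may handle values differently.

```python
import ast


def apply_overrides(cfg: dict, overrides: list[str]) -> dict:
    """Apply 'a.b.c=value' strings to a nested dict (illustrative only)."""
    for item in overrides:
        dotted, raw = item.split("=", 1)
        *parents, leaf = dotted.split(".")
        node = cfg
        for key in parents:
            node = node.setdefault(key, {})  # create intermediate levels on demand
        try:
            node[leaf] = ast.literal_eval(raw)  # parses True, 0, 1.5, 'text', ...
        except (ValueError, SyntaxError):
            node[leaf] = raw  # fall back to the raw string
    return cfg


cfg = apply_overrides({}, [
    "dataloader_cfg.dataset_cfg.disk_dataset=True",
    "val_dataloader_cfg.dataset_cfg.disk_dataset=True",
])
print(cfg)
```

Both the training and the validation dataloader get their own `dataset_cfg`, which is why the flag has to be set twice.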

You can also set some breakpoints to see what actually happens in the dataset. Remember to set dataloader_cfg.num_workers=0 first, so that the data is loaded in the main process and the run does not exit right away.
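The traceback shows that `image` was `None` by the time `load_image_from_bytes` read `image.ndim`, so one thing worth checking at a breakpoint is whether the decoder itself returned `None`. A minimal sketch of such a check follows. `decode_checked` and its `decoder` argument are hypothetical names for illustration and are not part of EasyVolcap; pass in whatever function actually decodes the bytes in your setup.

```python
from typing import Callable, Optional


def decode_checked(im_bytes: bytes,
                   decoder: Callable[[bytes], Optional[object]],
                   tag: str = ""):
    """Decode image bytes and fail loudly if the decoder returns None."""
    image = decoder(im_bytes)
    if image is None:
        # Report which sample failed instead of crashing later on `.ndim`.
        raise ValueError(
            f"image decode returned None for {tag!r} ({len(im_bytes)} bytes)"
        )
    return image
```

Passing the view and frame index as `tag` shows whether the failure always hits the same image or moves around, which helps separate a corrupted file from an environment problem.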

@yyhosmium64
Author

yyhosmium64 commented Apr 12, 2025

Hi, I found the cause of the problem — it was actually a silly mistake on my end.

I cloned a base environment that I use for my setup (with some default packages pre-installed), and during that process, libstdcxx-ng==14 got installed, which caused the error.
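For anyone hitting something similar, here is a rough way to see which libstdc++ a Python process has actually mapped on Linux. This is a sketch that assumes `/proc/self/maps` is available; `loaded_libstdcxx` is a hypothetical helper name. Call it after importing the libraries your training run uses, since libstdc++ is only mapped once a C++ extension is loaded.

```python
from pathlib import Path


def loaded_libstdcxx(maps_path: str = "/proc/self/maps") -> list[str]:
    """Return the paths of libstdc++ shared objects mapped into a process."""
    maps = Path(maps_path)
    if not maps.exists():  # e.g. non-Linux systems
        return []
    found = set()
    for line in maps.read_text().splitlines():
        fields = line.split()
        # The mapped file path, when present, is the last field of the line.
        if fields and "libstdc++" in fields[-1]:
            found.add(fields[-1])
    return sorted(found)


print(loaded_libstdcxx())
```

If the printed path points into the conda environment, `conda list libstdcxx-ng` in that environment shows which version was installed.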

Also, while trying out the methods you suggested, I noticed something that might be a bug:

In get_image_from_disk(), it looks like this line:

dp, _, _, _ = load_resize_undist_im_bytes(nm, ...)

should actually be:

nm, _, _, _ = load_resize_undist_im_bytes(nm, ...)
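To illustrate why the name on the left-hand side matters, here is a toy sketch with a stand-in loader. `load_stub` is hypothetical: the real `load_resize_undist_im_bytes` takes more arguments, and the only thing assumed about its return value is what the unpacking above shows, a 4-tuple whose first element is the loaded data. The sketch also assumes that the code after this line reads `nm`.

```python
def load_stub(path: str):
    """Hypothetical stand-in returning a 4-tuple like the unpacking above."""
    return (b"\x89PNG...", None, None, None)


def buggy(nm: str):
    dp, _, _, _ = load_stub(nm)  # result is bound to dp; nm is still the path
    return nm


def fixed(nm: str):
    nm, _, _, _ = load_stub(nm)  # nm now holds the loaded bytes
    return nm


print(type(buggy("000.png")).__name__, type(fixed("000.png")).__name__)
```

In the first version any later use of `nm` still sees the file path string, while the loaded bytes sit unused in `dp`.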

Thanks a lot for your help!
