Lone cloud with a single point in a batch = training crash #83

Closed
2 of 3 tasks
CharlesGaydon opened this issue Aug 16, 2023 · 3 comments


CharlesGaydon commented Aug 16, 2023

I first thought this was due to the default DataLoader configuration, in which drop_last=False. But the error does not appear at the end of the epoch, it appears before that: at 93% of the data (and we are indeed in the training_step according to the logs).

ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 512])

The [1, 512] size nonetheless suggests a combination of:

  • A single point in the cloud -> possible since the merge of Dev: allow inference for smallest clouds possible + addresses many smaller issues. #81
  • A single cloud in the batch -> That one is unexpected: it can happen when starting from 1km*1km tiles and ignoring data patches that are too small. However, it should not happen with a dataset produced by pacasam (which is the case here). A not-so-convincing explanation: we actually were at the end of the epoch and the logs are simply "lagging behind".

Well, even without an explanation of why the situation arises here, it can arise on normal data that is split on the fly, so it is worth worrying about.

  • First off, we can set drop_last=True (see the sketch after this list).
  • Then, we can edit GeometricNoneProofCollater to check that batches are valid.
  • EDIT: we fix the subsampling that happens in MinimumNumNodes.
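
As an illustration of the first point, here is a minimal, self-contained sketch of what drop_last=True does (the toy dataset and batch size are made up; the real loaders are configured in the datamodule):

    import torch
    from torch_geometric.data import Data
    from torch_geometric.loader import DataLoader

    # Toy stand-in for the training dataset: 25 small clouds, one of them a single point.
    dataset = [Data(pos=torch.rand(n, 3), x=torch.rand(n, 9)) for n in [50] * 24 + [1]]

    # With batch_size=12 and 25 samples, the last batch would contain a single cloud.
    # drop_last=True discards that incomplete batch, so BatchNorm never sees a
    # 1-sample batch during training.
    loader = DataLoader(dataset, batch_size=12, shuffle=False, drop_last=True)
    print(len(loader))  # 2 batches instead of 3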

Alternatively: in this situation, the block that rejects a single point is mlp_summit. Keeping a minimum of two points in the decimation that directly precedes mlp_summit may be a more effective fix. In that case, it happens in these lines:

We would go from

    # Decimation should not empty clouds completely.
    decimated_bincount = torch.max(
        torch.ones_like(decimated_bincount), decimated_bincount
    )

to

    # Decimation should not empty clouds completely.
    decimated_bincount = torch.max(
        2 * torch.ones_like(decimated_bincount), decimated_bincount
    )

with no impact on the model's normal behavior.
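
For reference, a tiny standalone example of what this clamp does to a hypothetical bincount (values made up):

    import torch

    # Hypothetical number of points kept per cloud after decimation:
    # cloud 0 kept a single point, cloud 1 kept 37 points.
    decimated_bincount = torch.tensor([1, 37])

    # Current rule: keep at least 1 point per cloud.
    print(torch.max(torch.ones_like(decimated_bincount), decimated_bincount))
    # tensor([ 1, 37])

    # Proposed rule: keep at least 2 points per cloud, so that a batch reduced to a
    # single 1-point cloud still feeds more than one row to the BatchNorm in mlp_summit.
    print(torch.max(2 * torch.ones_like(decimated_bincount), decimated_bincount))
    # tensor([ 2, 37])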

Full trace:

Epoch 43:  93%|█████████▎| 1396/1501 [21:46<01:38,  1.07it/s, loss=0.183, v_num=d139, train/iou_step=0.687, val/iou_step=0.675, val/iou_epoch=0.701, train/iou_epoch=0.729]
Epoch 43:  93%|█████████▎| 1396/1501 [21:46<01:38,  1.07it/s, loss=0.186, v_num=d139, train/iou_step=0.424, val/iou_step=0.675, val/iou_epoch=0.701, train/iou_epoch=0.729]
Epoch 43:  93%|█████████▎| 1397/1501 [21:47<01:37,  1.07it/s, loss=0.186, v_num=d139, train/iou_step=0.424, val/iou_step=0.675, val/iou_epoch=0.701, train/iou_epoch=0.729]
Epoch 43:  93%|█████████▎| 1397/1501 [21:47<01:37,  1.07it/s, loss=0.183, v_num=d139, train/iou_step=0.547, val/iou_step=0.675, val/iou_epoch=0.701, train/iou_epoch=0.729]
Epoch 43:  93%|█████████▎| 1398/1501 [21:48<01:36,  1.07it/s, loss=0.183, v_num=d139, train/iou_step=0.547, val/iou_step=0.675, val/iou_epoch=0.701, train/iou_epoch=0.729]
Epoch 43:  93%|█████████▎| 1398/1501 [21:48<01:36,  1.07it/s, loss=0.184, v_num=d139, train/iou_step=0.535, val/iou_step=0.675, val/iou_epoch=0.701, train/iou_epoch=0.729]
Epoch 43:  93%|█████████▎| 1399/1501 [21:49<01:35,  1.07it/s, loss=0.184, v_num=d139, train/iou_step=0.535, val/iou_step=0.675, val/iou_epoch=0.701, train/iou_epoch=0.729]
Epoch 43:  93%|█████████▎| 1399/1501 [21:49<01:35,  1.07it/s, loss=0.183, v_num=d139, train/iou_step=0.693, val/iou_step=0.675, val/iou_epoch=0.701, train/iou_epoch=0.729]
Epoch 43:  93%|█████████▎| 1400/1501 [21:49<01:34,  1.07it/s, loss=0.183, v_num=d139, train/iou_step=0.693, val/iou_step=0.675, val/iou_epoch=0.701, train/iou_epoch=0.729]
Epoch 43:  93%|█████████▎| 1400/1501 [21:49<01:34,  1.07it/s, loss=0.189, v_num=d139, train/iou_step=0.479, val/iou_step=0.675, val/iou_epoch=0.701, train/iou_epoch=0.729]
Epoch 43:  93%|█████████▎| 1401/1501 [21:51<01:33,  1.07it/s, loss=0.189, v_num=d139, train/iou_step=0.479, val/iou_step=0.675, val/iou_epoch=0.701, train/iou_epoch=0.729]
Epoch 43:  93%|█████████▎| 1401/1501 [21:51<01:33,  1.07it/s, loss=0.18, v_num=d139, train/iou_step=0.394, val/iou_step=0.675, val/iou_epoch=0.701, train/iou_epoch=0.729] Error executing job with overrides: ['task.task_name=fit', 'datamodule.hdf5_file_path=/var/data/CGaydon/myria3d_datasets/20230727_75km2_diverse.hdf5', 'dataset_description=20230601_lidarhd_pacasam_dataset', 'datamodule.tile_width=50', 'experiment=RandLaNet_base_run_FR-MultiGPU', 'logger.comet.experiment_name=20230727_75km2_diverse-2GPUS', 'trainer.gpus=[0,1]', 'trainer.min_epochs=300', 'trainer.max_epochs=300']
Traceback (most recent call last):
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 777, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1199, in _run
    self._dispatch()
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1279, in _dispatch
    self.training_type_plugin.start_training(self)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 202, in start_training
    self._results = trainer.run_stage()
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1289, in run_stage
    return self._run_train()
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1319, in _run_train
    self.fit_loop.run()
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/loops/fit_loop.py", line 234, in advance
    self.epoch_loop.run(data_fetcher)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 193, in advance
    batch_output = self.batch_loop.run(batch, batch_idx)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 88, in advance
    outputs = self.optimizer_loop.run(split_batch, optimizers, batch_idx)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 215, in advance
    result = self._run_optimization(
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 266, in _run_optimization
    self._optimizer_step(optimizer, opt_idx, batch_idx, closure)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 378, in _optimizer_step
    lightning_module.optimizer_step(
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/core/lightning.py", line 1652, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/core/optimizer.py", line 164, in step
    trainer.accelerator.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 336, in optimizer_step
    self.precision_plugin.optimizer_step(model, optimizer, opt_idx, closure, **kwargs)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 163, in optimizer_step
    optimizer.step(closure=closure, **kwargs)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/torch/optim/optimizer.py", line 88, in wrapper
    return func(*args, **kwargs)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/torch/optim/adam.py", line 100, in step
    loss = closure()
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 148, in _wrap_closure
    closure_result = closure()
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 160, in __call__
    self._result = self.closure(*args, **kwargs)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 142, in closure
    step_output = self._step_fn()
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 435, in _training_step
    training_step_output = self.trainer.accelerator.training_step(step_kwargs)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 216, in training_step
    return self.training_type_plugin.training_step(*step_kwargs.values())
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/ddp.py", line 439, in training_step
    return self.model(*args, **kwargs)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 963, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/overrides/base.py", line 81, in forward
    output = self.module.training_step(*inputs, **kwargs)
  File "/home/CGaydon/repositories/myria3d/myria3d/models/model.py", line 139, in training_step
    targets, logits = self.forward(batch)
  File "/home/CGaydon/repositories/myria3d/myria3d/models/model.py", line 93, in forward
    logits = self.model(batch.x, batch.pos, batch.batch, batch.ptr)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/CGaydon/repositories/myria3d/myria3d/models/modules/pyg_randla_net.py", line 73, in forward
    self.mlp_summit(b4_out_decimated[0]),
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/torch_geometric/nn/models/mlp.py", line 186, in forward
    x = norm(x)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/torch_geometric/nn/norm/batch_norm.py", line 45, in forward
    return self.module(x)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/torch/nn/modules/batchnorm.py", line 168, in forward
    return F.batch_norm(
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/torch/nn/functional.py", line 2419, in batch_norm
    _verify_batch_size(input.size())
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/torch/nn/functional.py", line 2387, in _verify_batch_size
    raise ValueError("Expected more than 1 value per channel when training, got input size {}".format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 512])

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/CGaydon/repositories/myria3d/run.py", line 121, in <module>
    launch_train()
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/hydra/main.py", line 48, in decorated_main
    _run_hydra(
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/hydra/_internal/utils.py", line 377, in _run_hydra
    run_and_report(
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/hydra/_internal/utils.py", line 214, in run_and_report
    raise ex
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/hydra/_internal/utils.py", line 211, in run_and_report
    return func()
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/hydra/_internal/utils.py", line 378, in <lambda>
    lambda: hydra.run(
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/hydra/_internal/hydra.py", line 111, in run
    _ = ret.return_value
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/hydra/core/utils.py", line 233, in return_value
    raise self._return_value
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/hydra/core/utils.py", line 160, in run_job
    ret.return_value = task_function(task_cfg)
  File "/home/CGaydon/repositories/myria3d/run.py", line 57, in launch_train
    return train(config)
  File "/home/CGaydon/repositories/myria3d/myria3d/train.py", line 143, in train
    trainer.fit(model=model, datamodule=datamodule, ckpt_path=config.model.ckpt_path)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 740, in fit
    self._call_and_handle_interrupt(
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 698, in _call_and_handle_interrupt
    self.training_type_plugin.reconciliate_processes(traceback.format_exc())
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/ddp.py", line 533, in reconciliate_processes
    raise DeadlockDetectedException(f"DeadLock detected from rank: {self.global_rank} \n {trace}")
pytorch_lightning.utilities.exceptions.DeadlockDetectedException: DeadLock detected from rank: 1 
 Traceback (most recent call last):
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 777, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1199, in _run
    self._dispatch()
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1279, in _dispatch
    self.training_type_plugin.start_training(self)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 202, in start_training
    self._results = trainer.run_stage()
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1289, in run_stage
    return self._run_train()
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1319, in _run_train
    self.fit_loop.run()
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/loops/fit_loop.py", line 234, in advance
    self.epoch_loop.run(data_fetcher)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 193, in advance
    batch_output = self.batch_loop.run(batch, batch_idx)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 88, in advance
    outputs = self.optimizer_loop.run(split_batch, optimizers, batch_idx)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 215, in advance
    result = self._run_optimization(
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 266, in _run_optimization
    self._optimizer_step(optimizer, opt_idx, batch_idx, closure)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 378, in _optimizer_step
    lightning_module.optimizer_step(
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/core/lightning.py", line 1652, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/core/optimizer.py", line 164, in step
    trainer.accelerator.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 336, in optimizer_step
    self.precision_plugin.optimizer_step(model, optimizer, opt_idx, closure, **kwargs)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 163, in optimizer_step
    optimizer.step(closure=closure, **kwargs)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/torch/optim/optimizer.py", line 88, in wrapper
    return func(*args, **kwargs)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/torch/optim/adam.py", line 100, in step
    loss = closure()
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 148, in _wrap_closure
    closure_result = closure()
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 160, in __call__
    self._result = self.closure(*args, **kwargs)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 142, in closure
    step_output = self._step_fn()
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 435, in _training_step
    training_step_output = self.trainer.accelerator.training_step(step_kwargs)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 216, in training_step
    return self.training_type_plugin.training_step(*step_kwargs.values())
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/ddp.py", line 439, in training_step
    return self.model(*args, **kwargs)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 963, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/pytorch_lightning/overrides/base.py", line 81, in forward
    output = self.module.training_step(*inputs, **kwargs)
  File "/home/CGaydon/repositories/myria3d/myria3d/models/model.py", line 139, in training_step
    targets, logits = self.forward(batch)
  File "/home/CGaydon/repositories/myria3d/myria3d/models/model.py", line 93, in forward
    logits = self.model(batch.x, batch.pos, batch.batch, batch.ptr)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/CGaydon/repositories/myria3d/myria3d/models/modules/pyg_randla_net.py", line 73, in forward
    self.mlp_summit(b4_out_decimated[0]),
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/torch_geometric/nn/models/mlp.py", line 186, in forward
    x = norm(x)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/torch_geometric/nn/norm/batch_norm.py", line 45, in forward
    return self.module(x)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/torch/nn/modules/batchnorm.py", line 168, in forward
    return F.batch_norm(
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/torch/nn/functional.py", line 2419, in batch_norm
    _verify_batch_size(input.size())
  File "/home/CGaydon/.conda/envs/myria3d/lib/python3.9/site-packages/torch/nn/functional.py", line 2387, in _verify_batch_size
    raise ValueError("Expected more than 1 value per channel when training, got input size {}".format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 512])



CharlesGaydon commented Aug 16, 2023

Topic 1: why was there a single sample in this training batch?
-> we add drop_last just in case.
-> the training error disappears 🥳

Topic 2: how do we preserve the "predict a single cloud containing a single point" feature, a situation that can occur at inference time?
-> We had actually forgotten about the MinimumNumNodes transform. It is supposed to duplicate points to avoid errors inside the model. BUT: now that we accept point clouds with num_nodes=1, we hit an edge case of the subsample_data function involved in MinimumNumNodes: the and item.size(0) != 1: condition is triggered, which prevents the transform from being applied!
-> There is no reason for this condition here: it had been added to FixedPoints to handle a different case, following this discussion.
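
For illustration, a minimal sketch of this guard pattern on plain tensors (simplified, with hypothetical names; not the actual myria3d subsample_data code):

    import torch

    def subsample_with_guard(attrs: dict, choice: torch.Tensor, num_nodes: int) -> dict:
        # FixedPoints-style guard: an attribute is only re-indexed when its first
        # dimension matches num_nodes AND is different from 1.
        out = {}
        for key, item in attrs.items():
            if torch.is_tensor(item) and item.size(0) == num_nodes and item.size(0) != 1:
                out[key] = item[choice]
            else:
                out[key] = item
        return out

    # A 1-point cloud that MinimumNumNodes would like to pad by duplicating its point:
    cloud = {"pos": torch.rand(1, 3), "x": torch.rand(1, 9)}
    choice = torch.tensor([0, 0])  # duplicate the single point to reach 2 nodes

    padded = subsample_with_guard(cloud, choice, num_nodes=1)
    print(padded["pos"].shape)  # torch.Size([1, 3]): the size(0) != 1 guard skipped the duplication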

@CharlesGaydon

Topic 2 will have to be fixed by removing the aforementioned and item.size(0) != 1 condition.

@CharlesGaydon

Not reproducible. Reopen if the problem shows up again.
