tensorflow.python.framework.errors_impl.InvalidArgumentError message while trying to train a model #27

Open
BasicAutism opened this issue Nov 19, 2022 · 6 comments

Comments

@BasicAutism

(musika) C:\Users\ПК>python X:\musika\musika_train.py --train_path X:\musika\encodings --log_path X:\logs --mixed_precision False


Using GPU without mixed precision...

Calculating total number of samples in data folder...
Found 1720 total samples
Dataset is ready!
Checking if models are already available...
Models are available!
X:\cnd\envs\musika\lib\site-packages\keras\initializers\initializers_v2.py:120: UserWarning: The initializer HeUniform is unseeded and being called multiple times, which will return identical values  each time (even if the initializer is unseeded). Please update your code to provide a seed to the initializer, or avoid using the same initalizer instance more than once.
  warnings.warn(
Encoders/Decoders loaded from checkpoints/ae
Networks initialized
Critic params: 20786689
Generator params: 15499530
--------------------------------
--------------------------------
--------------------------------
--------------------------------
--------------------------------
CLICK ON LINK BELOW TO OPEN GRADIO INTERFACE
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
--------------------------------
--------------------------------
--------------------------------
--------------------------------
--------------------------------
Traceback (most recent call last):
  File "X:\musika\musika_train.py", line 31, in <module>
    T.train(ds, models_ls)
  File "X:\musika\train.py", line 134, in train
    train_summary_writer = tf.summary.create_file_writer(train_log_dir)
  File "X:\cnd\envs\musika\lib\site-packages\tensorflow\python\ops\summary_ops_v2.py", line 559, in create_file_writer_v2
    return _ResourceSummaryWriter(
  File "X:\cnd\envs\musika\lib\site-packages\tensorflow\python\ops\summary_ops_v2.py", line 311, in __init__
    self._init_op = init_op_fn(self._resource)
  File "X:\cnd\envs\musika\lib\site-packages\tensorflow\python\ops\gen_summary_ops.py", line 145, in create_summary_file_writer
    _ops.raise_from_not_ok_status(e, name)
  File "X:\cnd\envs\musika\lib\site-packages\tensorflow\python\framework\ops.py", line 7209, in raise_from_not_ok_status
    raise core._status_to_exception(e) from None  # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __wrapped__CreateSummaryFileWriter_device_/job:localhost/replica:0/task:0/device:CPU:0}} Failed to create a NewWriteableFile: X:\logs/MUSIKA_latlen_256_latdepth_64_sr_44100/20221119-234355/train/events.out.tfevents.1668890635.??-??.6756.0.v2 : ?????????????? ?????? ? ????? ?????, ????? ????? ??? ????? ????.
; no protocol option
        Creating writable file X:\logs/MUSIKA_latlen_256_latdepth_64_sr_44100/20221119-234355/train/events.out.tfevents.1668890635.??-??.6756.0.v2
        Could not initialize events writer. [Op:CreateSummaryFileWriter]

Are there any options to solve this issue?

@marcoppasini
Owner

Can you try again by executing the command in the musika base folder and without specifying the log path?

C:\Users\ПК\musika>python musika_train.py --train_path encodings --mixed_precision False

@BasicAutism
Author

This gives me the same error message.

(musika) C:\Users\ПК>python musika_train.py --train_path encodings --mixed_precision False


Using GPU without mixed precision...

Calculating total number of samples in data folder...
Found 1720 total samples
Dataset is ready!
Checking if models are already available...
Models are available!
X:\cnd\envs\musika\lib\site-packages\keras\initializers\initializers_v2.py:120: UserWarning: The initializer HeUniform is unseeded and being called multiple times, which will return identical values  each time (even if the initializer is unseeded). Please update your code to provide a seed to the initializer, or avoid using the same initalizer instance more than once.
  warnings.warn(
Encoders/Decoders loaded from checkpoints/ae
Networks initialized
Critic params: 20786689
Generator params: 15499530
--------------------------------
--------------------------------
--------------------------------
--------------------------------
--------------------------------
CLICK ON LINK BELOW TO OPEN GRADIO INTERFACE
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
--------------------------------
--------------------------------
--------------------------------
--------------------------------
--------------------------------
Traceback (most recent call last):
  File "C:\Users\ПК\musika_train.py", line 31, in <module>
    T.train(ds, models_ls)
  File "C:\Users\ПК\train.py", line 134, in train
    train_summary_writer = tf.summary.create_file_writer(train_log_dir)
  File "X:\cnd\envs\musika\lib\site-packages\tensorflow\python\ops\summary_ops_v2.py", line 559, in create_file_writer_v2
    return _ResourceSummaryWriter(
  File "X:\cnd\envs\musika\lib\site-packages\tensorflow\python\ops\summary_ops_v2.py", line 311, in __init__
    self._init_op = init_op_fn(self._resource)
  File "X:\cnd\envs\musika\lib\site-packages\tensorflow\python\ops\gen_summary_ops.py", line 145, in create_summary_file_writer
    _ops.raise_from_not_ok_status(e, name)
  File "X:\cnd\envs\musika\lib\site-packages\tensorflow\python\framework\ops.py", line 7209, in raise_from_not_ok_status
    raise core._status_to_exception(e) from None  # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __wrapped__CreateSummaryFileWriter_device_/job:localhost/replica:0/task:0/device:CPU:0}} Failed to create a NewWriteableFile: logs/MUSIKA_latlen_256_latdepth_64_sr_44100/20221120-005952/train/events.out.tfevents.1668895192.??-??.11224.0.v2 : ?????????????? ?????? ? ????? ?????, ????? ????? ??? ????? ????.
; no protocol option
        Creating writable file logs/MUSIKA_latlen_256_latdepth_64_sr_44100/20221120-005952/train/events.out.tfevents.1668895192.??-??.11224.0.v2
        Could not initialize events writer. [Op:CreateSummaryFileWriter]

@marcoppasini
Owner

I am not sure; maybe it is a permissions problem? I notice that in the first command you are located on the C drive and reference files on the X drive.
Can you cd to the X:\musika directory and execute:

X:\musika>python musika_train.py --train_path encodings --mixed_precision False

with administrator privileges?
Also, if you run musika_test.py and musika_generate.py, does everything work?
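To test the permissions idea independently of TensorFlow, one option is to try creating a file in the log directory with plain Python. The following is a minimal sketch using only the standard library; check_writable is a hypothetical helper, not part of musika. If it succeeds for the same directory that train.py writes to, the problem is more likely in the file name than in the directory permissions.

```python
import os
import tempfile
from typing import Tuple


def check_writable(directory: str) -> Tuple[bool, str]:
    """Try to create, then delete, a small file inside `directory`.

    The directory is created first if it does not exist.
    Returns (True, "") on success, or (False, "<error type>: <message>").
    """
    try:
        os.makedirs(directory, exist_ok=True)
        # delete=True (the default) removes the file as soon as it is closed.
        with tempfile.NamedTemporaryFile(dir=directory, prefix="write_test_"):
            pass
        return True, ""
    except OSError as exc:
        return False, f"{type(exc).__name__}: {exc}"
```

For example, running print(check_writable("logs")) from X:\musika checks the same relative logs folder that the later runs tried to write to.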

@BasicAutism
Author

BasicAutism commented Nov 19, 2022

That did not help; I get the same error message.

(musika) C:\WINDOWS\system32>cd /d x:\musika

(musika) x:\musika>python musika_train.py --train_path encodings --mixed_precision False


Using GPU without mixed precision...

Calculating total number of samples in data folder...
Found 1720 total samples
Dataset is ready!
Checking if models are already available...
Models are available!
X:\cnd\envs\musika\lib\site-packages\keras\initializers\initializers_v2.py:120: UserWarning: The initializer HeUniform is unseeded and being called multiple times, which will return identical values  each time (even if the initializer is unseeded). Please update your code to provide a seed to the initializer, or avoid using the same initalizer instance more than once.
  warnings.warn(
Encoders/Decoders loaded from checkpoints/ae
Networks initialized
Critic params: 20786689
Generator params: 15499530
--------------------------------
--------------------------------
--------------------------------
--------------------------------
--------------------------------
CLICK ON LINK BELOW TO OPEN GRADIO INTERFACE
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
--------------------------------
--------------------------------
--------------------------------
--------------------------------
--------------------------------
Traceback (most recent call last):
  File "x:\musika\musika_train.py", line 31, in <module>
    T.train(ds, models_ls)
  File "x:\musika\train.py", line 134, in train
    train_summary_writer = tf.summary.create_file_writer(train_log_dir)
  File "X:\cnd\envs\musika\lib\site-packages\tensorflow\python\ops\summary_ops_v2.py", line 559, in create_file_writer_v2
    return _ResourceSummaryWriter(
  File "X:\cnd\envs\musika\lib\site-packages\tensorflow\python\ops\summary_ops_v2.py", line 311, in __init__
    self._init_op = init_op_fn(self._resource)
  File "X:\cnd\envs\musika\lib\site-packages\tensorflow\python\ops\gen_summary_ops.py", line 145, in create_summary_file_writer
    _ops.raise_from_not_ok_status(e, name)
  File "X:\cnd\envs\musika\lib\site-packages\tensorflow\python\framework\ops.py", line 7209, in raise_from_not_ok_status
    raise core._status_to_exception(e) from None  # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __wrapped__CreateSummaryFileWriter_device_/job:localhost/replica:0/task:0/device:CPU:0}} Failed to create a NewWriteableFile: logs/MUSIKA_latlen_256_latdepth_64_sr_44100/20221120-013455/train/events.out.tfevents.1668897295.??-??.3604.0.v2 : ?????????????? ?????? ? ????? ?????, ????? ????? ??? ????? ????.
; no protocol option
        Creating writable file logs/MUSIKA_latlen_256_latdepth_64_sr_44100/20221120-013455/train/events.out.tfevents.1668897295.??-??.3604.0.v2
        Could not initialize events writer. [Op:CreateSummaryFileWriter]

There are question marks in this part of the message: Failed to create a NewWriteableFile: logs/MUSIKA_latlen_256_latdepth_64_sr_44100/20221120-013455/train/events.out.tfevents.1668897295.??-??.3604.0.v2 : ?????????????? ?????? ? ????? ?????, ????? ????? ??? ????? ????. In my opinion there should be text there describing the cause of the error, but because of some Anaconda bug (not sure) or a character-encoding problem, only question marks are displayed.
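A possible explanation, which this thread does not confirm: TensorFlow names event files events.out.tfevents.<timestamp>.<hostname>.<pid>..., so the ??-?? segment is where the computer's host name goes. If the host name contains non-ASCII characters (the user folder here is ПК), they may end up as characters that Windows does not allow in file names, such as ?. The garbled error text also appears to have the same word lengths as the Russian Windows message "Синтаксическая ошибка в имени файла, имени папки или метке тома" ("The filename, directory name, or volume label syntax is incorrect"), which would fit. The sketch below only inspects the local host name with the standard library; ascii_fallback imitates a lossy conversion to ? and is an assumption, not TensorFlow's actual code.

```python
import socket

# Characters that Windows rejects in a file-name component.
WINDOWS_INVALID = set('<>:"/\\|?*')


def non_ascii_chars(text: str) -> list:
    """Return the characters of `text` that fall outside ASCII."""
    return [ch for ch in text if ord(ch) > 127]


def ascii_fallback(text: str) -> str:
    """Imitate a lossy conversion where each non-ASCII character becomes '?'."""
    return text.encode("ascii", errors="replace").decode("ascii")


def invalid_windows_chars(name: str) -> list:
    """Return the characters of a file-name component that Windows rejects."""
    return [ch for ch in name if ch in WINDOWS_INVALID or ord(ch) < 32]


host = socket.gethostname()
print("host name:", repr(host))
print("non-ASCII characters:", non_ascii_chars(host))
print("after lossy ASCII conversion:", ascii_fallback(host))
print("rejected by Windows:", invalid_windows_chars(ascii_fallback(host)))
```

If the host name does contain non-ASCII characters, temporarily renaming the computer to an ASCII-only name would be one way to test this hypothesis.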

musika_test.py and musika_generate.py work fine:

(musika) x:\musika>python musika_test.py


Using GPU with mixed precision enabled...

WARNING:tensorflow:Mixed precision compatibility check (mixed_float16): WARNING
Your GPU may run slowly with dtype policy mixed_float16 because it does not have compute capability of at least 7.0. Your GPU:
  NVIDIA GeForce GTX 1070, compute capability 6.1
See https://developer.nvidia.com/cuda-gpus for a list of GPUs and their compute capabilities.
If you will use compatible GPU(s) not attached to this host, e.g. by running a multi-worker model, you can ignore this warning. This message will only be logged once
Checking if models are already available...
Models are available!
X:\cnd\envs\musika\lib\site-packages\keras\initializers\initializers_v2.py:120: UserWarning: The initializer HeUniform is unseeded and being called multiple times, which will return identical values  each time (even if the initializer is unseeded). Please update your code to provide a seed to the initializer, or avoid using the same initalizer instance more than once.
  warnings.warn(
WARNING:tensorflow:You forgot to call LossScaleOptimizer.get_scaled_loss() and LossScaleOptimizer.get_unscaled_gradients() before calling LossScaleOptimizer.apply_gradients(). This will likely result in worse model quality, so please call them in the correct places! For example:
    with tf.GradientTape() as tape:
      loss = loss_fn()
      scaled_loss = opt.get_scaled_loss(loss)
    scaled_grads = tape.gradient(scaled_loss, vars)
    grads = opt.get_unscaled_gradients(scaled_grads)
    opt.apply_gradients([(grads, var)])
For more information, see https://www.tensorflow.org/api_docs/python/tf/keras/mixed_precision/LossScaleOptimizer
WARNING:tensorflow:You forgot to call LossScaleOptimizer.get_scaled_loss() and LossScaleOptimizer.get_unscaled_gradients() before calling LossScaleOptimizer.apply_gradients(). This will likely result in worse model quality, so please call them in the correct places! For example:
    with tf.GradientTape() as tape:
      loss = loss_fn()
      scaled_loss = opt.get_scaled_loss(loss)
    scaled_grads = tape.gradient(scaled_loss, vars)
    grads = opt.get_unscaled_gradients(scaled_grads)
    opt.apply_gradients([(grads, var)])
For more information, see https://www.tensorflow.org/api_docs/python/tf/keras/mixed_precision/LossScaleOptimizer
Networks loaded from checkpoints/techno/
--------------------------------
--------------------------------
--------------------------------
--------------------------------
--------------------------------
CLICK ON LINK BELOW TO OPEN GRADIO INTERFACE
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Keyboard interruption in main thread... closing server.
--------------------------------
--------------------------------
--------------------------------
--------------------------------
--------------------------------
(musika) x:\musika>python musika_generate.py --load_path checkpoints/misc --num_samples 10 --seconds 120


Using GPU with mixed precision enabled...

WARNING:tensorflow:Mixed precision compatibility check (mixed_float16): WARNING
Your GPU may run slowly with dtype policy mixed_float16 because it does not have compute capability of at least 7.0. Your GPU:
  NVIDIA GeForce GTX 1070, compute capability 6.1
See https://developer.nvidia.com/cuda-gpus for a list of GPUs and their compute capabilities.
If you will use compatible GPU(s) not attached to this host, e.g. by running a multi-worker model, you can ignore this warning. This message will only be logged once
Checking if models are already available...
Models are available!
X:\cnd\envs\musika\lib\site-packages\keras\initializers\initializers_v2.py:120: UserWarning: The initializer HeUniform is unseeded and being called multiple times, which will return identical values  each time (even if the initializer is unseeded). Please update your code to provide a seed to the initializer, or avoid using the same initalizer instance more than once.
  warnings.warn(
WARNING:tensorflow:You forgot to call LossScaleOptimizer.get_scaled_loss() and LossScaleOptimizer.get_unscaled_gradients() before calling LossScaleOptimizer.apply_gradients(). This will likely result in worse model quality, so please call them in the correct places! For example:
    with tf.GradientTape() as tape:
      loss = loss_fn()
      scaled_loss = opt.get_scaled_loss(loss)
    scaled_grads = tape.gradient(scaled_loss, vars)
    grads = opt.get_unscaled_gradients(scaled_grads)
    opt.apply_gradients([(grads, var)])
For more information, see https://www.tensorflow.org/api_docs/python/tf/keras/mixed_precision/LossScaleOptimizer
WARNING:tensorflow:You forgot to call LossScaleOptimizer.get_scaled_loss() and LossScaleOptimizer.get_unscaled_gradients() before calling LossScaleOptimizer.apply_gradients(). This will likely result in worse model quality, so please call them in the correct places! For example:
    with tf.GradientTape() as tape:
      loss = loss_fn()
      scaled_loss = opt.get_scaled_loss(loss)
    scaled_grads = tape.gradient(scaled_loss, vars)
    grads = opt.get_unscaled_gradients(scaled_grads)
    opt.apply_gradients([(grads, var)])
For more information, see https://www.tensorflow.org/api_docs/python/tf/keras/mixed_precision/LossScaleOptimizer
Networks loaded from checkpoints/misc
Generating 10 samples...
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:12<00:00,  1.28s/it]

@marcoppasini
Owner

I will have to investigate this.
If anyone else reading this thread is experiencing the same problem, please report it!

In the meantime, as a temporary solution, you can manually comment out the two uses of train_summary_writer in train.py:

train_summary_writer = tf.summary.create_file_writer(train_log_dir)

and

with train_summary_writer.as_default():
    tf.summary.scalar("disc_loss_r", dloss_tr, step=m)
    tf.summary.scalar("disc_loss_f", dloss_tf, step=m)
    tf.summary.scalar("gen_loss", gloss_t, step=m)
    tf.summary.scalar("gradient_penalty", dloss_id, step=m)
    tf.summary.scalar("gp_weight", -switch.value() * self.args.gp_max_weight, step=m)
    tf.summary.scalar("lr", self.args.lr, step=m)
    

You will not be able to use TensorBoard to track losses, but if training collapses you will still notice it from the loss values (NaN) in the tqdm bar.
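Instead of commenting the lines out, a variant is to fall back to a writer that records nothing when the real one cannot be created, so the rest of train.py stays unchanged. This is a minimal sketch; make_summary_writer and NullSummaryWriter are hypothetical names, and it relies on tf.summary.scalar writing nothing (and returning False) when no default writer is set. TensorFlow's own tf.summary.create_noop_writer() could serve as the fallback instead.

```python
import contextlib


class NullSummaryWriter:
    """Stand-in for a summary writer that records nothing.

    Only as_default() is implemented because that is all train.py uses.
    Inside the `with` block no default writer is installed, so
    tf.summary.scalar(...) calls write nothing.
    """

    def as_default(self, step=None):
        return contextlib.nullcontext()


def make_summary_writer(create_fn, log_dir):
    """Return create_fn(log_dir), or a NullSummaryWriter if that raises.

    In train.py, create_fn would be tf.summary.create_file_writer.
    Any exception is caught (e.g. tf.errors.InvalidArgumentError) so this
    helper does not need to import TensorFlow itself.
    """
    try:
        return create_fn(log_dir)
    except Exception as exc:
        print(f"TensorBoard logging disabled ({type(exc).__name__}): {exc}")
        return NullSummaryWriter()
```

In train.py the first line would become train_summary_writer = make_summary_writer(tf.summary.create_file_writer, train_log_dir), and the with train_summary_writer.as_default(): block could stay as it is. Catching every exception is deliberately broad here; narrowing it to tf.errors.InvalidArgumentError would hide fewer unrelated bugs.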

@BasicAutism
Author

This worked for me, but then there was a problem with XLA (with --xla False, musika_train.py works fine). First there was the common InternalError: libdevice not found at ./libdevice.10.bc, which I solved by adding a new system environment variable XLA_FLAGS with the value --xla_gpu_cuda_data_dir=X:\cnd\envs\musika.
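The same XLA_FLAGS setting can also be applied from Python instead of as a system-wide variable, which makes it easier to try different directories. This is a minimal standard-library sketch assuming (to my understanding, not confirmed in this thread) that XLA looks for nvvm/libdevice/libdevice.10.bc under the directory passed to --xla_gpu_cuda_data_dir; all helper names are hypothetical.

```python
import os


def find_libdevice(root):
    """Return the first directory under `root` that contains libdevice.10.bc."""
    for dirpath, _dirnames, filenames in os.walk(root):
        if "libdevice.10.bc" in filenames:
            return dirpath
    return None


def cuda_data_dir_for(root):
    """Return a candidate value for --xla_gpu_cuda_data_dir, or None.

    Assumes the layout <dir>/nvvm/libdevice/libdevice.10.bc, so the value is
    two levels above the folder that holds libdevice.10.bc.
    """
    found = find_libdevice(root)
    if found is None:
        return None
    return os.path.dirname(os.path.dirname(found))


def set_xla_cuda_data_dir(cuda_data_dir):
    """Append --xla_gpu_cuda_data_dir to XLA_FLAGS, keeping existing flags.

    Safest to call before `import tensorflow`, since XLA parses XLA_FLAGS
    only once.
    """
    flag = f"--xla_gpu_cuda_data_dir={cuda_data_dir}"
    existing = os.environ.get("XLA_FLAGS", "")
    os.environ["XLA_FLAGS"] = f"{existing} {flag}".strip()
    return os.environ["XLA_FLAGS"]
```

For example, d = cuda_data_dir_for(r"X:\cnd\envs\musika") locates the folder; if d is not None, calling set_xla_cuda_data_dir(d) at the top of the script, before import tensorflow, has the same effect as the system variable.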
However, after that there was a problem with ptxas:

(musika) C:\Users\ПК>python X:\musika\musika_train.py --train_path X:\musika\encodings --mixed_precision False --load_path X:\musika\saveexp\MUSIKA_latlen_256_latdepth_64_sr_44100_time_20221120-162659\MUSIKA_iterations-9k_losses-0.8499346-0.5206635-0.5622146 --save_path X:\musika\saveexp


Using GPU without mixed precision...

Calculating total number of samples in data folder...
Found 1720 total samples
Dataset is ready!
Checking if models are already available...
Models are available!
X:\cnd\envs\musika\lib\site-packages\keras\initializers\initializers_v2.py:120: UserWarning: The initializer HeUniform is unseeded and being called multiple times, which will return identical values  each time (even if the initializer is unseeded). Please update your code to provide a seed to the initializer, or avoid using the same initalizer instance more than once.
  warnings.warn(
Networks loaded from X:\musika\saveexp\MUSIKA_latlen_256_latdepth_64_sr_44100_time_20221120-162659\MUSIKA_iterations-9k_losses-0.8499346-0.5206635-0.5622146
Critic params: 20786689
Generator params: 15499530
--------------------------------
--------------------------------
--------------------------------
--------------------------------
--------------------------------
CLICK ON LINK BELOW TO OPEN GRADIO INTERFACE
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
--------------------------------
--------------------------------
--------------------------------
--------------------------------
--------------------------------
--------------------------------
--------------------------------
--------------------------------
--------------------------------
--------------------------------
CLICK ON LINK BELOW TO OPEN TENSORBOARD INTERFACE
http://localhost:6006/
--------------------------------
--------------------------------
--------------------------------
--------------------------------
--------------------------------
WARNING:tensorflow:From X:\cnd\envs\musika\lib\site-packages\tensorflow\python\training\moving_averages.py:553: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
Preparing for Training (this can take one or two minutes)...
Epoch 0/250:   0%|                                                                            | 0/9375 [00:00<?, ?it/s]Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.10.0 at http://localhost:6006/ (Press CTRL+C to quit)
2022-11-20 18:56:35.349899: F tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:453] ptxas returned an error during compilation of ptx to sass: 'INVALID_ARGUMENT: Failed to create a NewWriteableFile: C:\Users\90C5~1\AppData\Local\Temp\/tempfile-??-??-428-7444-5ede8fbbf0657 : ?????????????? ?????? ? ????? ?????, ????? ????? ??? ????? ????.
; no protocol option'  If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.

I can say for sure that there is enough disk space. I also installed cuda-nvcc in the conda environment by running conda install -c nvidia cuda-nvcc. Could this be an issue with the versions of tensorflow, cudnn, cudatoolkit, or cuda-nvcc, or a path issue?
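On the path question, the standard-library sketch below lists every ptxas executable on PATH in search order. shutil.which() reports only the first match, so listing all of them can show whether several CUDA installations (for example a system-wide toolkit and the conda cuda-nvcc package) are competing. all_ptxas_on_path is a hypothetical helper, and which copy XLA actually uses is not established in this thread.

```python
import os

# Executable names to look for on Windows and on Linux.
PTXAS_NAMES = ("ptxas.exe", "ptxas")


def all_ptxas_on_path(path_value=None):
    """Return every ptxas file found on PATH, in PATH search order.

    path_value defaults to the current PATH environment variable.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    found = []
    for directory in path_value.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        for name in PTXAS_NAMES:
            candidate = os.path.join(directory, name)
            if os.path.isfile(candidate) and candidate not in found:
                found.append(candidate)
    return found
```

Running ptxas --version on each path it returns would then show which CUDA versions are installed side by side.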
