"TensorBoard could not bind to port 6006" #25

Open
cbmtrx opened this issue Nov 11, 2022 · 15 comments

Comments

@cbmtrx

cbmtrx commented Nov 11, 2022

I started seeing this message this morning:

E1111 13:29:46.703546 140441364936576 program.py:298] TensorBoard could not bind to port 6006, it was already in use
ERROR: TensorBoard could not bind to port 6006, it was already in use

Fine-tuning often stops at 3-4% of Epoch 1; one run got all the way to 96% and then stopped.

I don't think I'm doing anything differently; could this be a server issue?
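To check whether the message is accurate in a given runtime, one option is to try binding the port yourself from a new cell. A minimal stdlib sketch, assuming the port in question is TensorBoard's default 6006 on localhost; `port_is_free` is a hypothetical helper, not part of Musika or TensorBoard:

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if `host:port` can be bound right now, i.e. nothing else holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # Typically EADDRINUSE: another process (e.g. an old TensorBoard) owns the port.
            return False
        return True


if __name__ == "__main__":
    print("port 6006 free:", port_is_free(6006))
```

Run it before the training cell: `False` at that point means the port was already taken before TensorBoard ever tried to start.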

@cbmtrx
Author

cbmtrx commented Nov 11, 2022

When I tried a test submission, the samples sounded like they were using different audio content.

@cbmtrx
Author

cbmtrx commented Nov 11, 2022

I managed to squeeze one epoch out of it this morning, but it then seemed to fail with "OSError: Socket is closed". Not sure if I'm doing something wrong...

Loving this tool btw!

@cbmtrx
Author

cbmtrx commented Nov 19, 2022

Is anybody else seeing this error message? I tried two different accounts, with no change.

@marcoppasini
Owner

I have never experienced it!
Are you using Firefox as your browser?

@cbmtrx
Author

cbmtrx commented Nov 19, 2022

I'm on Chrome. I've only seen this a couple of times, but whenever it throws this error, the training always freezes at 2-3%. I've reloaded, disconnected/reconnected, and tried different accounts. I have a lot of tabs open... could that be a problem?

@marcoppasini
Owner

No, I don't think it should be a problem.
Does this happen if you run the training cell a second time in the same session?

@cbmtrx
Author

cbmtrx commented Nov 19, 2022

Sometimes I stop the Finetune cell, edit something, and start it again, but when I was getting the port 6006 error I would disconnect, reload the page, reconnect, and try it all over again. I kept getting the error.

@cbmtrx
Author

cbmtrx commented Nov 19, 2022

Going to try something... A few days ago I noticed that having Discord open in another tab often interferes with Colab. Weird, I know...

@marcoppasini
Owner

You can try to factory reset the notebook instead of simply reconnecting; hopefully that solves the issue.

@cbmtrx
Author

cbmtrx commented Nov 19, 2022

Yeah, I'll try that next. Closing Discord didn't help (although it got to 36% this time).

@cbmtrx
Author

cbmtrx commented Nov 19, 2022

I guess there is no longer a "factory reset", but I used Edit > Clear all outputs, reloaded, and ran it again. Still getting the error. It ran for a while, stopped, ran a bit more, then stopped altogether. This is with 3 new (stereo) WAVs.

Not really sure what's going on; here's the last part of the output from that cell:

```
Epoch 0/3: 0% 0/9375 [00:00<?, ?it/s]
NOTE: Using experimental fast data loading logic. To disable, pass
"--load_fast=false" and report issues on GitHub. More details:
tensorflow/tensorboard#4784

E1119 14:07:23.534777 140575472019328 program.py:298] TensorBoard could not bind to port 6006, it was already in use
ERROR: TensorBoard could not bind to port 6006, it was already in use
Epoch 0/3: 21% 1966/9375 [06:55<14:56, 8.27it/s, DR=0.115, DF=0.0657, G=1.34, GP=0.0975, LR=4e-5, TIME=0.11]
Exception in thread Thread-9:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/threading.py", line 954, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.9/threading.py", line 892, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.9/site-packages/gradio/tunneling.py", line 39, in handler
    chan.send(data)
  File "/usr/local/lib/python3.9/site-packages/paramiko/channel.py", line 801, in send
    return self._send(s, m)
  File "/usr/local/lib/python3.9/site-packages/paramiko/channel.py", line 1198, in _send
    raise socket.error("Socket is closed")
OSError: Socket is closed
Epoch 0/3: 24% 2204/9375 [07:18<10:52, 10.99it/s, DR=0.124, DF=0.0675, G=1.32, GP=0.0923, LR=4e-5, TIME=0.0861]
Exception in thread Thread-12:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/threading.py", line 954, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.9/threading.py", line 892, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.9/site-packages/gradio/tunneling.py", line 39, in handler
    chan.send(data)
  File "/usr/local/lib/python3.9/site-packages/paramiko/channel.py", line 801, in send
    return self._send(s, m)
  File "/usr/local/lib/python3.9/site-packages/paramiko/channel.py", line 1198, in _send
    raise socket.error("Socket is closed")
OSError: Socket is closed
Epoch 0/3: 43% 4007/9375 [10:20<07:12, 12.41it/s, DR=0.196, DF=0.0676, G=1.24, GP=0.065, LR=4e-5, TIME=0.0846]^C
```
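The two "Socket is closed" tracebacks in that output are raised inside a background thread (the handler in gradio's `tunneling.py`), not in the training loop. In Python, an uncaught exception in a `threading.Thread` is printed and ends only that thread, which is consistent with the progress bar continuing after each traceback. A minimal stdlib sketch of that behaviour; `tunnel_handler` and `run_demo` are hypothetical stand-ins, not gradio or Musika code:

```python
from __future__ import annotations

import threading


def tunnel_handler() -> None:
    # Hypothetical stand-in for a tunnel thread whose remote end has gone away.
    raise OSError("Socket is closed")


def run_demo(total_steps: int = 100) -> tuple[str, int]:
    caught: list[BaseException] = []
    previous_hook = threading.excepthook
    # threading.excepthook (Python 3.8+) receives uncaught exceptions raised in threads;
    # the default hook is what prints "Exception in thread ..." plus the traceback.
    threading.excepthook = lambda args: caught.append(args.exc_value)
    try:
        worker = threading.Thread(target=tunnel_handler)
        worker.start()
        worker.join()  # the worker thread has died at this point
    finally:
        threading.excepthook = previous_hook

    # The main thread is unaffected and keeps "training".
    steps_done = 0
    for _ in range(total_steps):
        steps_done += 1
    return str(caught[0]), steps_done


if __name__ == "__main__":
    print(run_demo())  # ('Socket is closed', 100)
```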

@marcoppasini
Owner

By "factory reset" I mean "Disconnect and delete runtime".

@cbmtrx
Author

cbmtrx commented Nov 19, 2022

Oh yeah, I do that all the time, with no change. The only thing left to try is restarting the computer/browser...

@cbmtrx
Author

cbmtrx commented Nov 19, 2022

Nothing is working here, and I have no idea what the problem is. I found the issue on Stack Overflow, but the suggestions there didn't fix it either.

https://stackoverflow.com/questions/54395201/tensorboard-could-not-bind-to-port-6006-it-was-already-in-use
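A workaround that avoids killing any process is to start TensorBoard on the next free port instead of 6006, using its `--port` flag. This assumes you can edit the cell that launches TensorBoard; how the Musika notebook starts it is not shown in this thread. A minimal stdlib sketch, with `find_free_port` as a hypothetical helper:

```python
import socket


def find_free_port(start: int = 6006, attempts: int = 20, host: str = "127.0.0.1") -> int:
    """Return the first port in [start, start + attempts) that can be bound on `host`."""
    for port in range(start, min(start + attempts, 65536)):  # valid TCP ports end at 65535
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind((host, port))
            except OSError:
                continue  # already in use, e.g. by a TensorBoard left over from an earlier run
            return port  # the probe socket is closed when the `with` block exits
    raise RuntimeError(f"no free port in {start}..{start + attempts - 1}")


if __name__ == "__main__":
    port = find_free_port()
    print(f"launch TensorBoard with --port {port}")
```

Because the probe socket is closed before TensorBoard starts, another process could in principle take the port in between; inside a single Colab runtime that race is unlikely.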

@cbmtrx
Author

cbmtrx commented Nov 25, 2022

I'm really hoping that someone discovers a fix for this. I was using the Musika Colab without problems for ~2 weeks, and now I can't run even one epoch. I consistently get this error (it doesn't matter which browser), and none of the usual fixes (killing the process by PID, clearing outputs, restarting the runtime) solve it. The Colab runs the first epoch up to a certain percentage and then stops. I've left it for up to 30 minutes to see if processing restarts; it never does. The port 6006 issue seems to be effectively fatal.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants