"No trace event is collected" when using tensorboard / capture_tpu_profile #380
Comments
Have you tried increasing the number of tracing attempts, as suggested in the log? Similarly, you can try increasing the profile duration. The "potential issues" section of the guide has some suggestions for what could be going wrong and some steps to try. In particular, make sure the TPU is running before capturing the trace.
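The suggestion to raise the tracing attempts and the profile duration can be scripted instead of retried by hand. This is a minimal sketch that assumes TensorFlow 2.x's `tf.profiler.experimental.client.trace` as the capture call, and further assumes a failed capture surfaces as a Python exception (depending on the TensorFlow version it may only log "No trace event is collected"). The helper `capture_with_backoff` and its doubling schedule are hypothetical; the capture function is injectable so the retry logic can be exercised without a TPU.

```python
import time
from typing import Callable, Optional

# (service_addr, logdir, duration_ms, num_tracing_attempts) -> None
CaptureFn = Callable[[str, str, int, int], None]


def _tf_capture(service_addr: str, logdir: str, duration_ms: int,
                num_tracing_attempts: int) -> None:
    # Imported lazily so the retry logic can be tried without TensorFlow.
    import tensorflow as tf
    tf.profiler.experimental.client.trace(
        service_addr,
        logdir,
        duration_ms,
        num_tracing_attempts=num_tracing_attempts,
    )


def capture_with_backoff(
    service_addr: str,
    logdir: str,
    start_duration_ms: int = 2000,
    max_duration_ms: int = 20000,
    num_tracing_attempts: int = 10,
    capture_fn: Optional[CaptureFn] = None,
    pause_s: float = 1.0,
) -> int:
    """Retry a profile capture, doubling the duration after each failure.

    Returns the duration (ms) that succeeded. Assumes a failed capture
    raises; if every duration up to max_duration_ms fails, the last
    error is re-raised.
    """
    capture = capture_fn or _tf_capture
    duration = start_duration_ms
    last_error: Optional[Exception] = None
    while duration <= max_duration_ms:
        try:
            capture(service_addr, logdir, duration, num_tracing_attempts)
            return duration
        except Exception as err:  # exact exception type varies by TF version
            last_error = err
            duration *= 2
            time.sleep(pause_s)
    if last_error is None:
        raise ValueError("start_duration_ms must not exceed max_duration_ms")
    raise last_error
```

For example, `capture_with_backoff("grpc://localhost:6000", "gs://my-bucket/logs")` would try 2, 4, 8 and 16 seconds in turn (the address and bucket are placeholders). The training job still has to be executing steps for the whole capture window, otherwise a longer duration only records more idle time.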
@dmmolitor Yes, I did. Since the error persisted, I switched the TPU architecture to TPU Node and everything worked fine. So I suspect there may be a bug with the non-Node (TPU VM) architecture, especially since TensorBoard itself cannot even profile the CPU on my laptop, as I mentioned in my original report.
You are welcome. If your issue is resolved, could you please close the issue?
@dmmolitor I don't think the issue is resolved, since profiling only works on a specific architecture. I'll leave it open.
I also find myself unable to replicate https://cloud.google.com/tpu/docs/profile-tpu-vm#profile_tab in order to capture profiles on TPU VMs (TPU Nodes work fine, as @lackhole noted). In my case, the TensorBoard web UI says:
This doesn't look like something that more retries or a longer profiling duration will fix 🤔 I also tried the command-line tool. And here's my TF setup for reference:
$ python3 -m pip list | grep -E 'tensor|cloud-tpu'
cloud-tpu-client 0.10
cloud-tpu-profiler 2.4.0
tensorboard 2.6.0
tensorboard-data-server 0.6.1
tensorboard-plugin-profile 2.11.1
tensorboard-plugin-wit 1.8.1
tensorflow 2.6.5
tensorflow-addons 0.16.1
tensorflow-datasets 4.8.2
tensorflow-estimator 2.6.0
tensorflow-hub 0.12.0
tensorflow-io 0.30.0
tensorflow-io-gcs-filesystem 0.30.0
tensorflow-metadata 1.12.0
tensorflow-model-optimization 0.7.3
tensorflow-text 2.6.0
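On machines without `grep`, or to attach the versions programmatically to a bug report, the same listing can be produced from Python. This is a small stdlib-only sketch; the function name `matching_packages` is hypothetical, and like the `grep -E` one-liner it matches distribution names case-sensitively.

```python
import re
from importlib import metadata


def matching_packages(pattern: str = r"tensor|cloud-tpu") -> dict:
    """Return {distribution name: version} for installed packages whose
    name matches the regular expression `pattern`, sorted by name."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name and re.search(pattern, name):
            found[name] = dist.version
    return dict(sorted(found.items()))


if __name__ == "__main__":
    for name, version in matching_packages().items():
        print(f"{name:35} {version}")
```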
As it turns out, the … I was then able to see the following output from the training session, signalling a successful profile capture ✌
On the … which prevented the resulting …
At first I was trying to profile BERT on a Google Cloud TPU VM (v3-8, tpu-vm-tf-2.7.0), so I followed the guide while fine-tuning BERT.
But when I press Capture, it says:
No trace event is collected
So I thought the problem might be specific to TPU and posted a question on StackOverflow.
* Full log:
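A frequent reason for an empty capture is that nothing is listening on the address TensorBoard is asked to profile. The TPU VM approach generally relies on the training script starting a profiler server, for example with `tf.profiler.experimental.server.start(6000)`; the port 6000 is an assumption here, not something stated in this report. The hypothetical helper below only checks that some process accepts TCP connections on that port before you press Capture.

```python
import socket


def profiler_server_reachable(host: str = "localhost", port: int = 6000,
                              timeout_s: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    Assumption: the training script started a profiler server on `port`.
    This only proves the port is open; it does not prove the listener is
    TensorFlow's profiler or that training steps are currently running.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False
```

If this returns False while training is supposedly running, the capture is very unlikely to succeed regardless of the retry count or duration.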
After that, I thought TensorBoard itself might be the problem, so I followed the TensorFlow Serving README on my personal PCs (macOS 10.15 / Ubuntu 18.04) using the CPU, but both also got stuck with the same error:
No trace event is collected. Automatically retrying.
Original issue filed at TensorBoard issue 5517. The output from diagnose_tensorboard.py is pasted in the original issue.
cf. The TensorBoard web UI shows the toast "Capture profile successfully. Please refresh.", but it disappears after about 0.5 seconds and nothing shows up after refreshing.
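When the UI reports a successful capture but nothing appears after refreshing, it can help to check whether any profile data was actually written to the log directory. This sketch assumes the layout recent TensorFlow 2.x versions use, `<logdir>/plugins/profile/<run>/`, with per-host `*.xplane.pb` files (older versions wrote `*.trace.json.gz`); both the layout and the helper names are assumptions, not taken from this report.

```python
from pathlib import Path


def list_profile_runs(logdir: str) -> dict:
    """Map each run directory under <logdir>/plugins/profile (assumed
    layout) to a sorted list of the files it contains."""
    profile_root = Path(logdir) / "plugins" / "profile"
    runs = {}
    if not profile_root.is_dir():
        return runs
    for run_dir in sorted(p for p in profile_root.iterdir() if p.is_dir()):
        runs[run_dir.name] = sorted(f.name for f in run_dir.iterdir() if f.is_file())
    return runs


def has_trace_data(logdir: str,
                   suffixes: tuple = (".xplane.pb", ".trace.json.gz")) -> bool:
    """True if any run directory holds a file with one of the assumed trace suffixes."""
    return any(
        name.endswith(suffixes)
        for files in list_profile_runs(logdir).values()
        for name in files
    )
```

An empty run directory, or no `plugins/profile` directory at all, points at the capture side rather than at the TensorBoard front end. This works only for local paths; a `gs://` log directory would need a GCS-aware file API instead of `pathlib`.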