Tensorboard profiler not working well with data from gcs bucket? #372

Open
miclegr opened this issue Dec 24, 2021 · 5 comments
Labels
bug (Something isn't working), good first issue (Good for newcomers)

Comments


miclegr commented Dec 24, 2021

I'm running the Keras profiling notebook on Colab and everything works fine.
Then I add a cell for logging into gcloud:

from google.colab import auth
auth.authenticate_user()
project_id = 'my-project-name'
bucket_name = 'my-bucket-name'
!gcloud config set project {project_id}

and change the logging path to a GCS path:

# Create a TensorBoard callback
logs = f"gs://{bucket_name}/logs/" + datetime.now().strftime("%Y%m%d-%H%M%S")

tboard_callback = tf.keras.callbacks.TensorBoard(log_dir = logs,
                                                 histogram_freq = 1,
                                                 profile_batch = '500,520')

model.fit(ds_train,
          epochs=2,
          validation_data=ds_test,
          callbacks = [tboard_callback])

Most of the time this works fine, but a few times I got the "No profile data was found." page in TensorBoard, even after refreshing.

Then I launch a TensorBoard session on my local machine with the GCS path as logdir:

tensorboard --logdir=gs://my-bucket-name/logs/20211224-151008

With this setup I always get the "No profile data was found." page in TensorBoard, even after refreshing.

Finally, I download the logging data from the GCS bucket into a directory on my local machine and start TensorBoard with that local path as logdir; in that case it always shows the profile data.
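The download-then-view workaround described above could be scripted roughly as follows. This is only a sketch: the helper name `local_mirror_commands` and the `./logs` destination are hypothetical, and it assumes `gsutil` and `tensorboard` are on the PATH. It only builds the commands and leaves running them to the caller.

```python
import shlex


def local_mirror_commands(gcs_logdir, local_dir):
    """Build the two commands for the workaround: copy the logs, then view them locally.

    Hypothetical helper for illustration; it does not run anything itself.
    """
    # Drop any trailing slash so the source path is written consistently.
    src = gcs_logdir.rstrip("/")
    # Recursive, parallel copy of the run directory out of the bucket.
    copy_cmd = ["gsutil", "-m", "cp", "-r", src, local_dir]
    # Point TensorBoard at the local copy instead of the gs:// path.
    view_cmd = ["tensorboard", "--logdir", local_dir]
    return [copy_cmd, view_cmd]


if __name__ == "__main__":
    commands = local_mirror_commands(
        "gs://my-bucket-name/logs/20211224-151008", "./logs"
    )
    for cmd in commands:
        # Print shell-ready lines; pass each list to subprocess.run to execute.
        print(shlex.join(cmd))
```

This sidesteps reading profile data from GCS rather than explaining why that read fails.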

Similar to #330, but not quite the same.

Versions: tensorboard 2.7.0, tensorboard_plugin_profile 2.5.0


dkondoetsy commented Jan 21, 2022

I'm experiencing a similar issue.

In my case, tensorboard is running in a k8s pod for profiling tfserving.

Tensorboard is run with the following command:

tensorboard --host 0.0.0.0 --load_fast=false --logdir=[my_gcs_bucket]

After clicking "Capture" from the tensorboard UI and sending requests to the TFServer, the Profile page doesn't show the profile results; it's as if the capture was never run. I verified that the gcs bucket has the xplane.pb trace files.

However, if I run tensorboard locally from my laptop pointing it to the gcs bucket, tensorboard locally does show the profile:

tensorboard --logdir=[my_gcs_bucket] --load_fast=false

Tensorboard version is 2.8.0, but the same issue occurs with version 2.4.1.
The issue occurs both with --load_fast=false and without that flag (default set to true).
Installed the latest version of tensorboard_plugin_profile: tensorboard_plugin_profile-2.5.0-py3-none-any.whl

Any fix or debugging tips would be greatly appreciated. Thank you.


dkondoetsy commented Jan 27, 2022

Any input on this? We are setting up tensorboard in a large-scale k8s deployment (>1000 pods), and so being able to store event logs in GCS is crucial for enabling this.

I can reproduce the issue locally in Docker with the latest serving and tensorflow images and TensorBoard 2.7.0, and am happy to send my Docker files if it helps. A local Docker container runs tensorboard with a log directory in GCS. I tried running tensorboard both with and without the --load_fast option enabled, but nothing appears in the Profile page (or any other page) after a profile capture.

Below is the list of files in GCS produced by a profiling run. I noticed a profile-empty file in the list:

gs://[...]/tensorboard/events.out.tfevents.1643293809.9c7022a74960.profile-empty
gs://[...]/tensorboard/plugins/profile/2022_01_27_14_30_08/tfserving_8500.xplane.pb

The file sizes are:
events.out.tfevents.1643293809.9c7022a74960.profile-empty: 40B
tfserving_8500.xplane.pb: 8.8MB
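To confirm that a listing like the one above actually contains trace files, a small check can group `.xplane.pb` files by run. This is a sketch under one assumption: the layout `<logdir>/plugins/profile/<run>/<host>.xplane.pb` is inferred only from the paths in this thread. The function name `find_profile_runs` is hypothetical, and the listing is passed in as plain strings (for example, pasted from `gsutil ls -r`).

```python
import re
from collections import defaultdict

# Assumed layout, inferred from the paths in this thread:
#   <logdir>/plugins/profile/<run>/<host>.xplane.pb
_XPLANE = re.compile(
    r"/plugins/profile/(?P<run>[^/]+)/(?P<host>[^/]+)\.xplane\.pb$"
)


def find_profile_runs(paths):
    """Map each profile run name to the hosts that have an xplane.pb trace file."""
    runs = defaultdict(list)
    for path in paths:
        match = _XPLANE.search(path)
        if match:
            runs[match.group("run")].append(match.group("host"))
    return dict(runs)


# The two paths from the listing above, with the elided prefix kept as-is.
listing = [
    "gs://[...]/tensorboard/events.out.tfevents.1643293809.9c7022a74960.profile-empty",
    "gs://[...]/tensorboard/plugins/profile/2022_01_27_14_30_08/tfserving_8500.xplane.pb",
]

print(find_profile_runs(listing))
# {'2022_01_27_14_30_08': ['tfserving_8500']}
```

An empty result would point to a capture problem. A non-empty result, as here, suggests the trace was written and the problem is on the reading side.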

Here is the output of tensorboard inspect. Strangely, there are tags but no stats shown for each tag:

Found event files in:
gs://etsy-recsys-ml-dev-data-nxsn/user/dkondo/tensorboard

These tags are in gs://etsy-recsys-ml-dev-data-nxsn/user/dkondo/tensorboard:
audio -
histograms -
images -
scalars -
tensor -
======================================================================

Event statistics for gs://etsy-recsys-ml-dev-data-nxsn/user/dkondo/tensorboard:
audio -
graph -
histograms -
images -
scalars -
sessionlog:checkpoint -
sessionlog:start -
sessionlog:stop -
tensor -

The Docker container has access to the GCS bucket. I verified this by exec'ing into the container and using gsutil to list and read files in the bucket. Also, tensorboard inspect works against the bucket.

If I start a separate tensorboard instance from the command line on my laptop, pointing to the same GCS bucket like so:

tensorboard --logdir gs://[...]/tensorboard --port 6007 --load_fast false

the profile results appear (after reloading the logs via the control in the upper right-hand corner of the UI).


dkondoetsy commented Jan 27, 2022

I found that the issue occurs specifically with tensorboard-plugin-profile==2.5.0, combined with tensorboard 2.4.1 and 2.7.0 (and possibly other versions), but does not occur with tensorboard-plugin-profile==2.4.0.
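One way to check whether an environment matches this report is to read the installed package versions and flag the plugin version named above. This is a sketch only: `REPORTED_AFFECTED` reflects just the single version reported in this thread, not a confirmed list, and the helper names are hypothetical.

```python
from importlib import metadata

# Assumption: only the plugin version reported as affected in this thread.
REPORTED_AFFECTED = {"2.5.0"}


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def plugin_reported_affected(version):
    """True if the given plugin version is the one reported as affected here."""
    return version in REPORTED_AFFECTED


if __name__ == "__main__":
    for name in ("tensorflow", "tensorboard", "tensorboard-plugin-profile"):
        print(f"{name}: {installed_version(name)}")
    plugin = installed_version("tensorboard-plugin-profile")
    if plugin_reported_affected(plugin):
        # 2.4.0 is the version reported above as not showing the problem.
        print("Consider trying: pip install tensorboard-plugin-profile==2.4.0")
```

Pinning to 2.4.0 is a workaround to try, based only on the observation above; the thread does not identify a root cause.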

@rivershah

Any update on this, please? The problem is still there.

Matt-Hurd added the bug and good first issue labels on Dec 11, 2024
@Matt-Hurd
Collaborator

@rivershah can you please provide the directory structure of the GCS bucket, the command that you're using to run Tensorboard, and the code that you're using to collect the profile? Also, version information for tensorflow, tensorboard, and tensorboard_plugin_profile?
