You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hi,
I'm trying to profile the inference performance of my model on CPU to figure out where the bottlenecks are. However, when I run the profiler and open tensorflow_stats, it groups all the operations by type (MatMul, BiasAdd...) and doesn't tell me which layer is taking how much time.
2022-03-24 17:54:20.546354: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-03-24 17:54:20.557142: I tensorflow/core/profiler/lib/profiler_session.cc:110] Profiler session initializing.
2022-03-24 17:54:20.557166: I tensorflow/core/profiler/lib/profiler_session.cc:125] Profiler session started.
2022-03-24 17:54:20.605772: I tensorflow/core/profiler/lib/profiler_session.cc:67] Profiler session collecting data.
2022-03-24 17:54:20.609004: I tensorflow/core/profiler/lib/profiler_session.cc:143] Profiler session tear down.
2022-03-24 17:54:20.616532: I tensorflow/core/profiler/rpc/client/save_profile.cc:136] Creating directory: logdir/plugins/profile/2022_03_24_17_54_20
2022-03-24 17:54:20.619339: I tensorflow/core/profiler/rpc/client/save_profile.cc:142] Dumped gzipped tool data for trace.json.gz to logdir/plugins/profile/2022_03_24_17_54_20/my-computer.trace.json.gz
2022-03-24 17:54:20.625353: I tensorflow/core/profiler/rpc/client/save_profile.cc:136] Creating directory: logdir/plugins/profile/2022_03_24_17_54_20
2022-03-24 17:54:20.625652: I tensorflow/core/profiler/rpc/client/save_profile.cc:142] Dumped gzipped tool data for memory_profile.json.gz to logdir/plugins/profile/2022_03_24_17_54_20/my-computer.memory_profile.json.gz
2022-03-24 17:54:20.628578: I tensorflow/core/profiler/rpc/client/capture_profile.cc:251] Creating directory: logdir/plugins/profile/2022_03_24_17_54_20
Dumped tool data for xplane.pb to logdir/plugins/profile/2022_03_24_17_54_20/my-computer.xplane.pb
Dumped tool data for overview_page.pb to logdir/plugins/profile/2022_03_24_17_54_20/my-computer.overview_page.pb
Dumped tool data for input_pipeline.pb to logdir/plugins/profile/2022_03_24_17_54_20/my-computer.input_pipeline.pb
Dumped tool data for tensorflow_stats.pb to logdir/plugins/profile/2022_03_24_17_54_20/my-computer.tensorflow_stats.pb
Dumped tool data for kernel_stats.pb to logdir/plugins/profile/2022_03_24_17_54_20/my-computer.kernel_stats.pb
When I view the results in Tensorboard,, I get:
Here I would like the two layers to be shown as separate elements in the charts, rather than there being a single "MatMul" element for both matrix multiplications together. This is also what the guide shows in one of the screenshots. How can I achieve this?
Furthermore, when I run this code, I can only see tensorflow_stats and trace_viewer; some of the others say "No step marker observed and hence the step time is unknown". Am I using the profiler wrong? My model uses a custom training loop so I am using context managers rather than a callback, as per the guide.
Hi vvolhejn, sorry for the late reply, did you figure out this issue yet?
For your 2nd question, some observations:
you seems to missing the 3rd argument when instrumenting the python custom training loops, _r=1.
The guide mentions "Make sure that you include the dataset iterator within the tf.profiler.experimental.Trace context for accurate analysis of the input pipeline", so maybe put your fake_batch = tf.random.normal((8, 32)) line underneath the tf.profiler.experimental.Trace scope
For your 1st question, not sure why your Type and Op are the same. noticed that you seem to use the Context manager API to collect the profile data, be sure to use it only in local mode (having the API called in the same machine where your model is running, not remotely like a cloud machine). If it's still not working for you, maybe try switching to the tf.profiler.experimental start/stop API instead and try again?
Hi,
I'm trying to profile the inference performance of my model on CPU to figure out where the bottlenecks are. However, when I run the profiler and open tensorflow_stats, it groups all the operations by type (MatMul, BiasAdd...) and doesn't tell me which layer is taking how much time.
Here is a minimal example:
The console output doesn't report anything weird:
When I view the results in Tensorboard,, I get:
Here I would like the two layers to be shown as separate elements in the charts, rather than there being a single "MatMul" element for both matrix multiplications together. This is also what the guide shows in one of the screenshots. How can I achieve this?
Furthermore, when I run this code, I can only see tensorflow_stats and trace_viewer; some of the others say "No step marker observed and hence the step time is unknown". Am I using the profiler wrong? My model uses a custom training loop so I am using context managers rather than a callback, as per the guide.
Versions:
tensorflow 2.8.0
tensorboard 2.8.0
tensorboard_plugin_profile 2.5.0
macOS Catalina 10.15.7
The text was updated successfully, but these errors were encountered: