-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
added INSTRUCTIONS and CYCLES hardware perf counters on MacOS. #1404
base: main
Are you sure you want to change the base?
Conversation
…ses a little known and MacOS specific function in libpthread to obtain the per thread counters
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
Shouldn't this go into perfmon itself? |
I looked at that but it's a hell of an effort to get perfmon to build on MacOS and support the whole API but it's difficult because it uses read() with a driver provided fd to get data - this is not how MacOS works and from user mode isn't really possible. It also seems like software engineering overkill to make a a whole library to wrap a single API that is built in to the OS (the OS already does a lot of work to make those two counters behave the same on both platforms, the counters are thread specific and context switching and processor migration is handled so the counters are rock solid). This PR just calls that API where the perform version calls read(), to me it seems like a fair change and it has zero overhead and zero additional dependencies, ultimately its a 50 line addition, no existing code was changed. I understand if you don't agree but I'd like to hear your thoughts on how to get around the file descriptor/read() problem. Now, if we had easy access to the whole set of counters then the full library would make sense but the read() problem is not trivial even from kernel mode. Getting access to the the hardware counters on MacOS is a challenge and completely different on Intel and Arm so it would be two completely independent implementations and on top of that kernel drivers are tricky especially on Apple Silicon. On both platforms Apple assume they have sole access to the counters and writing a kernel driver would break things like instruments and the existing profile tools. Sorry for the long reply, but I did look at all the options and I'm pretty sure this is the best solution. |
@@ -1,5 +1,3 @@ | |||
<a name="perf-counters" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you leave this in please? (or replace it with the equivalent github markdown anchor method)
@@ -114,6 +114,50 @@ void PerfCounters::CloseCounters() const { | |||
close(fd); | |||
} | |||
} | |||
#elif defined(BENCHMARK_OS_MACOSX) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i wonder if it's worth splitting this file into perf_counters_linux.cc and perf_counters_osx.cc
(then we can selectively compile on top of using these guards)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a bad idea
please check the clang format (and any other build failures) :) |
@robwyatt did you intend to get back to this? |
MacOS has a semi undocumented API in libpthread that returns the INSTRUCTION and CYCLE counts of the calling thread. The API is thread_selfcounts and is available on Intel and Apple Silicon hardware. Using the API, this PR cleanly integrates with the existing performance counter code and allows those two counters.
For example you can use
--benchmark_perf_counters=INSTRUCTIONS,CYCLES
just like you can on Linux but no external dependencies are required as everything needed is in libpthread.The performance counter documentation has been updated to reflect this change.