Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added INSTRUCTIONS and CYCLES hardware perf counters on MacOS. #1404

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

robwyatt
Copy link

@robwyatt robwyatt commented Jun 2, 2022

MacOS has a semi undocumented API in libpthread that returns the INSTRUCTION and CYCLE counts of the calling thread. The API is thread_selfcounts and is available on Intel and Apple Silicon hardware. Using the API, this PR cleanly integrates with the existing performance counter code and allows those two counters.

For example you can use --benchmark_perf_counters=INSTRUCTIONS,CYCLES just like you can on Linux but no external dependencies are required as everything needed is in libpthread.

The performance counter documentation has been updated to reflect this change.

…ses a little known and MacOS specific function in libpthread to obtain the per thread counters
@google-cla
Copy link

google-cla bot commented Jun 2, 2022

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@LebedevRI
Copy link
Collaborator

Shouldn't this go into perfmon itself?

@robwyatt
Copy link
Author

robwyatt commented Jun 2, 2022

I looked at that but it's a hell of an effort to get perfmon to build on MacOS and support the whole API but it's difficult because it uses read() with a driver provided fd to get data - this is not how MacOS works and from user mode isn't really possible.

It also seems like software engineering overkill to make a a whole library to wrap a single API that is built in to the OS (the OS already does a lot of work to make those two counters behave the same on both platforms, the counters are thread specific and context switching and processor migration is handled so the counters are rock solid). This PR just calls that API where the perform version calls read(), to me it seems like a fair change and it has zero overhead and zero additional dependencies, ultimately its a 50 line addition, no existing code was changed. I understand if you don't agree but I'd like to hear your thoughts on how to get around the file descriptor/read() problem.

Now, if we had easy access to the whole set of counters then the full library would make sense but the read() problem is not trivial even from kernel mode. Getting access to the the hardware counters on MacOS is a challenge and completely different on Intel and Arm so it would be two completely independent implementations and on top of that kernel drivers are tricky especially on Apple Silicon. On both platforms Apple assume they have sole access to the counters and writing a kernel driver would break things like instruments and the existing profile tools.

Sorry for the long reply, but I did look at all the options and I'm pretty sure this is the best solution.

@@ -1,5 +1,3 @@
<a name="perf-counters" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you leave this in please? (or replace it with the equivalent github markdown anchor method)

@@ -114,6 +114,50 @@ void PerfCounters::CloseCounters() const {
close(fd);
}
}
#elif defined(BENCHMARK_OS_MACOSX)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wonder if it's worth splitting this file into perf_counters_linux.cc and perf_counters_osx.cc

(then we can selectively compile on top of using these guards)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a bad idea

@dmah42
Copy link
Member

dmah42 commented Jun 6, 2022

please check the clang format (and any other build failures) :)

@dmah42
Copy link
Member

dmah42 commented Feb 7, 2023

@robwyatt did you intend to get back to this?

@dmah42 dmah42 added the incomplete work needed label Aug 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
incomplete work needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants