Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Display CPU Governor information in javacore #2838

Open
bharathappali opened this issue Sep 12, 2018 · 11 comments
Open

Display CPU Governor information in javacore #2838

bharathappali opened this issue Sep 12, 2018 · 11 comments

Comments

@bharathappali
Copy link
Contributor

When running in containers with CPU quota set, performance of the application may be adversely affected if CPU frequency governor is set to "powersave".

As an example, compare application running on 4 CPUs with 100% quota vs running it on 8 CPUs with 50% quota. If the CPU frequency governor is set to "powersave", performance in latter case is very poor even though the application has 4 "effective" cpus.

We have observed this during internal testing on DayTrader 3. We ran DayTrader 3 with following cases (Frequency Governor - powersave(default)):

- 4 cpus, 100% quota
- 8 cpus, 50% quota

Throughput comparison for the above cases:

Case 			Throughput
4Cpu100Quota 		9248.22
8Cpu50Quota 		7771.04

We also asked about this on Ubuntu forum here: https://askubuntu.com/questions/1007656/behavior-of-powersave-freq-governor-when-cpu-quota-is-set/1007731#1007731

When we came to know that its due to the powersave frequency governor. we ran again with performance frequency governor and saw the difference in throughput

Case 			Throughput
4Cpu100Quota 		8255.2
8Cpu50Quota 		8355.69

It appears this drop in performance is expected with "powersave" governor. If thats the case then it would be useful to capture the CPU frequency governor being used in javacore which can be helpful in dealing with performance problems.

The Javacore CPU information section could be delivering this extended feature and might look something like :

1CICPUINFO	CPU Information
NULL      	------------------------------------------------------------------------
2CIPHYSCPU     	Physical CPUs: 4
2CIONLNCPU     	Online CPUs: 4
2CIBOUNDCPU    	Bound CPUs: 4
2CIACTIVECPU   	Active CPUs: 0
2CITARGETCPU   	Target CPUs: 4
2CICPUGVNR     	CPU Governors
2CICPUGVNR0		CPU0 : powersave
2CICPUGVNR1            	CPU1 : powersave
2CICPUGVNR2            	CPU2 : powersave
2CICPUGVNR3            	CPU3 : powersave
                    .
                    .
                    .
                    .
...

Please kindly provide your views on it

FYI @ashu-mehra @dinogun @deesebas

@bharathappali
Copy link
Contributor Author

@pshipton @DanHeidinga Can i get your views on it please ?

@DanHeidinga
Copy link
Member

@bharathappali I can see how having this info would be useful. Here's a couple of questions to help me understand the scope of the this change and the broader implications:

  • How expensive is this information to collect? Is it collected once and stored for the duration of the run?

  • How much info is being added to the javacore? 1 line per cpu? This may be unwieldy when there are large numbers of CPUs.

  • Does this info only get included when run in cgroups or is it more widely applicable?

  • Are there other governors than powersave?

@bharathappali
Copy link
Contributor Author

bharathappali commented Sep 20, 2018

@DanHeidinga Thank you for posting those questions. Which are quite helpful and important.

How expensive is this information to collect? Is it collected once and stored for the duration of the run?

Its a moderately expensive operation as its processed every time the javacore dump is generated. We read and write to javacore directly when its processed will not be storing it.

How much info is being added to the javacore? 1 line per cpu? This may be unwieldy when there are large numbers of CPUs.

Initially i thought to add 1 line per cpu. But what you said was right it doesn't look good in a case where we have many CPU's. I have come up with different ways listed below for representing them which take less lines to be displayed in javacore.

Array model representation (for each governor) :

2CICPUGVNR         CPU Governors [Governor - CPU's]
2CICPUGVRPS        Powersave - [0,1,2,3,8,9,10,11]
2CICPUGVRPF        Performance - [4,5,6,7,12,13,14,15]

More simplified with ranges it would be something like :

2CICPUGVNR         CPU Governors [Governor - CPU's]
2CICPUGVRPS        Powersave - [(0-3),(8-11)]
2CICPUGVRPF        Performance - [(4-7),(12-15)]

Short form representation with 16 CPU's per line

2CICPUGVNR         CPU Governors [Powersave - PS, Performance - PF]
2CICPUIDS     0     1     2     3     4     5     6     7     8     9     10    11    12    13    14    15
2CICPUGVRS    PS    PS    PS    PS    PF    PF    PF    PF    PS    PS    PS    PS    PF    PF    PF    PF 

Does this info only get included when run in cgroups or is it more widely applicable?

Its widely applicable irrespective of cgroups. If we run in cgroups with cpu restrictions then will be getting the governor info of those specific CPU's

Are there other governors than powersave?

yes, In linux there are 6 cpu frequency governors.

Note: I have got the info of cpu frequency governors from this link from section 2.

2.   Governors In the Linux Kernel
2.1  Performance
2.2  Powersave
2.3  Userspace
2.4  Ondemand
2.5  Conservative
2.6  Schedutil

Please kindly provide your views on the representation of the CPU Governor info in the javacore.

@DanHeidinga
Copy link
Member

Its a moderately expensive operation as its processed every time the javacore dump is generated. We read and write to javacore directly when its processed will not be storing it.

Can the frequency governor change at runtime? Or is it a static assignment? I'd rather cache the info than read it over and over.... depending on how expensive it is to read.

Is there a reason to only record the two kinds (performance / powersave) when there are 4 other kinds?

I think we need to figure out what data we want before we pick the format.

@bharathappali
Copy link
Contributor Author

@DanHeidinga I'm sorry for delayed reply. I haven't gone through it for a while.

Can the frequency governor change at runtime?

Yeah there is a possibility to change it in runtime.

I'd rather cache the info than read it over and over.... depending on how expensive it is to read.

its just for a one time use when javacore is generated. As there is no other mechanism/service using this info I'm thinking that caching it would not be that useful from a memory allocation perspective. I might be wrong, I'm sorry if i was.

Is there a reason to only record the two kinds (performance / powersave) when there are 4 other kinds?

My system (RHEL 7.5) currently showing only those two in available governors for my cpus (got the info from /sys/devices/system/cpu/cpu<0-3>/cpufreq/scaling_available_governors) so just for representational purposes i have described only two. But yes will be adding all the kinds available in the JAVACORE.

@SueChaplain
Copy link
Contributor

SueChaplain commented Nov 5, 2018

@bharathappali Please add a tag for the doc changes needed (doc:externals) and open an issue at the openj9-docs repository: https://github.com/eclipse/openj9-docs/issues/new?template=new-documentation-change.md

We will then be able to add this to our 0.12.0 release list. Thanks!

@bharathappali
Copy link
Contributor Author

@SueChaplain Thanks for pointing it. Created the issue openj9-docs/issues/136. Will be updating it accordingly.

@pshipton
Copy link
Member

pshipton commented Dec 18, 2018

Given where this is at, and the upcoming holidays and absences, I'm guessing this won't make the 0.12 release. I've moved it out of the 0.12 milestone and into the 0.14 milestone to reduce the size of the 0.12 milestone. If it does happen to make it into the 0.12 release, we can fix the target milestone.

@bharathappali
Copy link
Contributor Author

@DanHeidinga As the OpenJ9 PR #3348 makes changes to add CPU Governors to javacore with a representation [Governor - CPU's]. I would like to know your views on that representation format. Thanks

@ashu-mehra
Copy link
Contributor

@bharathappali Looking at PRs #3348 and #4176 , I am more inclined towards #3348
where cpu governor information is obtained from port library, whereas #4176 just puts up everything in a function and is difficult to comprehend and maintain.

I also suggest to use bits for storing list of cpus for each governor. In fact having port library apis for maintaining cpu sets would be a good addition I feel.
For example, I have come up with following set of APIs catering to cpu governor information:

+typedef struct OMRCpuSet {
+	union {
+		cpu_set_t cpuset;
+		struct {
+			cpu_set_t *cpuset;
+			uint64_t size;
+		} d_cpuset;
+	} set;
+	BOOLEAN dynamic;	
+} OMRCpuSet;
+
 +void omrsysinfo_get_cpu_affinity(OMRCpuSet *cpuset) {
+#if defined(LINUX) && !defined(OMRZTPF)
+	int32_t size = sizeof(cpuset->set.cpuset); /* Size in bytes */
+	int32_t rc = sched_getaffinity(0, size, &cpuset->set.cpuset); /* pid = 0 returns mask of calling process */
+	if (rc != 0) {
+		if (EINVAL == errno) {
+			/* Too many CPUs for the fixed cpu_set_t structure */
+			int32_t numCPUs = sysconf(_SC_NPROCESSORS_CONF);
+			cpuset->set.d_cpuset.cpuset = CPU_ALLOC(numCPUs);
+			if (NULL != allocatedCpuSet) {
+				cpuset->set.d_cpuset.size = CPU_ALLOC_SIZE(numCPUs);
+				CPU_ZERO_S(cpuset->set.d_cpuset.size, cpuset->set.d_cpuset.cpuset);
+				rc = sched_getaffinity(mainProcess, cpuset->set.d_cpuset.size, cpuset->set.d_cpuset.cpuset);
+				if (0 != rc) {
+					return OMRPORT_ERROR_XXX;
+				}
+				cpuset->dynamic = TRUE;
+			}
+		}
+	}
+#else
+
+#endif	
+}
+
+void omrsysinfo_free_cpu_set(OMRCpuSet *cpuset) {
+#if defined(LINUX) && !defined(OMRZTPF)
+	if (cpuset->dynamic) {
+		CPU_FREE(cpuset->set.d_cpuset.cpuset)
+	}
+#endif
+}
+
+void omrsysinfo_add_cpuset(OMRCpuSet *cpuset, int cpu) {
+#if defined(LINUX) && !defined(OMRZTPF)
+	if (cpuset->dynamic) {
+		if (cpu >= (CPU_ALLOC_SIZE(cpuset->set.d_cpuset.cpuset) * 8)) {
+			CPU_FREE(cpuset->set.d_cpuset.cpuset);
+			cpuset->set.d_cpuset.cpuset = CPU_ALLOC(cpu+1);
+		}
+		CPU_SET(cpu, cpuset->set.d_cpuset.cpuset);
+	} else {
+		if (cpu >= sizeof(cpuset->set.cpuset)) {
+			cpuset->set.d_cpuset.cpuset = CPU_ALLOC(cpu+1);
+			cpuset->dynamic = TRUE;
+			CPU_SET(cpu, cpuset->set.d_cpuset.cpuset);
+		} else {
+			CPU_SET(cpu, &cpuset->set.cpuset);
+		}
+	}
+#endif
+}
+
+void omrsysinfo_isset_cpuset(OMRCpuSet *cpuset, int cpu) {
+#if defined(LINUX) && !defined(OMRZTPF)
+	if (cpuset->dynamic) {
+		CPU_ISSET(cpu, cpuset->set.d_cpuset.cpuset);
+	} else {
+		CPU_ISSET(cpu, &cpuset->set.cpuset);
+	}
+#endif
+}
+
+int omrsysinfo_get_cpu_count(OMRCpuSet *cpuset) {
+#if defined(LINUX) && !defined(OMRZTPF)
+	if (cpuset->dynamic) {
+		CPU_COUNT(cpuset->set.d_cpuset.cpuset);
+	} else {
+		CPU_COUNT(&cpuset->set.cpuset);
+	}
+#endif
+}
-- 

Whether these APIs would be useful for OMR or should they be in OpenJ9, or just be internal functions, is open to discussion.
With the above functions, the code for CPU governors can be simplified a lot like this:

+typedef struct J9CpuGovernor {
+	char *type;
+	struct OMRCpuSet cpuSet;
+	struct J9CpuGovernor *next;
+} J9CpuGovernor;
+
+
+#if defined(LINUX)
+
+#define CPU_INDEX_LENGTH 5
+
+char const *cpuGovernorPathPattern = "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_governor";
+#define CPU_GOVERNOR_PATTERN_SIZE (strlen(cpuGovernorPathPattern) + (CPU_INDEX_LENGTH) + 1)
+
+static void
+addCpuToGovernorList(struct J9PortLibrary *portLibrary, J9CpuGovernor **governorList, int cpu, const char *governor)
+{
+	int i = 0;
+	J9CpuGovernor *governorEntry = *governorList;
+	J9CpuGovernor *prevEntry = NULL;
+
+	while (governorEntry != NULL) {
+		if (0 == strcmp(governor, governorEntry->type)) {
+			break;
+		}
+		prevEntry = governorEntry;
+		governorEntry = governorEntry->next;
+	}
+
+	if (governorEntry == NULL) {
+		governorEntry = omrmem_allocate_memory(sizeof(J9CpuGovernor) + strlen(governor) + 1);
+		governorEntry->type = (char *)(governorEntry + 1);
+		strcpy(governorEntry->type, governor);
+		if (prevEntry) {
+			prevEntry->next = governorEntry;
+		}
+	}
+
+	omrsysinfo_add_cpuset(&governorEntry.cpuSet, cpu);
+
+	if (NULL == *governorList) {
+		*governorList = governorEntry;
+	}
+}
+
+static int32_t
+getCpuGovernorDetails(struct J9PortLibrary *portLibrary, struct J9CpuGovernor **governorList, OMRCpuSet *cpuSet)
+{
+	OMRPORT_ACCESS_FROM_J9PORT(portLibrary);
+ 	int32_t rc = 0;
+	int32_t i = 0;
+	int32_t cpuCount = omrsysinfo_get_cpu_count(cpuSet);
+	int32_t cpuIndex = 0;
+
+	for(i = 0; i < cpuCount; i++) {
+		char pathBuffer[CPU_GOVERNOR_PATTERN_SIZE];
+		int32_t cpuGovernorPathLength = 0;
+		FILE *file = NULL; 
+		char governorName[20];
+
+		while (!omrsysinfo_isset_cpuset(cpuSet, cpuIndex)) {
+			cpuIndex += 1;
+		}
+
+		cpuGovernorPathLength = omrstr_printf(pathBuffer, sizeof(pathBuffer), cpuGovernorPathPattern, i);
+		if (0 == cpuGovernorPathLength) {
+			rc = J9PORT_ERROR_STRING_ILLEGAL_STRING;
+			goto _end;
+		}
+		file = fopen(pathBuffer, "r");
+		if (NULL == file) {
+			rc = J9PORT_ERROR_FILE_OPFAILED;
+			goto _end;
+		}
+		if (NULL == fgets(governorName, 20, file)) {
+			rc = J9PORT_ERROR_FILE_OPFAILED;
+		} else {
+			addCpuToGovernorList(cpuIndex, governorName, governorList);
+		}
+		fclose(file);
+		cpuIndex += 1;
+	}
+_end:
+	return rc;
+}
+#endif
+
+int32_t
+j9sysinfo_get_cpu_governor_info(struct J9PortLibrary *portLibrary, J9CpuGovernor **governorList)
+{
+	int32_t rc = J9PORT_ERROR_SYSINFO_NOT_SUPPORTED;
+	OMRCpuSet cpuSet = {0};
+	omrsysinfo_get_cpu_affinity(&cpuSet);
+	getCpuGovernorDetails(portLibrary, &cpuSet, governorList);
+}
+
+j9sysinfo_release_cpu_governor_info(struct J9PortLibrary *portLibrary, J9CpuGovernor *governorList)
+{
+
+}

This code does not need to have a hard coded list of governors.
Note that this is not working code and needs to be tested and polished like adding return values, handling error conditions etc.

@pshipton
Copy link
Member

pshipton commented Jul 8, 2019

Moving forward until this picks up again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants