
metrics: graph improvements #271

Merged

Conversation

grahamwhaley

A small set of improvements to help graph readability in the report:

  • move the legends under the graphs, which allows the graphs to be wider
  • reduce the page margins in the pdf, which allows the graphs to be wider still
  • improve the lhs y axis for the interface drop/error graphs so the labels look sensible.
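A minimal ggplot2 sketch of the legend move (the data and plot here are made up for illustration, not the report's actual code):

```r
library(ggplot2)

# Toy data standing in for the report's per-interface series (an assumption)
df <- data.frame(seconds = rep(1:10, 2),
                 drops   = c(rep(0, 10), rep(1, 10)),
                 iface   = rep(c("eth0", "eth1"), each = 10))

p <- ggplot(df, aes(seconds, drops, colour = iface)) +
  geom_line() +
  # Moving the legend under the graph frees horizontal space for the plot area
  theme(legend.position = "bottom")
```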

@grahamwhaley
Author

As an example of how the graphs now look:

(screenshot: new_report)

/cc @askervin @dklyle @obedmr @marcemq

Contributor

@dklyle dklyle left a comment


nice improvements. thanks!

@@ -657,7 +657,7 @@ interface_drop_line_plot <- ggplot() +
labs(colour="") +
xlab("seconds") +
ylab("drops") +
- scale_y_continuous(labels=comma, sec.axis=sec_axis(~ ./drop_scale, name="pods")) +
+ scale_y_continuous(breaks=pretty_breaks(), sec.axis=sec_axis(~ ./drop_scale, name="pods", labels=comma)) +
Contributor


pretty_breaks, that's what I put into my code too
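For reference, `scales::pretty_breaks()` is a thin wrapper around base R's `pretty()`, which picks "nice", distinct break values rather than slicing the range into a fixed number of equal steps. A base-R illustration (not the report's code):

```r
# With a range pinned to 0..1, a fixed five-way split gives 0, 0.25, 0.5,
# 0.75, 1 - which integer-style labelling can render as the confusing
# '0,0,0,1,1'. pretty() instead chooses round, distinct break values:
breaks <- pretty(c(0, 1))
print(breaks)  # -> 0.0 0.2 0.4 0.6 0.8 1.0
```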

@dklyle dklyle added the scaling label Nov 15, 2019
@marcemq
Contributor

marcemq commented Nov 15, 2019

Testing done

  • cloned grahamwhaley/cloud-native-setup
  • checkout to 20191115_graph_improvements branch
  • create the required results directory which looks like (with previous results files):
$ tree results/
results/
├── Env.R
├── test-PR-rapid
│   └── k8s-rapid.json
└── test-PR-scaling
    └── k8s-scaling.json
  • hit ./report/makereport.sh to create the report

Findings:

  • No graphs were created for either of the two tests in place
  • Only the table information gets created, but the last table does not get added into the PDF
  • the page width was increased for the PDF, but it breaks the page numbers at the bottom

The used results files are attached and the output directory created too:

@@ -8,6 +8,8 @@ author: "Auto generated"
 date: "`r format(Sys.time(), '%d %B, %Y')`"
 output:
 pdf_document:
+# Shrink the page margins so we get bigger/better resolution on the graphs
+geometry: margin=1cm
Contributor


The page numbers get broken with this margin.

@grahamwhaley
Author

Thanks for the test and report @marcemq . I've an idea of most of what is going on:

  1. trimmed page numbers
    ah, yes, that is a bug in the PR. As I trimmed all the margins, the page numbers nearly fell off the bottom of the page. Fixed by now setting the top/bottom/left/right border widths independently.

  2. 'rapid' test not showing any results.
    I could not reproduce that one, as I think I'd also need the collectd node tgz files to go with the json result file.

  3. scaling test render fail
    I'm seeing a different error, which is ## Error in data.frame(..., check.names = FALSE): arguments imply differing number of rows: 21, 601. I know where that is coming from (trying to merge the node list into the results list, which is a 21 item and 601 item mismatch). I think that might be a consequence of some effect of using STEP in your test run. I don't think that is related to this PR directly.

I've pushed a PR update for (1). I think (2) and (3) we should reproduce, document the problem more clearly, and PR separately.

I think (2) and (3) are not related to the changes in this PR. Could you:

  • verify if the HEAD code renders a pdf from your results files OK?
  • try this PR with the default STEP=1 NUM_PODS=20 data

It is possible there are some fixes in the HEAD branch that are not in this PR baseline - I can rebase and repush if we need etc.
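For reference, the per-side margin fix mentioned in (1) could look like this in the Rmd front matter (the exact values here are illustrative assumptions, not the real patch):

```yaml
# Sketch: set each margin separately, keeping the sides tight for wide
# graphs but leaving extra room at the bottom so page numbers are not clipped
geometry: "left=1cm,right=1cm,top=1cm,bottom=2cm"
```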

@askervin
Contributor

askervin commented Nov 20, 2019

@grahamwhaley, LGTM. This makes it possible to see graphs in cases with many interfaces.

I'm good with merging this already. But if something could be improved: a combination of a longish test name + host name + many interfaces makes the legend unreadable. Of course this cannot be fixed for all possible cases, but using a smaller font would help at least in this case:

(screenshot: drops)

The font in the caption looks slightly smaller than the font in the legend at the moment.

@grahamwhaley
Author

@askervin thx. I think the problems @marcemq has seen are not specific to this PR (I have seen similar errors - I think there is something not quite right when matching STEP!=1, but have not found an easy way to reproduce yet).

Yeah, large amounts of data in the tables tends to blow them outside of the page :-( I've seen that many times with the 'device under test' table when I am comparing many data runs - you can see the 'shrink the font' sticky plaster on that table at https://github.com/clearlinux/cloud-native-setup/blob/master/metrics/report/report_dockerfile/dut-details.R#L92-L99

# Build us a text table of numerical results
# Set up as left hand justify, so the node data indent renders.
tablefontsize=8
tbody.style = tbody_style(hjust=0, x=0.1, size=tablefontsize)
stats_plot = suppressWarnings(ggtexttable(data.frame(spun_stats, check.names=FALSE),
	theme=ttheme(base_size=tablefontsize, tbody.style=tbody.style),
	rows=NULL
	))

We could try applying that to the interface tables and see what it looks like - do you have a few cycles to try that with your data?

@marcemq
Contributor

marcemq commented Nov 22, 2019

Testing done

  • cloned grahamwhaley/cloud-native-setup
  • checkout to 20191115_graph_improvements branch
  • execute scale and rapid tests with default variables (./scaling/k8s_scale.sh and ./scaling/k8s_scale_rapid.sh )
  • create the required results directory using above output results which looks like (with previous results files):
$ tree results/
results/
├── Env.R
├── test-PR-rapid
│   └── k8s-rapid.json
└── test-PR-scaling
    └── k8s-scaling.json

2 directories, 3 files
  • hit ./report/makereport.sh to create the report

Findings:

  • No graphs were created for either of the two tests in place
  • Only the table information gets created, but the last table does not get added into the PDF

The used results files are attached and the output directory created too:
metrics_report.pdf
PR271.zip

@marcemq
Contributor

marcemq commented Nov 22, 2019

Heads up

I've attempted to generate the report for the output files mentioned above separately, and for k8s-scaling.json the report was generated w/o errors. Find attached the reports generated separately:

So I'm not sure if the errors come from having multiple directories (which I've previously tested w/o errors) or from the rapid test. Anyway, @grahamwhaley, I've detailed the steps I followed in case you want to reproduce this behaviour.

@grahamwhaley
Author

Thanks @marcemq - I tried to generate with the commit before this PR - commit 54adf53 - and that didn't work for me either - I think the problem is probably not in this PR, but already existed.
I'll see if I can find the root cause (I have a feeling it is going to be the incorrect k8s-... .* regexp file match somewhere)....

In the meantime, do you think you can verify the commits in this PR are OK?

@grahamwhaley
Author

OK, I think I know what the bug with the k8s-scaling (tidy) report is - there is a bad assumption in the R that tries to use a piece of data, even if there were no files to process in the directory it just scanned. The error is at https://github.com/clearlinux/cloud-native-setup/blob/master/metrics/report/report_dockerfile/tidy_scaling.R#L198 - and the two lines there need to be lifted inside the for loop.
I've seen this before - thought I'd already fixed that (unless it is in another stuck PR, or a PR I never got to land).

@marcemq - fancy seeing if you can move those lines and seeing if that helps - and if so, opening a PR to fix it?
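The shape of that fix, sketched in plain R (the variable and directory names here are made up for illustration, not the actual tidy_scaling.R code):

```r
bootdata <- NULL
dirs <- c("no-scaling-results", "has-scaling-results")  # hypothetical scan result

for (d in dirs) {
  # Stand-in for "did this directory contain a valid scaling file?"
  local_bootdata <- if (d == "has-scaling-results") data.frame(pods = 1:3) else NULL
  if (is.null(local_bootdata)) next  # nothing valid here, skip the directory

  # The assignments live inside the loop, so they only run on valid data -
  # previously they sat after the loop and failed when local_bootdata was empty
  bootdata <- rbind(bootdata, local_bootdata)
}
```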

@marcemq
Contributor

marcemq commented Nov 22, 2019

@grahamwhaley I've moved the lines that you've pointed out inside the inner loop and it did not make any difference; even when the report gets generated only for the scaling test, the legend is still on the right side.

(screenshot: scaling-1)

Graham Whaley added 5 commits November 25, 2019 10:57
Move the legends under the graphs to give more width, and thus
resolution, to the final pictures.
This works well for the collectd graphs as they are spread out
into sets of single column graphs per page.

Signed-off-by: Graham Whaley <[email protected]>
The pdf output by default has large page margins, which wastes a lot of
page space, and reduces our 'resolution'. Shrink the margins to a pretty
minimal 1cm to increase the graph resolution. The document itself then
does not look as 'pretty', but we can see more data visually.

Signed-off-by: Graham Whaley <[email protected]>
Most of the time we have 0 interface errors or drops, so we pin the y
scale to '1', so we don't hit 'infinity' errors. That left us with a
strange y-axis label anomaly - as the axis was automatically divided
into 5 labels, and for some reason we got the sequence '0,0,0,1,1'.
That just plain looked wrong and confusing.
Fix it by using `pretty_breaks()` for the error/drop y axis, whilst
maintaining the `comma` count for the pod count y axis.

Signed-off-by: Graham Whaley <[email protected]>
The bootdata assignments were outside the 'valid file' check loop,
which meant in the case there was a data directory which did not
contain a valid scaling file, we would fail the assignment (as the
`local_bootdata` would be empty).

Fix by moving the assignments into the loop, thus only assigning when
we know we have valid data.

Signed-off-by: Graham Whaley <[email protected]>
Move the legends to the bottom (underneath) for the tidy scaling graphs
to make them wider on the page, and thus easier to read with more
resolution.

Signed-off-by: Graham Whaley <[email protected]>
@grahamwhaley
Author

@marcemq - indeed, if you looked at the commits you would have seen they only updated the graph width for the collectd (rapid) graphs.... but....
I've had a check if we could do the same for the tidy (non-rapid) case, and it works fine. I've thus updated the patch sequence:

  • to bring the local_data assignment inside the loop, so the non-rapid tidy report will work when there are directories with no non-rapid results
  • to expand the width of the graphs for the non-rapid tidy case as well.

What you will still see with your data is failure to render for the rapid case due to #276 not having been reviewed and merged (so I cannot rebase on it)....

@marcemq
Contributor

marcemq commented Dec 2, 2019

Testing done

  • cloned grahamwhaley/cloud-native-setup
  • checkout to 20191115_graph_improvements branch
  • execute scale and rapid tests with default variables (./scaling/k8s_scale.sh and ./scaling/k8s_scale_rapid.sh )
  • create the required results directory using above output results which looks like:
$ tree results/
results/
├── Env.R
└── run1
    ├── clr-30f01b5149ba4ab8b05a7ee03b6812a5
    ├── k8s-rapid.json
    └── k8s-scaling.json

2 directories, 3 files
  • hit ./report/makereport.sh to create the report

Findings:

  • PDF gets created, and for the scaling test the generated graphs are the same as they used to be, which is expected.
  • The report for rapid does not get created.

The generated report is attached below:
metrics_report.pdf

@grahamwhaley
Author

@marcemq - can we check which commit you were based on? - the last commit in the new series (21953a7) moves the legends for the 'tidy' series under the graphs (https://github.com/clearlinux/cloud-native-setup/pull/271/commits/21953a75ad332c5f5cfd9c0da41e0d5eb490ec29), and I'm not seeing that in your PDF. It feels like maybe you need to pull a new version of my tree??

As for the failure in the collectd rapid graphs - I suspect the code did not find the .tgz collectd file for the node - in your tree above, it looks like the clr-30f01b5149ba4ab8b05a7ee03b6812a5 file doesn't have a .tar.gz extension? That would probably cause the script to fail?

Can you check the commit you are on, the file names of the data files, and if it still fails, maybe do the first stage debug to work out why?

@marcemq
Contributor

marcemq commented Dec 3, 2019

Testing done

Having an all-in-one Kubernetes cluster:

  • cloned latest grahamwhaley/cloud-native-setup
  • checkout to 20191115_graph_improvements branch with commit 21953a75ad332
  • execute scale and rapid tests with default variables (./scaling/k8s_scale.sh and ./scaling/k8s_scale_rapid.sh )
  • created the required results directory using all of the above output results, which looks like this (since creating the results directory is currently a manual step, I hadn't copied the tar.gz before; this should be added to the README for creating the PDF report):
results/
├── Env.R
└── test-PR271
    ├── clr-30f01b5149ba4ab8b05a7ee03b6812a5.tar.gz
    ├── k8s-rapid.json
    └── k8s-scaling.json
  • hit ./report/makereport.sh to create the report.

Findings:

  • PDF gets created for scaling and rapid tests ✅
  • The page margins in the pdf are reduced as expected ✅
  • The legends for all graphs are at the bottom, which is expected, although with long node names the legends get cut in a weird manner (see graph below) - should the long node names be cropped? ⁉️
    (screenshot: collectd-5)

The generated report is attached below:
metrics_report.pdf

@grahamwhaley
Author

Thanks @marcemq . I suspect the issue with super wide, or many, legends is sadly inherent in ggplot2 itself. I don't think there is any easy, nice fix that I can find. The most relevant issue I found was maybe tidyverse/ggplot2#1502 - which basically says 'solving legend placement and size in the core ggplot2 code is really really hard'.

I had a play with the legend bounding box size and placement, but did not find anything that worked. Two not-so-great workaround solutions I can think of:

  • shrink the font size for the legend, cross our fingers, and hope the issue mostly goes away for our use cases
  • maybe we can try to trim all the strings to some short, but not too short, unique value (I'm sure there must be some string manipulations we can do to make that happen...) - which might then get the legends to fit, but may also make them unintelligible :-(
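The string-trimming idea can be sketched in base R (the node names here are invented for illustration):

```r
# Trim long legend labels to a short prefix, then de-duplicate so entries
# remain distinguishable (hypothetical node names)
labels <- c("longish-test-name-node-aaaa.example.net",
            "longish-test-name-node-bbbb.example.net")
short <- make.unique(strtrim(labels, 12))
print(short)
```

Note the downside mentioned above: when the prefixes collide, the labels only differ by a `.1`-style suffix, which may well be unintelligible.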

Interesting note then - if you double click (highlight) the cropped items in the pdf - you can see all of their text flowing off outside the page.

So, in summary:

  • I don't think this is something new introduced by this PR - legends with large text or many entries already did strange things; it just leaps out more in this case
  • I don't think there is an easy fix either
  • but, if you want to play with it (look up things like legend.margin), or have a tweak of the font sizes, please do have a play... but, I'd not invest too much time...

/cc @askervin as I think he's mentioned this before.

@marcemq
Contributor

marcemq commented Dec 3, 2019

@grahamwhaley I agree with your comments, and I'm OK on merging this PR as it is 😊

And let's open an issue to address what to do (or how to report) long node names when creating a PDF report, so the fix can be applied to all report content. What do you think @grahamwhaley and @askervin?

@askervin
Contributor

askervin commented Dec 4, 2019

lgtm.

@marcemq , opening an issue for handling very long hostnames is ok.

@grahamwhaley
Author

Issue for the long label poor rendering opened at #287 .

@grahamwhaley grahamwhaley merged commit 6298cf2 into clearlinux:master Dec 4, 2019