
metrics: graph improvements #271

Merged

Conversation

grahamwhaley

A small set of improvements to help graph readability in the report:

  • move the legends under the graphs, which allows the graphs to be wider
  • reduce the page margins in the pdf, which allows the graphs to be wider still
  • improve the lhs y axis for the interface drop/error graphs so the labels look sensible.
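A minimal ggplot2 sketch of the legend move (the data and plot here are made up for illustration, not the report's actual code):

```r
library(ggplot2)

# Toy data standing in for the report's per-interface series (an assumption)
df <- data.frame(seconds = rep(1:10, 2),
                 drops   = c(rep(0, 10), rep(1, 10)),
                 iface   = rep(c("eth0", "eth1"), each = 10))

p <- ggplot(df, aes(seconds, drops, colour = iface)) +
  geom_line() +
  # Moving the legend under the graph frees horizontal space for the plot area
  theme(legend.position = "bottom")
```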

@grahamwhaley
Author

As an example of how the graphs now look:

(screenshot: new_report)

/cc @askervin @dklyle @obedmr @marcemq

Contributor

@dklyle dklyle left a comment


nice improvements. thanks!

@@ -657,7 +657,7 @@ interface_drop_line_plot <- ggplot() +
labs(colour="") +
xlab("seconds") +
ylab("drops") +
- scale_y_continuous(labels=comma, sec.axis=sec_axis(~ ./drop_scale, name="pods")) +
+ scale_y_continuous(breaks=pretty_breaks(), sec.axis=sec_axis(~ ./drop_scale, name="pods", labels=comma)) +
Contributor


pretty_breaks, that's what I put into my code too
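For reference, `scales::pretty_breaks()` is a thin wrapper around base R's `pretty()`, which picks "nice", distinct break values rather than slicing the range into a fixed number of equal steps. A base-R illustration (not the report's code):

```r
# With a range pinned to 0..1, a fixed five-way split gives 0, 0.25, 0.5,
# 0.75, 1 - which integer-style labelling can render as the confusing
# '0,0,0,1,1'. pretty() instead chooses round, distinct break values:
breaks <- pretty(c(0, 1))
print(breaks)  # -> 0.0 0.2 0.4 0.6 0.8 1.0
```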

@dklyle dklyle added the scaling label Nov 15, 2019
@marcemq
Contributor

marcemq commented Nov 15, 2019

Testing done

  • cloned grahamwhaley/cloud-native-setup
  • checkout to 20191115_graph_improvements branch
  • create the required results directory which looks like (with previous results files):
$ tree results/
results/
├── Env.R
├── test-PR-rapid
│   └── k8s-rapid.json
└── test-PR-scaling
    └── k8s-scaling.json
  • hit ./report/makereport.sh to create the report

Findings:

  • No graphs were created for either of the two tests in place
  • Only the table information gets created, but the last table does not get added into the PDF
  • the page width was increased for the PDF, but it breaks the page numbers at the bottom

The used results files are attached and the output directory created too:

@@ -8,6 +8,8 @@ author: "Auto generated"
 date: "`r format(Sys.time(), '%d %B, %Y')`"
 output:
 pdf_document:
+# Shrink the page margins so we get bigger/better resolution on the graphs
+geometry: margin=1cm
Contributor


The page numbers get broken with this margin.

@grahamwhaley
Author

Thanks for the test and report @marcemq . I've an idea of most of what is going on:

  1. trimmed page numbers
    ah, yes, that is a bug in the PR. As I trimmed all the margins, the page numbers nearly fell off the bottom of the page. Fixed by now setting the top/bottom/left/right border widths independently.

  2. 'rapid' test not showing any results.
    I could not reproduce that one, as I think I'd also need the collectd node tgz files to go with the json result file.

  3. scaling test render fail
    I'm seeing a different error, which is ## Error in data.frame(..., check.names = FALSE): arguments imply differing number of rows: 21, 601. I know where that is coming from (trying to merge the node list into the results list, which is a 21 item and 601 item mismatch). I think that might be a consequence of some effect of using STEP in your test run. I don't think that is related to this PR directly.

I've pushed a PR update for (1). I think (2) and (3) we should reproduce, document the problem more clearly, and PR separately.

I think (2) and (3) are not related to the changes in this PR. Could you:

  • verify if the HEAD code renders a pdf from your results files OK?
  • try this PR with the default STEP=1 NUM_PODS=20 data

It is possible there are some fixes in the HEAD branch that are not in this PR baseline - I can rebase and repush if we need etc.
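For reference, the per-side margin fix mentioned in (1) could look like this in the Rmd front matter (the exact values here are illustrative assumptions, not the real patch):

```yaml
# Sketch: set each margin separately, keeping the sides tight for wide
# graphs but leaving extra room at the bottom so page numbers are not clipped
geometry: "left=1cm,right=1cm,top=1cm,bottom=2cm"
```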

@askervin
Contributor

askervin commented Nov 20, 2019

@grahamwhaley, LGTM. This makes it possible to see graphs in cases with many interfaces.

I'm good with merging this already. But if something could be improved: a combination of a longish test name + host name + many interfaces makes the legend unreadable. Of course this cannot be fixed for all possible cases, but using a smaller font would help at least in this case:

(screenshot: drops)

The font in the caption looks slightly smaller than the font in the legend at the moment.

@grahamwhaley
Author

@askervin thx. I think the problems @marcemq has seen are not specific to this PR (I have seen similar errors - I think there is something not quite right when matching STEP!=1, but have not found an easy way to reproduce yet).

Yeah, large amounts of data in the tables tends to blow them outside of the page :-( I've seen that many times with the 'device under test' table when I am comparing many data runs - you can see the 'shrink the font' sticky plaster on that table at https://github.com/clearlinux/cloud-native-setup/blob/master/metrics/report/report_dockerfile/dut-details.R#L92-L99

# Build us a text table of numerical results
# Set up as left hand justify, so the node data indent renders.
tablefontsize=8
tbody.style = tbody_style(hjust=0, x=0.1, size=tablefontsize)
stats_plot = suppressWarnings(ggtexttable(data.frame(spun_stats, check.names=FALSE),
	theme=ttheme(base_size=tablefontsize, tbody.style=tbody.style),
	rows=NULL
	))

We could try applying that to the interface tables and see what it looks like - do you have a few cycles to try that with your data?

@marcemq
Contributor

marcemq commented Nov 22, 2019

Testing done

  • cloned grahamwhaley/cloud-native-setup
  • checkout to 20191115_graph_improvements branch
  • execute scale and rapid tests with default variables (./scaling/k8s_scale.sh and ./scaling/k8s_scale_rapid.sh )
  • create the required results directory using above output results which looks like (with previous results files):
$ tree results/
results/
├── Env.R
├── test-PR-rapid
│   └── k8s-rapid.json
└── test-PR-scaling
    └── k8s-scaling.json

2 directories, 3 files
  • hit ./report/makereport.sh to create the report

Findings:

  • No graphs were created for either of the two tests in place
  • Only the table information gets created, but the last table does not get added into the PDF

The used results files are attached and the output directory created too:
metrics_report.pdf
PR271.zip

@marcemq
Contributor

marcemq commented Nov 22, 2019

Heads up

I've attempted to generate the report for the output files mentioned above separately, and for k8s-scaling.json the report was generated w/o errors. Find attached the reports generated separately:

So I'm not sure if the errors come from having multiple directories (which I've previously tested w/o errors) or from the rapid test. Anyway, @grahamwhaley, I've detailed the steps I followed in case you want to reproduce this behaviour.

@grahamwhaley
Author

Thanks @marcemq - I tried to generate with the commit before this PR - commit 54adf53 - and that didn't work for me either - I think the problem is probably not in this PR, but already existed.
I'll see if I can find the root cause (I have a feeling it is going to be the incorrect k8s-... .* regexp file match somewhere)....

In the meantime, do you think you can verify the commits in this PR are OK?

@grahamwhaley
Author

OK, I think I know what the bug with the k8s-scaling (tidy) report is - there is a bad assumption in the R that tries to use a piece of data, even if there were no files to process in the directory it just scanned. The error is at https://github.com/clearlinux/cloud-native-setup/blob/master/metrics/report/report_dockerfile/tidy_scaling.R#L198 - and the two lines there need to be lifted inside the for loop.
I've seen this before - thought I'd already fixed that (unless it is in another stuck PR, or a PR I never got to land).

@marcemq - fancy seeing if you can move those lines and seeing if that helps - and if so, opening a PR to fix it?
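The shape of that fix, sketched in plain R (the variable and directory names here are made up for illustration, not the actual tidy_scaling.R code):

```r
bootdata <- NULL
dirs <- c("no-scaling-results", "has-scaling-results")  # hypothetical scan result

for (d in dirs) {
  # Stand-in for "did this directory contain a valid scaling file?"
  local_bootdata <- if (d == "has-scaling-results") data.frame(pods = 1:3) else NULL
  if (is.null(local_bootdata)) next  # nothing valid here, skip the directory

  # The assignments live inside the loop, so they only run on valid data -
  # previously they sat after the loop and failed when local_bootdata was empty
  bootdata <- rbind(bootdata, local_bootdata)
}
```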

@marcemq
Contributor

marcemq commented Nov 22, 2019

@grahamwhaley I've moved the lines that you've pointed out inside the inner loop and it did not make any difference; even when the report gets generated only for the scaling test, the legend is still on the right side.

(screenshot: scaling-1)

Graham Whaley added 5 commits November 25, 2019 10:57
Move the legends under the graphs to give more width, and thus
resolution, to the final pictures.
This works well for the collectd graphs as they are spread out
into sets of single column graphs per page.

Signed-off-by: Graham Whaley <[email protected]>
The pdf output by default has large page margins, which wastes a lot of
page space, and reduces our 'resolution'. Shrink the margins to a pretty
minimal 1cm to increase the graph resolution. The document itself then
does not look as 'pretty', but we can see more data visually.

Signed-off-by: Graham Whaley <[email protected]>
Most of the time we have 0 interface errors or drops, so we pin the y
scale to '1', so we don't hit 'infinity' errors. That left us with a
strange y-axis label anomaly - as the axis was automatically divided
into 5 labels, and for some reason we got the sequence '0,0,0,1,1'.
That just plain looked wrong and confusing.
Fix it by using `pretty_breaks()` for the error/drop y axis, whilst
maintaining the `comma` count for the pod count y axis.

Signed-off-by: Graham Whaley <[email protected]>
The bootdata assignments were outside the 'valid file' check loop,
which meant in the case there was a data directory which did not
contain a valid scaling file, we would fail the assignment (as the
`local_bootdata` would be empty).

Fix by moving the assignments into the loop, thus only assigning when
we know we have valid data.

Signed-off-by: Graham Whaley <[email protected]>
Move the legends to the bottom (underneath) for the tidy scaling graphs
to make them wider on the page, and thus easier to read with more
resolution.

Signed-off-by: Graham Whaley <[email protected]>
@grahamwhaley
Author

@marcemq - indeed, if you looked at the commits you would have seen they only updated the graph width for the collectd (rapid) graphs.... but....
I've had a check if we could do the same for the tidy (non-rapid) case, and it works fine. I've thus updated the patch sequence:

  • to bring the local_data assignment inside the loop, so the non-rapid tidy report will work when there are directories with no non-rapid results
  • to expand the width of the graphs for the non-rapid tidy case as well.

What you will still see with your data is failure to render for the rapid case due to #276 not having been reviewed and merged (so I cannot rebase on it)....

@marcemq
Contributor

marcemq commented Dec 2, 2019

Testing done

  • cloned grahamwhaley/cloud-native-setup
  • checkout to 20191115_graph_improvements branch
  • execute scale and rapid tests with default variables (./scaling/k8s_scale.sh and ./scaling/k8s_scale_rapid.sh )
  • create the required results directory using above output results which looks like:
$ tree results/
results/
├── Env.R
└── run1
    ├── clr-30f01b5149ba4ab8b05a7ee03b6812a5
    ├── k8s-rapid.json
    └── k8s-scaling.json

2 directories, 3 files
  • hit ./report/makereport.sh to create the report

Findings:

  • PDF gets created, and for the scaling test the generated graphs are the same as they used to be, which is expected.
  • The report for rapid does not get created.

The generated report is attached below:
metrics_report.pdf

@grahamwhaley
Author

@marcemq - can we check which commit you were based on? - the last commit in the new series (21953a7) moves the legends for the 'tidy' series under the graphs (https://github.com/clearlinux/cloud-native-setup/pull/271/commits/21953a75ad332c5f5cfd9c0da41e0d5eb490ec29), and I'm not seeing that in your PDF. It feels like maybe you need to pull a new version of my tree??

As for the failure in the collectd rapid graphs - I suspect the code did not find the .tgz collectd file for the node - in your tree above, it looks like the clr-30f01b5149ba4ab8b05a7ee03b6812a5 file doesn't have a .tar.gz extension? That would probably cause the script to fail?

Can you check the commit you are on, the file names of the data files, and if it still fails, maybe do the first stage debug to work out why?

@marcemq
Contributor

marcemq commented Dec 3, 2019

Testing done

Having an all-in-one Kubernetes cluster:

  • cloned latest grahamwhaley/cloud-native-setup
  • checkout to 20191115_graph_improvements branch with commit 21953a75ad332
  • execute scale and rapid tests with default variables (./scaling/k8s_scale.sh and ./scaling/k8s_scale_rapid.sh )
  • created the required results directory using all of the above output results, which looks like this (since creating the results directory is currently a manual step, I hadn't copied the tar.gz before; this should be added to the README for creating the PDF report):
results/
├── Env.R
└── test-PR271
    ├── clr-30f01b5149ba4ab8b05a7ee03b6812a5.tar.gz
    ├── k8s-rapid.json
    └── k8s-scaling.json
  • hit ./report/makereport.sh to create the report.

Findings:

  • PDF gets created for scaling and rapid tests ✅
  • The page margins in the pdf are reduced as expected ✅
  • The legends for all graphs are at the bottom, which is expected, although with long node names the legends get cut in a weird manner (see graph below) - should the long node names be cropped? ⁉️
    (screenshot: collectd-5)

The generated report is attached below:
metrics_report.pdf

@grahamwhaley
Author

Thanks @marcemq . I suspect the issue with super wide, or many, legends is sadly inherent in ggplot2 itself. I don't think there is any easy, nice fix that I can find. The most relevant issue I found was maybe tidyverse/ggplot2#1502 - which basically says 'solving legend placement and size in the core ggplot2 code is really really hard'.

I had a play with the legend bounding box size and placement, but did not find anything that worked. Two not-so-great workaround solutions I can think of:

  • shrink the font size for the legend, cross our fingers, and hope the issue mostly goes away for our use cases
  • maybe we can try to trim all the strings to some short, but not too short, unique value (I'm sure there must be some string manipulations we can do to make that happen...) - which might then get the legends to fit, but may also make them unintelligible :-(
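The string-trimming idea can be sketched in base R (the node names here are invented for illustration):

```r
# Trim long legend labels to a short prefix, then de-duplicate so entries
# remain distinguishable (hypothetical node names)
labels <- c("longish-test-name-node-aaaa.example.net",
            "longish-test-name-node-bbbb.example.net")
short <- make.unique(strtrim(labels, 12))
print(short)
```

Note the downside mentioned above: when the prefixes collide, the labels only differ by a `.1`-style suffix, which may well be unintelligible.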

Interesting note then - if you double click (highlight) the cropped items in the pdf - you can see all of their text flowing off outside the page.

So, in summary:

  • I don't think this is something new introduced by this PR - legends with large text or many entries already did strange things; it just leaps out more in this case
  • I don't think there is an easy fix either
  • but, if you want to play with it (look up things like legend.margin), or have a tweak of the font sizes, please do have a play... but, I'd not invest too much time...

/cc @askervin as I think he's mentioned this before.

@marcemq
Contributor

marcemq commented Dec 3, 2019

@grahamwhaley I agree with your comments, and I'm OK on merging this PR as it is 😊

And let's open an issue to address what to do (or how to report) long node names when creating a PDF report, so the fix can be applied to all report content. What do you think @grahamwhaley and @askervin?

@askervin
Contributor

askervin commented Dec 4, 2019

lgtm.

@marcemq , opening an issue for handling very long hostnames is ok.

@grahamwhaley
Author

Issue for the long label poor rendering opened at #287 .

@grahamwhaley grahamwhaley merged commit 6298cf2 into clearlinux:master Dec 4, 2019