test to demonstrate issue mesalib #237

Open: wants to merge 5 commits into base: master
Conversation

doutriaux1
Contributor

@danlipsa the added test passes under a regular VTK build, but a VTK-mesalib build shows a leak.

This is kind of a show stopper for both @zshaheen and @durack1. Do you, @sankhesh, or @aashish24 have any cycles for this at the moment? Thanks.

doutriaux1 added the bug label Aug 24, 2017
@danlipsa
Contributor

@doutriaux1 I will take a look at this.

@durack1
Member

durack1 commented Aug 24, 2017

@danlipsa that would be great, please do take a look. This has been a problem since 2015 (see CDAT/cdat#1424) and it's really tripping up most of the in-house VCS users here at LLNL. Happy to provide feedback on any further investigations/PRs

More chatter in PCMDI/amipbcs#10

@durack1
Member

durack1 commented Aug 25, 2017

@danlipsa I have a fairly big investment in this, so please do ping me if/when you want me to provide some feedback. The demo that @doutriaux1 has put together should cover much of the code that we're testing, but my environment is significantly different from yours..

@danlipsa
Contributor

@durack1 Thanks! I can reproduce the bug so I am all set. The test as it is written has a small vcs.elements leak, but fixing that did not make a difference. So I am still digging ...

@danlipsa
Contributor

@doutriaux1 @durack1 @aashish24 @zshaheen It seems that certain resources accumulate in the render window and are not released. As a workaround, just set x.backend.renWin = None at the end of your loop. That fixes Charles' test.
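For reference, a minimal sketch of what that workaround could look like in a plotting loop. Only the x.backend.renWin = None line comes from this thread; the file name, variable name, and loop structure are assumptions for illustration.

import cdms2
import vcs

# Hypothetical example loop; only the renWin reset is the workaround from this thread.
f = cdms2.open("sample_sic.nc")       # assumed input file
x = vcs.init()
box = x.createboxfill()

for step in range(12):
    s = f["sic"][step]                # read one time step of the (assumed) variable
    x.plot(s, box, bg=1)              # bg=1: off-screen rendering, as in these tests
    x.png("frame_%05d.png" % step)
    x.clear()
    x.backend.renWin = None           # workaround: drop the VTK render window

f.close()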

@danlipsa
Contributor

Could you try that in your programs? Thanks!

@durack1
Member

durack1 commented Aug 25, 2017

@danlipsa excellent, will kick off a run now..

@doutriaux1 what was the other object that I needed to purge/delete in the code that I provided you?

@durack1
Member

durack1 commented Aug 25, 2017

@danlipsa I added the x.backend.renWin = None and the vcs.removeobject(tmpl) from @doutriaux1 and I still get nasty memory growth (Max mem), plot slowdown (time) and rogue Python objects (PyObj#) accumulating. My script, which was trimmed down to provide the demo, is at PCMDI/amipbcs/make_newVsOldDiffs.py:

REMOVED OLD EXAMPLE NUMBERS AS THIS WAS FROM THE WRONG ENV

And full disclosure, this code is running in an active VNC session

@danlipsa
Contributor

@durack1 Is this better than what you had before x.backend.renWin = None? This instruction completely solves Charles' test, simplified a little. But of course, this is not necessarily the best solution, as creating and destroying the window at every iteration takes time. It will just help me narrow down the problem.

@durack1
Member

durack1 commented Aug 26, 2017

@danlipsa, the numbers below are before I included the x.backend.renWin = None and the vcs.removeobject(tmpl) from @doutriaux1:

[duro@ocean 150219_AMIPForcingData]$ head -n 40 170822_1756_make_newVsOldDiffs.txt
UV-CDAT version:      2.10-19-gba336f2
UV-CDAT prefix:       /export/duro/anaconda2/envs/uvcdatNightly170822
delFudge:             False
Background graphics:  True
donotstoredisplay:    True
00001 processing: 1870-01 sic     Time: 23.881 secs; Max mem: 1.549 GB PyObj#: 0113596;
00002 processing: 1870-02 sic     Time: 18.564 secs; Max mem: 2.031 GB PyObj#: 0113661;
00003 processing: 1870-03 sic     Time: 17.349 secs; Max mem: 2.052 GB PyObj#: 0113581;
00004 processing: 1870-04 sic     Time: 20.931 secs; Max mem: 2.057 GB PyObj#: 0113664;
00005 processing: 1870-05 sic     Time: 16.427 secs; Max mem: 2.057 GB PyObj#: 0113873;
00006 processing: 1870-06 sic     Time: 17.019 secs; Max mem: 2.057 GB PyObj#: 0114158;
00007 processing: 1870-07 sic     Time: 16.797 secs; Max mem: 2.057 GB PyObj#: 0113959;
00008 processing: 1870-08 sic     Time: 18.188 secs; Max mem: 2.057 GB PyObj#: 0114122;
00009 processing: 1870-09 sic     Time: 19.846 secs; Max mem: 2.057 GB PyObj#: 0114245;
00010 processing: 1870-10 sic     Time: 16.137 secs; Max mem: 2.057 GB PyObj#: 0114420;
00011 processing: 1870-11 sic     Time: 17.955 secs; Max mem: 2.057 GB PyObj#: 0114298;
00012 processing: 1870-12 sic     Time: 19.820 secs; Max mem: 2.058 GB PyObj#: 0114501;
00013 processing: 1871-01 sic     Time: 22.262 secs; Max mem: 2.058 GB PyObj#: 0114617;
00014 processing: 1871-02 sic     Time: 21.922 secs; Max mem: 2.058 GB PyObj#: 0114636;
00015 processing: 1871-03 sic     Time: 17.976 secs; Max mem: 2.058 GB PyObj#: 0114761;
00016 processing: 1871-04 sic     Time: 19.530 secs; Max mem: 2.058 GB PyObj#: 0114742;
00017 processing: 1871-05 sic     Time: 19.493 secs; Max mem: 2.058 GB PyObj#: 0114939;
00018 processing: 1871-06 sic     Time: 20.545 secs; Max mem: 2.058 GB PyObj#: 0114983;
00019 processing: 1871-07 sic     Time: 19.503 secs; Max mem: 2.058 GB PyObj#: 0115162;
00020 processing: 1871-08 sic     Time: 19.034 secs; Max mem: 2.058 GB PyObj#: 0114902;
00021 processing: 1871-09 sic     Time: 15.348 secs; Max mem: 2.058 GB PyObj#: 0115256;
00022 processing: 1871-10 sic     Time: 15.676 secs; Max mem: 2.058 GB PyObj#: 0115405;
00023 processing: 1871-11 sic     Time: 15.400 secs; Max mem: 2.058 GB PyObj#: 0115556;
00024 processing: 1871-12 sic     Time: 16.803 secs; Max mem: 2.058 GB PyObj#: 0115416;
00025 processing: 1872-01 sic     Time: 22.079 secs; Max mem: 2.058 GB PyObj#: 0115545;
00026 processing: 1872-02 sic     Time: 24.532 secs; Max mem: 2.058 GB PyObj#: 0115942;
00027 processing: 1872-03 sic     Time: 26.946 secs; Max mem: 2.058 GB PyObj#: 0115766;
00028 processing: 1872-04 sic     Time: 16.901 secs; Max mem: 2.058 GB PyObj#: 0115751;
00029 processing: 1872-05 sic     Time: 17.029 secs; Max mem: 2.058 GB PyObj#: 0115898;
00030 processing: 1872-06 sic     Time: 23.446 secs; Max mem: 2.058 GB PyObj#: 0115998;
00031 processing: 1872-07 sic     Time: 19.066 secs; Max mem: 2.058 GB PyObj#: 0116145;
00032 processing: 1872-08 sic     Time: 16.257 secs; Max mem: 2.058 GB PyObj#: 0116059;

@durack1
Member

durack1 commented Aug 26, 2017

@danlipsa, ok, now I am using the correct env. On memory (2.126 GB [new] vs 2.058 GB [old]) we're worse, but on Python objects (114320 vs 116059) and time (4.705 vs 16.257 secs) we're winning, with all these comparisons at step 32:

(uvcdatNightly170822) duro@ocean:[150219_AMIPForcingData]:[5048]> python make_newVsOldDiffs.py
UV-CDAT version:      2.10-19-gba336f2
UV-CDAT prefix:       /export/duro/anaconda2/envs/uvcdatNightly170822
delFudge:             False
Background graphics:  True
donotstoredisplay:    True
00001 processing: 1870-01 sic     Time: 05.274 secs; Max mem: 1.557 GB PyObj#: 0113578;
00002 processing: 1870-02 sic     Time: 04.759 secs; Max mem: 2.044 GB PyObj#: 0113525;
00003 processing: 1870-03 sic     Time: 03.627 secs; Max mem: 2.069 GB PyObj#: 0113698;
00004 processing: 1870-04 sic     Time: 03.601 secs; Max mem: 2.073 GB PyObj#: 0113664;
00005 processing: 1870-05 sic     Time: 03.604 secs; Max mem: 2.076 GB PyObj#: 0113277;
00006 processing: 1870-06 sic     Time: 03.729 secs; Max mem: 2.079 GB PyObj#: 0113555;
00007 processing: 1870-07 sic     Time: 04.434 secs; Max mem: 2.080 GB PyObj#: 0113879;
00008 processing: 1870-08 sic     Time: 03.627 secs; Max mem: 2.083 GB PyObj#: 0113952;
00009 processing: 1870-09 sic     Time: 03.675 secs; Max mem: 2.085 GB PyObj#: 0113786;
00010 processing: 1870-10 sic     Time: 03.659 secs; Max mem: 2.087 GB PyObj#: 0113684;
00011 processing: 1870-11 sic     Time: 03.621 secs; Max mem: 2.090 GB PyObj#: 0113960;
00012 processing: 1870-12 sic     Time: 03.625 secs; Max mem: 2.091 GB PyObj#: 0113690;
00013 processing: 1871-01 sic     Time: 04.016 secs; Max mem: 2.092 GB PyObj#: 0113646;
00014 processing: 1871-02 sic     Time: 04.120 secs; Max mem: 2.095 GB PyObj#: 0113954;
00015 processing: 1871-03 sic     Time: 03.721 secs; Max mem: 2.096 GB PyObj#: 0114077;
00016 processing: 1871-04 sic     Time: 03.626 secs; Max mem: 2.100 GB PyObj#: 0113753;
00017 processing: 1871-05 sic     Time: 04.682 secs; Max mem: 2.102 GB PyObj#: 0114001;
00018 processing: 1871-06 sic     Time: 03.607 secs; Max mem: 2.103 GB PyObj#: 0114024;
00019 processing: 1871-07 sic     Time: 04.384 secs; Max mem: 2.104 GB PyObj#: 0113906;
00020 processing: 1871-08 sic     Time: 03.683 secs; Max mem: 2.105 GB PyObj#: 0113749;
00021 processing: 1871-09 sic     Time: 03.652 secs; Max mem: 2.108 GB PyObj#: 0114004;
00022 processing: 1871-10 sic     Time: 03.691 secs; Max mem: 2.109 GB PyObj#: 0113991;
00023 processing: 1871-11 sic     Time: 03.614 secs; Max mem: 2.109 GB PyObj#: 0113901;
00024 processing: 1871-12 sic     Time: 03.630 secs; Max mem: 2.111 GB PyObj#: 0114053;
00025 processing: 1872-01 sic     Time: 04.298 secs; Max mem: 2.111 GB PyObj#: 0113973;
00026 processing: 1872-02 sic     Time: 04.367 secs; Max mem: 2.115 GB PyObj#: 0114177;
00027 processing: 1872-03 sic     Time: 03.627 secs; Max mem: 2.116 GB PyObj#: 0113929;
00028 processing: 1872-04 sic     Time: 03.637 secs; Max mem: 2.117 GB PyObj#: 0113912;
00029 processing: 1872-05 sic     Time: 03.710 secs; Max mem: 2.121 GB PyObj#: 0114096;
00030 processing: 1872-06 sic     Time: 03.617 secs; Max mem: 2.125 GB PyObj#: 0114092;
00031 processing: 1872-07 sic     Time: 03.641 secs; Max mem: 2.126 GB PyObj#: 0114213;
00032 processing: 1872-08 sic     Time: 04.705 secs; Max mem: 2.126 GB PyObj#: 0114320;

@doutriaux1
Contributor Author

@durack1 in your script you also want to plot s1s (not s1) and use s1s-s2 to compute the diff; your memory use will go down significantly (like 1.5 GB)

@doutriaux1
Contributor Author

@durack1, @danlipsa mentioned that he fixed the small (one object) leak that is left. Also, the fact that you are using this HUGE array and that you read it every time from the file (if I remember correctly) probably leads to the known cdtime memory leak (8 bytes, I think) accumulating a lot.
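If the repeated read is indeed the aggravating factor, one hedged way to restructure the script (hypothetical file and variable names, cdms2 assumed) is to open the file once and read only the time step needed on each iteration:

import cdms2

# Hypothetical sketch: open once and slice per step, instead of re-reading
# the whole (huge) variable from disk on every iteration.
f = cdms2.open("amip_bcs_input.nc")   # assumed file name
sic = f["sic"]                        # file variable: no data read yet

for step in range(sic.shape[0]):
    s = sic[step]                     # reads just this one time step
    # ... plot s here ...

f.close()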

@danlipsa
Contributor

@doutriaux1 @durack1 In Charles' test there is a small leak, which I solved by replacing:

tmpl.legend.textorientation = leg

with

oldOrientation = tmpl.legend.textorientation
tmpl.legend.textorientation = leg
vcs.removeobject(vcs.elements['textorientation'][oldOrientation])
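A quick, hedged way to spot this kind of leak is to count the named objects in vcs.elements between iterations (this assumes only that vcs.elements is the dict of per-type dicts used in the snippet above):

import vcs

def element_counts():
    # vcs.elements maps a type name (e.g. 'textorientation') to a dict of
    # named objects of that type; steady growth here usually means a leak.
    return {kind: len(objs) for kind, objs in vcs.elements.items()}

before = element_counts()
# ... one plotting iteration ...
after = element_counts()
for kind in sorted(after):
    if after[kind] > before.get(kind, 0):
        print(kind, before.get(kind, 0), "->", after[kind])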

@doutriaux1
Contributor Author

Thanks @danlipsa, did I forget to clean it up? Or should it be cleaned up automatically?

@doutriaux1
Contributor Author

oh yes looking at the code, I forgot to remove the object.

durack1 added a commit to PCMDI/amipbcs that referenced this pull request Aug 28, 2017
@durack1
Member

durack1 commented Aug 28, 2017

@danlipsa @doutriaux1 great, this is looking better. Following your suggestions above:

(uvcdatNightly170822) duro@ocean:[150219_AMIPForcingData]:[5048]> python make_newVsOldDiffs.py
UV-CDAT version:      2.10-19-gba336f2
UV-CDAT prefix:       /home/duro/anaconda2/envs/uvcdatNightly170822
delFudge:             False
Background graphics:  True
donotstoredisplay:    True
00001 processing: 1870-01 sic     Time: 07.130 secs; Max mem: 1.042 GB PyObj#: 0113461;
00002 processing: 1870-02 sic     Time: 04.834 secs; Max mem: 1.042 GB PyObj#: 0113601;
00003 processing: 1870-03 sic     Time: 03.076 secs; Max mem: 1.042 GB PyObj#: 0113615;
00004 processing: 1870-04 sic     Time: 03.180 secs; Max mem: 1.042 GB PyObj#: 0113652;
00005 processing: 1870-05 sic     Time: 03.000 secs; Max mem: 1.042 GB PyObj#: 0113970;
00006 processing: 1870-06 sic     Time: 03.791 secs; Max mem: 1.042 GB PyObj#: 0113906;
00007 processing: 1870-07 sic     Time: 02.991 secs; Max mem: 1.042 GB PyObj#: 0113926;
00008 processing: 1870-08 sic     Time: 02.919 secs; Max mem: 1.042 GB PyObj#: 0113960;
00009 processing: 1870-09 sic     Time: 02.974 secs; Max mem: 1.042 GB PyObj#: 0114148;
00010 processing: 1870-10 sic     Time: 03.607 secs; Max mem: 1.042 GB PyObj#: 0114222;
00011 processing: 1870-11 sic     Time: 02.945 secs; Max mem: 1.042 GB PyObj#: 0114167;
00012 processing: 1870-12 sic     Time: 03.013 secs; Max mem: 1.042 GB PyObj#: 0114282;
00013 processing: 1871-01 sic     Time: 03.410 secs; Max mem: 1.042 GB PyObj#: 0114201;
00014 processing: 1871-02 sic     Time: 02.976 secs; Max mem: 1.042 GB PyObj#: 0114613;
00015 processing: 1871-03 sic     Time: 02.977 secs; Max mem: 1.042 GB PyObj#: 0114779;
00016 processing: 1871-04 sic     Time: 02.968 secs; Max mem: 1.042 GB PyObj#: 0114625;
00017 processing: 1871-05 sic     Time: 02.988 secs; Max mem: 1.042 GB PyObj#: 0114678;
00018 processing: 1871-06 sic     Time: 02.943 secs; Max mem: 1.042 GB PyObj#: 0114901;
00019 processing: 1871-07 sic     Time: 02.956 secs; Max mem: 1.042 GB PyObj#: 0114914;
00020 processing: 1871-08 sic     Time: 02.908 secs; Max mem: 1.042 GB PyObj#: 0115022;
00021 processing: 1871-09 sic     Time: 04.797 secs; Max mem: 1.042 GB PyObj#: 0115094;
00022 processing: 1871-10 sic     Time: 02.898 secs; Max mem: 1.042 GB PyObj#: 0115275;
00023 processing: 1871-11 sic     Time: 03.599 secs; Max mem: 1.042 GB PyObj#: 0115421;
00024 processing: 1871-12 sic     Time: 06.025 secs; Max mem: 1.042 GB PyObj#: 0115075;
00025 processing: 1872-01 sic     Time: 06.746 secs; Max mem: 1.042 GB PyObj#: 0115169;
00026 processing: 1872-02 sic     Time: 07.839 secs; Max mem: 1.042 GB PyObj#: 0115396;
00027 processing: 1872-03 sic     Time: 03.094 secs; Max mem: 1.042 GB PyObj#: 0115450;
00028 processing: 1872-04 sic     Time: 04.236 secs; Max mem: 1.042 GB PyObj#: 0115412;
00029 processing: 1872-05 sic     Time: 02.932 secs; Max mem: 1.042 GB PyObj#: 0115670;
00030 processing: 1872-06 sic     Time: 03.800 secs; Max mem: 1.042 GB PyObj#: 0115561;
00031 processing: 1872-07 sic     Time: 04.059 secs; Max mem: 1.042 GB PyObj#: 0116015;
00032 processing: 1872-08 sic     Time: 04.307 secs; Max mem: 1.042 GB PyObj#: 0115765;
...

It still seems those Python objects are continuing to grow, however..? @danlipsa, not sure whether the comment by @dlonie at CDAT/cdat#1424 (comment) is something to consider?

@danlipsa
Contributor

@durack1 I would not worry about the number of objects if the memory does not grow. So now you don't see any leak in your program (just looking at the reported memory)? How is the running time? Maybe this is a valid workaround for now.
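For what it's worth, here is a hedged sketch of how per-iteration numbers like the "Max mem" and "PyObj#" columns above could be collected with the standard library alone; the actual script may well use a different method.

import gc
import resource

def report(step):
    # ru_maxrss is the peak resident set size of this process (kilobytes on Linux).
    max_mem_gb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / (1024.0 ** 2)
    py_objs = len(gc.get_objects())
    print("%05d Max mem: %.3f GB PyObj#: %07d" % (step, max_mem_gb, py_objs))

# call report(step) at the end of each loop iteration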

@durack1
Member

durack1 commented Aug 28, 2017

@danlipsa I'll run the script to completion and see what happens to time and memory usage.. In the past it's taken a little time to get to the ridiculous numbers.. It used to take a day or more to complete the script; with the shorter times (reported above, ~4 vs ~15 secs) I'll hopefully have a log to upload here this afternoon..

@danlipsa
Contributor

danlipsa commented Aug 28, 2017

@durack1 @doutriaux1 I ran a simplified version of Charles' test with the workaround over 200 times and I don't get any increase in memory or in the number of objects. The simplification is that I only run one plot, not three, so that I get the simplest program where I was still seeing the problem.

00220 processing: 219 Time: 00.699 secs; Max mem: 0.261 GB PyObj#: 0124097;
plotApps: 0, text_renderers: 0, _renderers: 0

00221 processing: 220 Time: 00.699 secs; Max mem: 0.261 GB PyObj#: 0124097;
plotApps: 0, text_renderers: 0, _renderers: 0

00222 processing: 221 Time: 00.705 secs; Max mem: 0.261 GB PyObj#: 0124097;
plotApps: 0, text_renderers: 0, _renderers: 0

00223 processing: 222 Time: 00.729 secs; Max mem: 0.261 GB PyObj#: 0124097;
plotApps: 0, text_renderers: 0, _renderers: 0

00224 processing: 223 Time: 00.747 secs; Max mem: 0.261 GB PyObj#: 0123986;
plotApps: 0, text_renderers: 0, _renderers: 0

00225 processing: 224 Time: 00.721 secs; Max mem: 0.261 GB PyObj#: 0123986;
plotApps: 0, text_renderers: 0, _renderers: 0

00226 processing: 225 Time: 00.720 secs; Max mem: 0.261 GB PyObj#: 0123986;
plotApps: 0, text_renderers: 0, _renderers: 0

00227 processing: 226 Time: 00.722 secs; Max mem: 0.261 GB PyObj#: 0123986;
plotApps: 0, text_renderers: 0, _renderers: 0

00228 processing: 227 Time: 00.703 secs; Max mem: 0.261 GB PyObj#: 0123986;
plotApps: 0, text_renderers: 0, _renderers: 0

00229 processing: 228 Time: 00.708 secs; Max mem: 0.261 GB PyObj#: 0123986;
plotApps: 0, text_renderers: 0, _renderers: 0

00230 processing: 229 Time: 00.721 secs; Max mem: 0.261 GB PyObj#: 0124097;
plotApps: 0, text_renderers: 0, _renderers: 0

00231 processing: 230 Time: 00.706 secs; Max mem: 0.261 GB PyObj#: 0123986;
plotApps: 0, text_renderers: 0, _renderers: 0

00232 processing: 231 Time: 00.726 secs; Max mem: 0.261 GB PyObj#: 0124097;
plotApps: 0, text_renderers: 0, _renderers: 0

00233 processing: 232 Time: 00.701 secs; Max mem: 0.261 GB PyObj#: 0123986;
plotApps: 0, text_renderers: 0, _renderers: 0

00234 processing: 233 Time: 00.699 secs; Max mem: 0.261 GB PyObj#: 0124097;
plotApps: 0, text_renderers: 0, _renderers: 0

00235 processing: 234 Time: 00.699 secs; Max mem: 0.261 GB PyObj#: 0124097;

@durack1
Member

durack1 commented Aug 29, 2017

@danlipsa attached is the logfile for the output using all the latest tweaks you guys suggested. It seems these tweaks have considerably reduced the time taken and the memory usage, which now maxes out at 2.2 GB rather than 100+ GB.. I am a little curious about the huge virtual memory footprint (which is over 30 GB), but that's an aside.
170828_1030_make_newVsOldDiffs.txt

If we were able to speed things up to sub-1 sec timing for each panel, that would be ideal. I know your example above is simpler, but the timing is certainly appealing..

@jypeter
Member

jypeter commented Aug 29, 2017

I have been following the resource monitoring of this thread with interest. I'm wondering how the results of time.time() would differ from time.clock():

    time() -- return current time in seconds since the Epoch as a float
    clock() -- return CPU time since process start as a float

Is time() (that is, the difference of 2 times) going to give bigger values than clock() just on multi-user servers?
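To make the difference concrete, here is a small hedged example (Python 2 era, where time.clock() still exists; it was removed in Python 3.8):

import time

wall_start = time.time()   # wall-clock time: includes waiting on I/O, the X server, other users
cpu_start = time.clock()   # CPU time used by this process only (Unix semantics in Python 2)

time.sleep(2)              # stands in for time spent waiting rather than computing

print("wall: %.2f secs" % (time.time() - wall_start))   # ~2 secs
print("cpu:  %.2f secs" % (time.clock() - cpu_start))   # ~0 secs: sleeping uses no CPU

So on a busy multi-user server, time.time() deltas can indeed come out larger than time.clock() deltas, since they also count time spent waiting on other processes.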

@danlipsa
Contributor

@doutriaux1 @aashish24 @durack1 @williams13 I built an OpenGL2 VTK backend and ran the test (without the workaround). It does not show the memory leak, and it also seems to run much faster. So, I would use the workaround in the client applications for now and remove it once we switch to the new backend.

@doutriaux1
Contributor Author

@danlipsa should we try to stick it somewhere in x.clear()?

@durack1
Member

durack1 commented Aug 29, 2017

@danlipsa "..much faster.." now you have me excited.. How much faster, can you quantify?

Also apologies for harping, but any insights on the very large (~40Gb) virtual memory footprint?

@durack1
Member

durack1 commented Aug 29, 2017

@danlipsa @doutriaux1 as a user, I would strongly advocate that all the "workaround" bits are incorporated into x.clear() and x.close() so that as many as possible of the objects created by a user are cleared when the canvas is closed. I realize this will be a problem for template objects, but maybe even adding a warning ("template objects used have not been cleared..") would be useful..?

@doutriaux1
Contributor Author

@durack1 we cannot do that, it would just be plain wrong. BUT I think you're right, we should add a vcs.reset() function that clears everything created by the user. Note that this would not solve the above problem (mem leak) since all objects are cleanly removed here.
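A rough sketch of what such a reset helper could look like, using nothing beyond the vcs.elements / vcs.removeobject API shown earlier in this thread; snapshotting at import time is an assumed way to tell built-in objects from user-created ones, not documented vcs behaviour:

import vcs

# Names that exist right after import; anything beyond these is presumed
# user-created (an assumption for this sketch).
_baseline = {kind: set(objs) for kind, objs in vcs.elements.items()}

def reset_user_elements():
    """Remove every vcs element created since the baseline snapshot."""
    for kind, objs in vcs.elements.items():
        for name in list(objs):
            if name not in _baseline.get(kind, set()):
                vcs.removeobject(objs[name])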

@danlipsa
Contributor

danlipsa commented Aug 29, 2017

"..much faster.." now you have me excited.. How much faster, can you quantify?

OpenGL2: ~ .39
OpenGL1 with window close: ~ .73
OpenGL1 without window close: .53

So it seems window closing/opening does have some overhead which is to be expected. We could close the window every 10 time steps if we are worried about that.
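If the per-iteration cost matters, a hedged variant of the workaround (canvas, data, and graphics-method names are hypothetical, carried over from the earlier sketch) that only drops the window every 10 steps:

def plot_series(x, variables, box, flush_every=10):
    """Plot each variable, dropping the render window every flush_every steps."""
    for step, s in enumerate(variables):
        x.plot(s, box, bg=1)
        x.png("frame_%05d.png" % step)
        x.clear()
        if (step + 1) % flush_every == 0:
            x.backend.renWin = None   # periodic version of the workaround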

Also apologies for harping, but any insights on the very large (~40Gb) virtual memory footprint?
Not really. Maybe lots of dynamic libraries linked into VTK, vcs, cdms, python, ...

@jypeter
Member

jypeter commented Aug 29, 2017

@durack1 are you doing any regridding in your script before plotting? This can be really memory hungry... :/

@danlipsa
Contributor

@doutriaux1 @durack1 We cannot really close() the window when the user only requested clear(), so this has to be done at the user level. I would not add an additional function; close() already does this.

@durack1
Member

durack1 commented Aug 29, 2017

@danlipsa this sounds great.. I would certainly like to put myself forward for testing of the OpenGL2 stuff; as I'm using the VNC virtual sessions I'm sure to hit a bunch of issues that would be great to solve.. Is RHEL6 going to be a problem for OpenGL2? Most of our workstations are still running the older OS

@durack1
Member

durack1 commented Aug 29, 2017

@durack1 are you doing any regridding in your script before plotting? This can be really memory hungry... :/

@jypeter nope.

@danlipsa
Contributor

@durack1 It really depends on the graphics driver. If what is installed in the system does not support OpenGL 3.3, we do provide osmesa, which does. We should be able to have onscreen vcs that uses mesa as well, if the system driver is older. See #230

@durack1
Member

durack1 commented Aug 29, 2017

BUT I think you're right we should add a vcs.reset() function that clears everything created by the user.

@doutriaux1 I think something like this would be useful - well I would certainly use it to make sure that I delete all the items I have unwittingly (or knowingly) created

@doutriaux1
Contributor Author

according to https://bugs.freedesktop.org/show_bug.cgi?id=102844

a mem leak has been fixed in mesa 17.2.2; we are using 17.2.0. Trying the latest (17.3.0) to see if it helps.

@doutriaux1
Contributor Author

nope... still no diff

('00006', 'processing: 5', 'Time: 02.383 secs;', 'Max mem: 562.913 GB', 'PyObj#: 0123680;')
('00007', 'processing: 6', 'Time: 02.314 secs;', 'Max mem: 615.711 GB', 'PyObj#: 0123684;')
('00008', 'processing: 7', 'Time: 02.327 secs;', 'Max mem: 669.274 GB', 'PyObj#: 0123688;')
('00009', 'processing: 8', 'Time: 02.360 secs;', 'Max mem: 715.850 GB', 'PyObj#: 0123692;')
('00010', 'processing: 9', 'Time: 02.226 secs;', 'Max mem: 758.624 GB', 'PyObj#: 0123766;')
(50.07593347878791, 313.64561454545446)
F
======================================================================
FAIL: testMesaLeak (test_vcs_mesa_leak.VCSTestMesaLeak)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/git/vcs/tests/test_vcs_mesa_leak.py", line 116, in testMesaLeak
    self.assertTrue(abs(a)<1.e-3)
AssertionError: False is not true
----------------------------------------------------------------------
Ran 1 test in 24.257s
FAILED (failures=1)
Ran 1 tests, 1 failed (0.00% success)
Failed tests:
	 tests/test_vcs_mesa_leak.py
(nightly2) doutriaux1@loki:[vcs]:[mesaleak]:[26333]> conda list mesa
# packages in environment at /Users/doutriaux1/anaconda2/envs/nightly2:
#
mesalib                   17.3.0                        0    local

@durack1
Member

durack1 commented Dec 16, 2017

@doutriaux1 I'd be happy to provide feedback on this when you think you have a fix in place

@doutriaux1
Contributor Author

@durack1 no luck with updating mesalib...

@durack1
Member

durack1 commented Dec 18, 2017

@doutriaux1 bummer, what are the next steps to isolate and resolve this issue?

@doutriaux1
Contributor Author

@zshaheen did we ever fix this? Is it still an issue?

@doutriaux1
Contributor Author

@durack1 merged master in and trying to see if it still fails. But it needed EzTemplate, so retriggered now.

@zshaheen

@doutriaux1 To my knowledge, no. The way we run e3sm_diags now is that one Python process creates one plot, which makes the multiprocessing more granular. Inadvertently, running with a vcs backend now doesn't crash. I'm guessing whatever memory issues there were are no longer present because so little work is done in a given process.
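A hedged sketch of that one-plot-per-process pattern (not the actual e3sm_diags code; file, variable, and function names are hypothetical), using multiprocessing with maxtasksperchild=1 so each worker exits after a single plot and the OS reclaims any leaked memory:

import multiprocessing

def make_one_plot(args):
    # Import inside the worker so each fresh process builds its own
    # vcs/VTK state and takes any leak with it when it exits.
    import cdms2
    import vcs
    step, infile, outfile = args
    f = cdms2.open(infile)
    s = f["sic"][step]                    # hypothetical variable name
    x = vcs.init()
    x.plot(s, x.createboxfill(), bg=1)
    x.png(outfile)
    x.close()
    f.close()

if __name__ == "__main__":
    tasks = [(i, "sample_sic.nc", "frame_%05d.png" % i) for i in range(12)]
    # maxtasksperchild=1: each worker process handles exactly one task.
    pool = multiprocessing.Pool(processes=4, maxtasksperchild=1)
    pool.map(make_one_plot, tasks)
    pool.close()
    pool.join()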
