
High RSS memory increase for Full (Fast) Simulation in EL8 compared to SLC7 #42929

Open
nhoerman opened this issue Oct 2, 2023 · 9 comments


@nhoerman

nhoerman commented Oct 2, 2023

When testing the Full Simulation SIM step (e.g. MinBias, 500 events, 1 thread) on the EL8 and SLC7 platforms, there seems to be a significant increase in RSS memory consumption on EL8.

Comparing cmsRun to cmsRunGlibC and cmsRunTC:
*) cmsRunTC shows the same RSS memory increase
*) cmsRunGlibC shows lower RSS memory consumption

Used servers: olsky-05 (CS8 with singularity cmssw-el8) and olsky-06 (SLC7)
FullSim_cmsRuns.pdf

@cmsbuild
Contributor

cmsbuild commented Oct 2, 2023

A new Issue was created by @nhoerman .

@antoniovilela, @Dr15Jones, @sextonkennedy, @makortel, @smuzaffar, @rappoccio can you please review it and eventually sign/assign? Thanks.


@makortel
Contributor

makortel commented Oct 2, 2023

Assign core, simulation

@cmsbuild
Contributor

cmsbuild commented Oct 2, 2023

New categories assigned: core,simulation

@Dr15Jones,@civanch,@makortel,@mdhildreth,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks

@makortel
Contributor

makortel commented Oct 2, 2023

Could you provide a recipe to reproduce the setup (e.g. cmsDriver command(s))?

I tested 10 events with step 1 of workflow 12434.0 (2023 TTbar) on cmsdev32 (slc7), both directly and through the cmssw-el8 container, and didn't see any significant difference (peak RSS was 1035 MB in both cases, as reported by the SimpleMemoryCheck service).

@nhoerman
Author

nhoerman commented Oct 4, 2023

I use TimeMemoryInfo.py (GEN-SIM step):
cmsDriver.py MinBias_14TeV_pythia8_TuneCP5_cfi --conditions auto:phase1_2022_realistic -n 10 --nThreads 1 --era Run3 --eventcontent FEVTDEBUG --relval 10000,100 -s GEN,SIM --customise=Validation/Performance/TimeMemoryInfo.py --pileup=NoPileUp --datatier GEN-SIM --beamspot Run3RoundOptics25ns13TeVLowSigmaZ --geometry DB:Extended --dirout=./ --mc
olsky-05/06:
SLC7: Peak rss size 1004.9 Mbytes
EL8: Peak rss size 1285.03 Mbytes
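For scale, the size of the increase can be expressed relative to the SLC7 value. This is only arithmetic on the two peak RSS numbers quoted above; the variable names are illustrative.

```python
# Peak RSS values (Mbytes) reported by TimeMemoryInfo.py on olsky-06 (SLC7) and olsky-05 (EL8).
slc7_peak_mb = 1004.9
el8_peak_mb = 1285.03

increase_mb = el8_peak_mb - slc7_peak_mb
increase_pct = 100 * increase_mb / slc7_peak_mb
print(f"EL8 - SLC7: {increase_mb:.1f} MB ({increase_pct:.1f} %)")
```

That is roughly 280 MB, or about 28 % more than on SLC7.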

@makortel
Contributor

makortel commented Oct 4, 2023

Thanks. I ran a couple of tests on lxplus:

  • slc7 node, natively: Peak rss size 1049.21 Mbytes
  • slc7 node, through cmssw-el8 container: Peak rss size 1049.13 Mbytes
  • el8 node, natively: Peak rss size 1239.08 Mbytes
  • el8 node, through cmssw-cc7 container: Peak rss size 1261.25 Mbytes

So at first glance, the behavior seems to be related to the OS version of the node itself rather than to that of our Apptainer container.
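The conclusion can be made explicit by separating the node-OS effect from the container effect. The sketch below only reuses the four lxplus numbers listed above; the grouping keys and helper name are hypothetical.

```python
from statistics import mean

# Peak RSS (MB) from the lxplus tests above, keyed by (node OS, how cmsRun was run).
peak_rss = {
    ("slc7", "native"): 1049.21,
    ("slc7", "cmssw-el8"): 1049.13,
    ("el8", "native"): 1239.08,
    ("el8", "cmssw-cc7"): 1261.25,
}

def by_node(node):
    """All peak RSS values measured on a given node OS, regardless of container."""
    return [value for (n, _), value in peak_rss.items() if n == node]

# Difference between node OSes, averaged over containers.
node_gap = mean(by_node("el8")) - mean(by_node("slc7"))
# Spread across containers on the same node OS.
container_spread = {n: max(by_node(n)) - min(by_node(n)) for n in ("slc7", "el8")}

print(f"el8 node - slc7 node: {node_gap:.1f} MB")
for n, spread in container_spread.items():
    print(f"spread across containers on {n} node: {spread:.2f} MB")
```

The node-to-node gap (about 200 MB) is an order of magnitude larger than the container-to-container spread on either node (0.08 MB and about 22 MB).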

@makortel
Copy link
Contributor

makortel commented Oct 4, 2023

I ran MALLOC_CONF=stats_print:true cmsRun <config> on both slc7 and el8 nodes. I didn't learn much from the printout (except that there are some differences), but in case anyone else can interpret it better, I'm attaching both outputs here.
slc7_jemalloc_stats.txt
el8_jemalloc_stats.txt
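To collect comparable stats files on other nodes, the same MALLOC_CONF setting can be applied from a small wrapper. This is a sketch under assumptions: it relies on jemalloc's default behavior of writing the stats_print report to stderr at process exit, so any other stderr output of cmsRun (e.g. from the MessageLogger) ends up in the same file. The script and function names are hypothetical.

```python
import os
import subprocess
import sys

def run_with_jemalloc_stats(cmd, stats_file):
    """Run cmd with jemalloc's stats_print option enabled and capture stderr.

    jemalloc reads its options from the MALLOC_CONF environment variable; with
    stats_print:true it prints allocator statistics (to stderr by default) when
    the process exits. Returns the child's exit code.
    """
    env = {**os.environ, "MALLOC_CONF": "stats_print:true"}
    with open(stats_file, "w") as out:
        return subprocess.run(cmd, env=env, stderr=out).returncode

if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical usage: python3 jemalloc_stats.py el8_jemalloc_stats.txt cmsRun <config>
    sys.exit(run_with_jemalloc_stats(sys.argv[2:], sys.argv[1]))
```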

@makortel
Copy link
Contributor

makortel commented Oct 9, 2023

For future reference, the CMSSW version used in the PDF attached in the description and in my tests was 13_3_0_pre3.

@makortel
Contributor

I did some more peak-RSS testing on random lxplus nodes:

| Peak RSS (MB) | cc7 node | el8 node | el9 node |
|---|---|---|---|
| cmssw-cc7 | 1048.26 | 987.289 | 1278.88 |
| cmssw-el8 | 976.453 | 990.133 | 1287.62 |
| cmssw-el9 | 1052.91 | 998.445 | 1290.95 |
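Summarizing each node column (mean and spread over the three containers) shows the pattern more directly. The numbers are copied from the table above; nothing else is assumed.

```python
from statistics import mean

nodes = ("cc7 node", "el8 node", "el9 node")
# Peak RSS (MB): rows are containers, columns follow the order of `nodes`.
table = {
    "cmssw-cc7": (1048.26, 987.289, 1278.88),
    "cmssw-el8": (976.453, 990.133, 1287.62),
    "cmssw-el9": (1052.91, 998.445, 1290.95),
}

# Transpose into one list of values per node OS.
columns = {node: [row[i] for row in table.values()] for i, node in enumerate(nodes)}
for node, values in columns.items():
    print(f"{node}: mean {mean(values):.1f} MB, spread {max(values) - min(values):.1f} MB")
```

In this set, the el9 node sits about 294 MB above the el8 node on average, while the container choice changes peak RSS by roughly 11-12 MB on the el8 and el9 nodes and by about 76 MB on the cc7 node.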

Interestingly, and contrary to my previous test #42929 (comment), the EL8-node RSS was now compatible with CC7-node RSS (the node was coincidentally the same as in my previous test).

Perhaps there is a random element in play? (similar to effects discussed in #42387)
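One way to check for a random element is to repeat the identical job several times on the same node and look at the run-to-run spread. The sketch below is an assumption-laden cross-check, not a replacement for SimpleMemoryCheck: it reads the kernel's per-child peak RSS (ru_maxrss from wait4, in KiB on Linux), which need not match the numbers printed by CMSSW exactly. It requires Python 3.9+ on a POSIX system; the script and function names are hypothetical.

```python
import os
import statistics
import sys

def peak_rss_mib(argv):
    """Run argv once and return the child's peak RSS in MiB, using wait4 accounting."""
    pid = os.posix_spawnp(argv[0], argv, os.environ)
    _, status, usage = os.wait4(pid, 0)
    exit_code = os.waitstatus_to_exitcode(status)
    if exit_code != 0:
        raise RuntimeError(f"{argv[0]} exited with code {exit_code}")
    # ru_maxrss is in KiB on Linux (bytes on macOS).
    scale = 1024 * 1024 if sys.platform == "darwin" else 1024
    return usage.ru_maxrss / scale

def repeat_peak_rss(argv, runs=5):
    """Repeat the same job and return (mean, stdev, samples) of its peak RSS."""
    samples = [peak_rss_mib(argv) for _ in range(runs)]
    return statistics.mean(samples), statistics.stdev(samples), samples

if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 repeat_rss.py cmsRun step1_cfg.py
    avg, sd, samples = repeat_peak_rss(sys.argv[1:])
    print(f"peak RSS {avg:.1f} +- {sd:.1f} MiB over {len(samples)} runs")
```

If the spread on a single node is comparable to the differences between nodes, the node OS alone would not explain the effect.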
