Description
A clear and concise description of what the issue is.
I recently noticed that a script that uses matplotlib to make plots takes about 20 GB of memory on Perlmutter, but only 1 GB on the FNAL cluster and 2 GB on RCC (U. Chicago). I monitored memory with pmap and also with top. The input table file is 180 MB, and internally several copies are made, so ~1 GB of memory seems right. This excessive memory consumption means that we cannot use this script in Slurm (as an afterburner script) without requesting much more memory than the primary task actually needs. It also hints that other codes may be consuming far more memory than they need.
Choose all applicable topics by placing an 'X' between the [ ]:

- [ ] jupyter
- [ ] jupyter terminal
- [ ] Perlmutter interactive command line
- [ ] Batch jobs
- [ ] GPU nodes
- [x] python
- [ ] PSCRATCH
- [ ] Community File System
- [ ] HPSS (tape)
- [ ] Data transfer and Globus
- [ ] New User Account or account access problems
To Reproduce
Steps to reproduce the behavior:

1. `source /global/cfs/cdirs/lsst/groups/TD/setup_td.sh`
2. `cd /global/cfs/cdirs/lsst/groups/TD/SN/SNANA/SURVEYS/LSST/USERS/kessler/debug/plot_simlib`
3. `./MEMORY_PLOT_TEST.sh`
4. Monitor memory with `ps -u` and `pmap`
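In addition to watching the process externally with `ps -u` or `pmap`, memory can be checked from inside the script itself. A minimal sketch using only the standard library; note that `ru_maxrss` is reported in kB on Linux (as on Perlmutter) but in bytes on macOS:

```python
import resource

def peak_rss_kb() -> int:
    """Return the peak resident set size of this process (kB on Linux)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

# Call at checkpoints (after imports, after loading the table, etc.)
# to see where the memory actually goes:
print(f"peak RSS so far: {peak_rss_kb()} kB")
```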
I traced the memory usage import by import and noticed the following growth before running any other code:

```python
import pandas as pd                        # <- adds 3 GB of memory
from scipy.stats import binned_statistic   # <- adds 18 GB of memory
```
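To attribute memory to individual imports cleanly, each import can be measured in a fresh interpreter so that earlier imports don't skew the numbers. A sketch using only the standard library (the module names to probe are whatever you suspect, e.g. `pandas` or `scipy.stats`); `ru_maxrss` is in kB on Linux:

```python
import subprocess
import sys

def import_cost_kb(module: str) -> int:
    """Peak RSS (kB on Linux) of a fresh interpreter that just imports `module`."""
    code = (
        "import resource, importlib\n"
        f"importlib.import_module({module!r})\n"
        "print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)\n"
    )
    out = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, check=True,
    )
    return int(out.stdout.strip())

# Probe the suspects from the numbers above, e.g.:
# for mod in ("pandas", "scipy.stats"):
#     print(mod, import_cost_kb(mod), "kB")
```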
Is the same version of scipy being used at FNAL and RCC as at NERSC? This version of td_env at NERSC is on scipy 1.8.1; that's the CPU environment, and I can indeed reproduce the large memory consumption with pmap.
Now I repeated the same experiment with the GPU env of td_env, which is not tied to any package versions inherited from the LSST Science Pipelines (`source $CFS/lsst/groups/TD/setup_td.sh -g`); there we are on scipy 1.12.1, and the memory use decreases significantly when I rerun the same test with the `./MEMORY_PLOT_TEST.sh` script. Now I see:

```
total kB  1265000  667872  662932  605824  1932
```

so I'm guessing this isn't a NERSC issue, but rather a problem with using this old scipy.
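Since the difference appears to track the scipy version, a quick way to confirm is to dump package versions from each site's environment and diff them. A minimal sketch (the package list is illustrative):

```python
import importlib

def report_versions(names):
    """Map each package name to its __version__, or a marker if unavailable."""
    versions = {}
    for name in names:
        try:
            mod = importlib.import_module(name)
            versions[name] = getattr(mod, "__version__", "unknown")
        except ImportError:
            versions[name] = "not installed"
    return versions

# Run in each environment (FNAL, RCC, NERSC CPU/GPU td_env) and compare:
# print(report_versions(["scipy", "numpy", "pandas", "matplotlib"]))
```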