
[NERSC] python script takes extreme amount of memory #114

Open
RickKessler opened this issue Sep 5, 2024 · 2 comments
Labels: nersc (Questions pertaining to NERSC)

Comments

@RickKessler

Description

I recently noticed that a script using matplotlib to make plots takes about 20 GB of memory on Perlmutter, but only about 1 GB on the FNAL cluster and 2 GB on RCC (U. Chicago). I monitored memory with pmap and also with top. The input table file is 180 MB, and internally several copies are made, so about 1 GB of memory seems right. This excessive memory consumption means that we cannot use this script in Slurm (as an afterburner script) without requesting much more memory than the primary task actually needs. It is also a hint that other codes may be consuming much more memory than needed.

Choose all applicable topics by placing an 'X' between the [ ]:

  • [ ] jupyter
  • [ ] jupyter terminal
  • [ ] Perlmutter interactive command line
  • [ ] Batch jobs
  • [ ] GPU nodes
  • [x] python
  • [ ] PSCRATCH
  • [ ] Community File System
  • [ ] HPSS (tape)
  • [ ] Data transfer and Globus
  • [ ] New User Account or account access problems

To Reproduce
Steps to reproduce the behavior:
0. source /global/cfs/cdirs/lsst/groups/TD/setup_td.sh

  1. cd /global/cfs/cdirs/lsst/groups/TD/SN/SNANA/SURVEYS/LSST/USERS/kessler/debug/plot_simlib
  2. ./MEMORY_PLOT_TEST.sh
  3. ps -u
  4. pmap
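Steps 3 and 4 can be combined into a small self-contained check; this is a sketch only, where a sleeping python process stands in for the actual plotting script (an assumption, since the real script lives in the directory above):

```shell
# Stand-in for the plotting script (assumption: any python process
# illustrates the measurement the same way).
python3 -c "import time; time.sleep(10)" &
PID=$!
sleep 1   # let the process start

# ps reports the resident set size (RSS) in kB for the process.
RSS_KB=$(ps -o rss= -p "$PID" | tr -d ' ')
echo "RSS of python process: ${RSS_KB} kB"

# pmap's last line summarizes the total mapped/resident memory.
if command -v pmap >/dev/null 2>&1; then
    pmap -x "$PID" | tail -n 1
fi

kill "$PID"
```

The pmap totals line is the same "total kB ..." summary quoted later in this thread.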
RickKessler added the nersc label Sep 5, 2024

@RickKessler (Author)

I traced the memory based on import statements and noticed the following before running any code:
import pandas as pd # <- adds 3 GB memory
from scipy.stats import binned_statistic # <- adds 18 GB memory
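The per-import growth quoted above can be measured directly from inside the interpreter; here is a minimal sketch, assuming a Linux system where /proc/self/status reports VmRSS (the helper names rss_kb and import_and_report are illustrative, not from the original script):

```python
import importlib

def rss_kb():
    """Return this process's resident set size in kB from /proc/self/status (Linux only)."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return 0

def import_and_report(module_name):
    """Import a module and report how much the process RSS grew, in MB."""
    before = rss_kb()
    importlib.import_module(module_name)
    grew = rss_kb() - before
    print(f"import {module_name}: RSS grew by {grew / 1024:.1f} MB")
    return grew

# The two imports traced in this issue; skip gracefully if absent.
for name in ("pandas", "scipy.stats"):
    try:
        import_and_report(name)
    except ImportError:
        print(f"{name} not installed here; skipping")
```

Running this under both the CPU and GPU td_env setups should make the difference between scipy versions visible without pmap.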

@heather999 (Collaborator)

Is the same version of scipy being used at FNAL and RCC as at NERSC? This version of td_env at NERSC is on scipy 1.8.1; that is the CPU environment, and I can indeed reproduce the large memory consumption with pmap.
I then repeated the same experiment with the GPU env of td_env, which is not tied to package versions inherited from the LSST Science Pipelines ( source $CFS/lsst/groups/TD/setup_td.sh -g ). There we're on scipy 1.12.1, and memory use decreases significantly when I rerun the same test using the ./MEMORY_PLOT_TEST.sh script. Now I see:
total kB 1265000 667872 662932 605824 1932

So I'm guessing this isn't a NERSC issue; rather, it's a problem with using this old scipy.

I owe TD a new td_env; I will get that out ASAP.
