timing tests with and without Podman/MPI #204
So these results suggest that relying exclusively on MPI is actually detrimental (for FastSpecFit, at least). I'm going to benchmark a mix of MPI ranks and `--mp` cores on a fixed dataset and see what combination works best.
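As a minimal sketch of what such a rank/core sweep could look like (the specific `--ntasks`/`--mp` combinations, the output directories, and the 128-core-per-node limit are illustrative assumptions, not values taken from this issue):

```bash
#!/bin/bash
# Hypothetical sweep over MPI ranks x --mp cores for a single, fixed healpixel,
# reusing the bare desiconda/fastspecfit invocation from the tests below.
# Skips combinations that would oversubscribe an assumed 128-core node.
for ntasks in 1 2 4 8 16 32; do
  for mp in 1 2 4 8 16 32; do
    (( ntasks * mp > 128 )) && continue
    outdir=$PSCRATCH/timing-sweep/ntasks${ntasks}-mp${mp}
    mkdir -p "$outdir"
    echo "=== ntasks=$ntasks mp=$mp ==="
    time srun --ntasks=$ntasks mpi-fastspecfit \
        --outdir-data="$outdir" --mp=$mp \
        --survey=sv1 --program=bright --healpix=22746 --profile
  done
done
```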
@jdbuhler @sbailey
Following up on #203, here are some timing tests with `main`, with and without parallelization, and with different underlying software stacks. All of these tests were carried out on a single interactive node. Once a decision is made on which software stack to use, I'll do some more extensive benchmarking to determine the appropriate number of MPI tasks.
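For context, a single interactive CPU node at NERSC can be obtained with something like the following; the QOS, constraint, time limit, and account here are assumptions, not details taken from this issue:

```bash
# Hypothetical interactive allocation of one CPU node (adjust time/account as needed).
salloc --nodes=1 --qos=interactive --constraint=cpu --time=04:00:00 --account=desi
```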
Edit: the `desiconda` times (tests 5 and 6) were updated to include the `numba` cache.
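As an aside on the `numba` cache: on-disk caching of jit-compiled kernels is enabled per function via `@numba.njit(cache=True)`, and the cache location can be redirected with numba's standard `NUMBA_CACHE_DIR` environment variable. A minimal, illustrative setup (the scratch path is an assumption, not FastSpecFit configuration):

```bash
# Point numba's on-disk cache at scratch so compiled kernels persist across runs.
export NUMBA_CACHE_DIR=$PSCRATCH/numba-cache
mkdir -p "$NUMBA_CACHE_DIR"
```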
`timing-test1`: container `desihub/fastspecfit:3.1.1`, without `mkl_fft`.

```
time srun --ntasks=1 podman-hpc run --rm --mpi --group-add keep-groups --volume=/dvs_ro/cfs/cdirs:/dvs_ro/cfs/cdirs --volume=/global/cfs/cdirs:/global/cfs/cdirs --volume=$PSCRATCH:/scratch desihub/fastspecfit:3.1.1 mpi-fastspecfit --outdir-data=/scratch/timing-test1 --mp=1 --survey=sv1 --program=bright --healpix=22746 --targetids=39627671176481414 --profile
```
`timing-test2`: container `desihub/fastspecfit:3.1.1b`, with `mkl_fft`; crashes.

```
time srun --ntasks=1 podman-hpc run --rm --mpi --group-add keep-groups --volume=/dvs_ro/cfs/cdirs:/dvs_ro/cfs/cdirs --volume=/global/cfs/cdirs:/global/cfs/cdirs --volume=$PSCRATCH:/scratch desihub/fastspecfit:3.1.1b mpi-fastspecfit --outdir-data=/scratch/timing-test2 --mp=1 --survey=sv1 --program=bright --healpix=22746 --targetids=39627671176481414 --profile
```
`timing-test5`: `desiconda/main` and `fastspecfit/main`; for `mkl_fft`, see this ticket.

```
time srun --ntasks=1 mpi-fastspecfit --outdir-data=$PSCRATCH/timing-test5 --mp=1 --survey=sv1 --program=bright --healpix=22746 --targetids=39627671176481414 --profile
```
`timing-test3`: container `desihub/fastspecfit:3.1.1`, without `mkl_fft`.

```
time srun --ntasks=32 podman-hpc run --rm --mpi --group-add keep-groups --volume=/dvs_ro/cfs/cdirs:/dvs_ro/cfs/cdirs --volume=/global/cfs/cdirs:/global/cfs/cdirs --volume=$PSCRATCH:/scratch desihub/fastspecfit:3.1.1 mpi-fastspecfit --outdir-data=/scratch/timing-test3 --survey=sv1 --program=bright --healpix=22746 --mp=32 --profile
```
`timing-test4`: container `desihub/fastspecfit:3.1.1b`, with `mkl_fft`; crashes.

```
time srun --ntasks=32 podman-hpc run --rm --mpi --group-add keep-groups --volume=/dvs_ro/cfs/cdirs:/dvs_ro/cfs/cdirs --volume=/global/cfs/cdirs:/global/cfs/cdirs --volume=$PSCRATCH:/scratch desihub/fastspecfit:3.1.1b mpi-fastspecfit --outdir-data=/scratch/timing-test4 --survey=sv1 --program=bright --healpix=22746 --mp=32 --profile
```
`timing-test6`: `desiconda/main` and `fastspecfit/main`; for `mkl_fft`, see this ticket.

```
time srun --ntasks=32 mpi-fastspecfit --outdir-data=$PSCRATCH/timing-test6 --mp=32 --survey=sv1 --program=bright --healpix=22746 --profile
```
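For completeness, the container-based tests above assume the `desihub/fastspecfit` images have already been pulled with `podman-hpc` on a login node so that the read-only, squashed copies are visible to the compute nodes; a sketch, assuming the images live on Docker Hub under the tags used above:

```bash
# Hypothetical one-time setup on a login node: pull the two images and list them.
podman-hpc pull desihub/fastspecfit:3.1.1
podman-hpc pull desihub/fastspecfit:3.1.1b
podman-hpc images
```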
Note: `desihub/fastspecfit:3.1.1` and `desihub/fastspecfit:3.1.1b` are nearly identical Podman containers which exclude and include, respectively, `mkl_fft`. Currently, container `desihub/fastspecfit:3.1.1b` crashes due to an obscure error. The problem has been tracked down to `mkl_fft` and is being tracked in NERSC ticket INC0228131.
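A quick, illustrative way to confirm which of the two containers actually ships `mkl_fft` (assuming `python` is on the containers' PATH; not part of the original report):

```bash
# Try importing mkl_fft inside each container; per the note above,
# only the 3.1.1b image should have it installed.
for tag in 3.1.1 3.1.1b; do
  echo -n "desihub/fastspecfit:$tag -> "
  podman-hpc run --rm desihub/fastspecfit:$tag \
      python -c "import mkl_fft; print(mkl_fft.__file__)" \
      || echo "mkl_fft not installed"
done
```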