run_sbc (and run_tarp) run time #1329

Closed
humnaawan opened this issue Dec 11, 2024 · 5 comments
Labels
question Further information is requested

Comments

@humnaawan

Hello, is there any documentation on how to effectively use num_workers and use_batched_sampling? I am running into very long run times with run_sbc and I am not sure what's going wrong. Here's how I'm calling the function:

    ranks, dap_samples = run_sbc(
        thetas=thetas,
        xs=xs,
        posterior=posterior,
        num_posterior_samples=nsamples,
        show_progress_bar=True,
        num_workers=ncpus,
    )

I have 1000 simulations and I set nsamples to 1000. When I toggle between use_batched_sampling=False and use_batched_sampling=True (the default) in the function call, the former at least gives me progress updates, although it still doesn't finish.

Looking through the code, I think the bottleneck might be max_sampling_batch_size, which is set to 10,000. The parameter is not exposed, though (at least when you build a posterior via inference.build_posterior). I did set simulation_batch_size in simulate_for_sbi (to int(nsims/ncpus)), but I don't think that gets communicated to the DirectPosterior object.
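
In case it helps clarify what I mean, this is roughly the workaround I was considering: constructing the DirectPosterior by hand so that I can pass a smaller max_sampling_batch_size. I haven't verified the exact constructor signature, so please treat this as a sketch rather than something I know works:

    # hypothetical workaround (argument names not verified against the current API):
    # build the DirectPosterior directly to control the sampling batch size
    from sbi.inference.posteriors import DirectPosterior

    posterior = DirectPosterior(
        posterior_estimator=density_estimator,
        prior=prior,
        max_sampling_batch_size=1_000,  # instead of the 10,000 default
    )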

I run into the same issue with run_tarp, which doesn't have use_batched_sampling exposed (although #1321 should enable that once it's merged).

I use cpus-per-task=35 in my sbatch script and have confirmed that 35 CPUs are indeed available. With the default use_batched_sampling, the run_sbc call seems to be stuck at 1/1000 even after 5 hours; with use_batched_sampling=False, it barely passes 100/1000 after 12 hours (even though the progress bar's time estimates suggest otherwise).

I'd really appreciate some help. I am starting to unpack run_sbc since I can't think of anything else, but I thought I'd inquire here in case I'm missing something. My understanding is that my call never makes it past get_posterior_samples_on_batch (which calls posterior.sample_batched).

Thank you!

@humnaawan humnaawan added the question Further information is requested label Dec 11, 2024
@janfb
Contributor

janfb commented Dec 11, 2024

Hi @humnaawan

Thanks for reporting this! Some context that might already help:

  • batched sampling is quite fast for "direct" posteriors like NPE because it's just a forward pass through the flow, and we can pass the entire batch of xs at once.
  • for MCMCPosteriors it is in principle much slower because we have to run MCMC for each x in xs separately. For the slice_np_vectorized MCMC method we implemented batched sampling, but it's still slower because it has to run MCMC and evaluate the flow for each element in the chain.
  • when you use batched sampling, it's one single call to the sample method with a big batch of xs, so we cannot parallelize it. Thus, num_workers only has an effect if use_batched_sampling=False.

To summarize:

  • with an NPE-based posterior it should be quite fast, and if not, something is off
  • with an MCMC-based posterior it will be slow anyway, but using slice_np_vectorized with batched sampling (the default) is probably the fastest way. Alternatively, you could try use_batched_sampling=False with many workers.
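
In pseudo-code, the two modes look roughly like this (a sketch of the idea, not the exact internals of run_sbc):

    # batched sampling: a single call conditioned on the whole batch of xs,
    # so there is nothing for num_workers to parallelize
    samples_batched = posterior.sample_batched((num_posterior_samples,), x=xs)

    # non-batched sampling: one sampling call per x, and these independent
    # calls are what num_workers can distribute across processes
    samples_per_x = [
        posterior.sample((num_posterior_samples,), x=x_i) for x_i in xs
    ]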

Note that num_posterior_samples is just the number of posterior samples used during sbc / tarp; it's different from num_sbc_samples, which is the number of xs and thetas. num_sbc_samples is the main bottleneck and should be on the order of hundreds, e.g., 100-500, to give reasonable results.

Does this help?

@humnaawan
Author

Hi @janfb, thanks so much! I am indeed working with a direct posterior (sorry for not including that detail in my first post), built via:

    # create inference object
    inference = NPE(prior=prior)
    # generate simulations
    theta, x = simulate_for_sbi(
        simulator=simulator,
        proposal=prior,
        num_simulations=nsims,
        seed=seed,
        show_progress_bar=True,
        num_workers=ncpus,
    )
    # pass sims to inference object
    inference = inference.append_simulations(theta=theta, x=x)
    # now train
    density_estimator = inference.train()
    # build posterior
    posterior = inference.build_posterior(density_estimator=density_estimator)

Thank you for reminding me about num_sbc_samples vs. num_posterior_samples. I'm currently setting both to 1000, so my xs and thetas are shaped (torch.Size([1000, 500]), torch.Size([1000, 2])) (my simulation produces a 500-point spectrum and I am attempting to constrain two parameters).

It is helpful to know that num_workers only plays a role without batch sampling. I'm not sure why either option is not working with my call to run_sbc though.

@janfb
Contributor

janfb commented Dec 11, 2024

I see, thanks for the details. So you're effectively evaluating the underlying density estimator with a batch size of 1000 thetas (posterior samples) and 1000 xs (sbc samples), which could be the bottleneck here.
Are you using some kind of embedding network to process the 500-D input dimension?

What I would try:

  • use fewer num_sbc_samples
  • set use_batched_sampling=False and pass num_workers=30 or so.
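
For example, something along these lines (just a sketch; adapt the names to your script):

    # use a subset of the sbc samples, e.g., 200 instead of 1000
    thetas_sbc, xs_sbc = thetas[:200], xs[:200]

    ranks, dap_samples = run_sbc(
        thetas=thetas_sbc,
        xs=xs_sbc,
        posterior=posterior,
        num_posterior_samples=nsamples,
        use_batched_sampling=False,  # so that num_workers takes effect
        num_workers=30,
        show_progress_bar=True,
    )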

@humnaawan
Author

Thank you! I am currently not using an embedding network, as I wanted to see how things work out of the box; it's certainly on my to-do list. I will try your two suggestions and hopefully the function call finishes.

I appreciate your quick feedback!

@janfb
Contributor

janfb commented Dec 13, 2024

I would recommend using at least a small embedding net, e.g., the standard MLP we have implemented here:
https://github.com/sbi-dev/sbi/blob/main/sbi/neural_nets/embedding_nets/fully_connected.py

and explained here:
https://github.com/sbi-dev/sbi/blob/main/tutorials/04_embedding_networks.ipynb

Otherwise it could be challenging for the flow-based density estimator to cope with the 500-D conditioning dimension.
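
A rough sketch of how to plug it in (please double-check the import paths and argument names against your sbi version; FCEmbedding and the posterior_nn factory are the pieces I have in mind):

    from sbi.inference import NPE
    from sbi.neural_nets import posterior_nn
    from sbi.neural_nets.embedding_nets import FCEmbedding

    # compress the 500-D spectrum into a low-dimensional summary before the flow
    embedding_net = FCEmbedding(input_dim=500, output_dim=20)
    density_estimator_build_fn = posterior_nn(model="maf", embedding_net=embedding_net)
    inference = NPE(prior=prior, density_estimator=density_estimator_build_fn)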

I am moving this issue to discussions and closing it here. Feel free to post updates on your case there.

@sbi-dev sbi-dev locked and limited conversation to collaborators Dec 13, 2024
@janfb janfb converted this issue into discussion #1332 Dec 13, 2024
