pso_gen with batch evaluator (thread_bfe) not invoking batch_fitness() in custom UDP #598

Open
dsriaditya999 opened this issue Feb 26, 2025 · 5 comments

@dsriaditya999

I'm trying to use the generational PSO algorithm (pso_gen) with a batch fitness evaluator (using pagmo::thread_bfe) to enable multithreaded evaluation of my user‑defined problem. My final goal is to run pso_gen in batch mode so that, when I have (for example) 30 particles, they are evaluated in parallel via the problem's batch_fitness() method.

I created a minimal dummy problem that implements both the single-evaluation fitness() and the batch_fitness() methods. However, during evolution I only see output from fitness() being called (i.e., one-by-one evaluation), with no indication that batch_fitness() is ever invoked.

Below is my dummy example code and Makefile. Could someone please help me understand what I might be missing so that batch_fitness() is correctly used?

dummy_batch.cpp

#include <pagmo/problem.hpp>
#include <pagmo/population.hpp>
#include <pagmo/algorithm.hpp>
#include <pagmo/algorithms/pso_gen.hpp>
#include <pagmo/bfe.hpp>
#include <pagmo/batch_evaluators/thread_bfe.hpp>
#include <iostream>
#include <vector>
#ifdef _OPENMP
    #include <omp.h>
#endif

using vector_double = std::vector<double>;

// Dummy problem: f(x) = sum(x^2)
struct dummy_problem {
    // Single evaluation: compute sum of squares.
    vector_double fitness(const vector_double &x) const {
        std::cout << "[fitness] Single evaluation called." << std::endl;
        double sum = 0.0;
        for (double xi : x)
            sum += xi * xi;
        return {sum};
    }

    // Batch fitness evaluation: expects dvs to be a concatenation of decision vectors.
    vector_double batch_fitness(const vector_double &dvs) const {
        std::cout << "[batch_fitness] Batch evaluation called. dvs.size() = " << dvs.size() << std::endl;
        size_t dim = get_nx();
        size_t n_ind = dvs.size() / dim;
        vector_double results(n_ind);

        // Parallel loop using OpenMP.
        #pragma omp parallel for
        for (size_t i = 0; i < n_ind; ++i) {
            // Extract i-th decision vector from dvs.
            auto start = dvs.begin() + i * dim;
            auto end = start + dim;
            vector_double x(start, end);
            std::cout << "  [batch_fitness] Evaluating individual " << i << std::endl;
            double sum = 0.0;
            for (double xi : x) {
                sum += xi * xi;
            }
            results[i] = sum;
        }
        return results;
    }

    // Number of decision variables per individual.
    vector_double::size_type get_nx() const { return 3; }

    // Define the bounds for each decision variable.
    std::pair<vector_double, vector_double> get_bounds() const {
        vector_double lb(get_nx(), -5.0);
        vector_double ub(get_nx(), 5.0);
        return {lb, ub};
    }
};

int main() {
    // Create the problem instance.
    pagmo::problem prob{dummy_problem()};

    // Create a population of 10 individuals.
    pagmo::population pop(prob, 10);

    // Instantiate pso_gen (generational PSO supports batch evaluation).
    pagmo::pso_gen pso_algo;
    pso_algo.set_verbosity(1u);

    // Set the batch fitness evaluator to a thread-based evaluator.
    pso_algo.set_bfe(pagmo::bfe{ pagmo::thread_bfe{} });

    // Wrap pso_algo in a generic algorithm.
    pagmo::algorithm algo{ pso_algo };

    // Evolve the population for 5 generations.
    for (int gen = 0; gen < 5; ++gen) {
        std::cout << "\n=== Generation " << gen + 1 << " ===" << std::endl;
        pop = algo.evolve(pop);
    }

    // Print the champion solution.
    auto champ = pop.champion_x();
    auto champ_f = pop.champion_f();
    std::cout << "\n=== Optimization Complete ===" << std::endl;
    std::cout << "Champion solution: ";
    for (auto xi : champ)
        std::cout << xi << " ";
    std::cout << "\nChampion fitness: " << champ_f[0] << std::endl;

    return 0;
}

Makefile

# Compiler definitions
CXX      = g++
CXXFLAGS = -std=c++17 -O2 -fopenmp -I/usr/local/include/pagmo2 -I/usr/local/include
LDFLAGS  = -fopenmp -lpthread -L/usr/local/lib -Wl,-R/usr/local/lib
LDLIB    = -lpagmo -lboost_serialization

# Source file and target executable
SRC = dummy_batch.cpp
TARGET = bin/dummy_batch

all: $(TARGET)

$(TARGET): $(SRC)
	@mkdir -p bin
	$(CXX) $(CXXFLAGS) $(SRC) -o $(TARGET) $(LDFLAGS) $(LDLIB)

clean:
	rm -rf bin

.PHONY: all clean

Issue Summary:
Despite setting up a dummy problem with both fitness() and batch_fitness(), my generational PSO algorithm (pso_gen) never seems to call batch_fitness(). Instead, I only see output from fitness() being printed for each evaluation. I compile with OpenMP enabled and use set_bfe(pagmo::bfe{ pagmo::thread_bfe{} }).

My final goal is to use pso_gen with batch evaluators and multithreading in my real problem. Can someone please help clarify what might be wrong or missing in my UDP such that batch evaluation is not used? Any insight or suggestions would be appreciated!

@bluescarni
Member

Hi @dsriaditya999 !

The issue is that, by calling pso_algo.set_bfe(pagmo::bfe{ pagmo::thread_bfe{} });, you are specifically telling pagmo to use its own generic thread-based BFE implementation rather than the batch_fitness() method implemented in your problem. If you change the code to just:

pso_algo.set_bfe(pagmo::bfe{});

you will see that your batch_fitness() method is being invoked.

As an alternative, you could also remove the batch_fitness() method altogether and let pagmo's generic thread-based BFE do the parallelisation for you. If you run the code enough times, you will see that in some runs the screen outputs of the single evaluations overlap, e.g. like this:

[fitness] Single evaluation called.[fitness] Single evaluation called.

which will confirm that the single evaluations are indeed running in parallel.
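
For reference, here are the two set-ups side by side, as a minimal sketch distilled from the explanation above (it uses the same headers as the original example):

// Requires <pagmo/algorithms/pso_gen.hpp>, <pagmo/bfe.hpp> and
// <pagmo/batch_evaluators/thread_bfe.hpp>, as in the original dummy_batch.cpp.

// Option A: default-constructed bfe. As explained above, this lets pagmo
// pick up the batch_fitness() member implemented in the UDP, so the custom
// batch implementation is the one that runs.
pagmo::pso_gen pso_a;
pso_a.set_bfe(pagmo::bfe{});

// Option B: explicit thread_bfe. pagmo parallelises the individual
// fitness() calls itself and bypasses any batch_fitness() member, so the
// UDP does not need to provide one.
pagmo::pso_gen pso_b;
pso_b.set_bfe(pagmo::bfe{pagmo::thread_bfe{}});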

@dsriaditya999
Author

Hi @bluescarni, thanks a lot for your reply. I tried incorporating your suggestion and it did work in the dummy toy example that I shared. But when I apply the same changes to my actual problem, batch processing still does not happen. I have tried using my own batch_fitness() implementation with pso_algo.set_bfe(pagmo::bfe{});, and I have also tried the built-in thread_bfe. Neither works; only single-threaded processing happens.

Any suggestions on how to tackle this problem? Does the fitness function have to satisfy any constraints in order to enable parallel processing?
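
(For what it's worth, the knob pagmo exposes for this is the optional get_thread_safety() member of the UDP, which advertises whether fitness evaluations may run concurrently; C++ UDPs default to the basic level, which is already enough for thread_bfe, as the dummy example shows. A purely illustrative sketch of such a declaration, added to a UDP like the dummy_problem above:)

// Requires <pagmo/threading.hpp>.
// Optional UDP member: advertise that const member functions, including
// fitness(), may be called concurrently even on the same object
// ("constant" level). Omitting this member leaves the default of
// pagmo::thread_safety::basic for C++ UDPs.
pagmo::thread_safety get_thread_safety() const {
    return pagmo::thread_safety::constant;
}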

@bluescarni
Member

Hi @dsriaditya999, could you explain in a bit more detail what you mean when you say that only single-threaded processing happens? I.e., how have you been checking whether multithreading is active?

@dsriaditya999
Author

I have checked by printing some indicative text in the custom batch processor and also by monitoring CPU utilization with top. For the dummy problem, CPU utilization is around 3000% (I have a 32-core CPU), but for my actual problem it stays at just 100%.

@bluescarni
Member

Is there any chance your objective function is holding a lock that forces serialised fitness evaluations? Do you see any multithreaded evaluation if you replace your current objective function with some other code but leave the rest of the setup unchanged?

It would be helpful to see the code :)
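
One way to run that experiment (a hypothetical stand-in, not code from this thread; busy_problem is a placeholder name) is to temporarily substitute a purely CPU-bound UDP with no locks or shared state, keep the rest of the set-up identical, and watch CPU utilization in top again:

#include <cmath>
#include <utility>
#include <pagmo/types.hpp>

// Placeholder UDP: pure computation, no locks, no shared state.
// If CPU utilization goes well above 100% with this swapped in, the
// serialisation is coming from the real objective function (e.g. a lock
// or I/O), not from the pagmo/bfe set-up.
struct busy_problem {
    pagmo::vector_double fitness(const pagmo::vector_double &x) const {
        double acc = 0.0;
        // Burn CPU so parallel evaluations are visible in top.
        for (int k = 0; k < 10000000; ++k)
            acc += std::sin(x[0] + k) * std::cos(x[1] - k);
        return {acc};
    }
    std::pair<pagmo::vector_double, pagmo::vector_double> get_bounds() const {
        return {{-5., -5., -5.}, {5., 5., 5.}};
    }
};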
