spark-eclib


spark-eclib is a framework written in Scala to support the development of distributed population-based metaheuristics and their application to the global optimization of large-scale problems in Spark clusters.

The spark-eclib framework is being developed by the Computer Architecture Group (GAC) at Universidade da Coruña (UDC) in collaboration with the Computational Biology Lab (formerly the (Bio)Process Engineering Group) at Misión Biológica de Galicia (MBG-CSIC). The two groups maintain a long-term research collaboration in the field of global optimization of large-scale problems from Computational Biology using distributed frameworks on computational clusters. After implementing and evaluating several ad-hoc distributed population-based metaheuristics on frameworks such as Hadoop and Spark (e.g. SiPDE, eSS), the groups started the development of spark-eclib with two main objectives: to avoid reinventing the wheel every time a new metaheuristic is implemented, and to improve the automation and reproducibility of optimization experiments.

The framework provides a small set of abstractions that represent the general structure of population-based metaheuristics as templates. Different variants of an algorithm are instantiated from a template by implementing strategies, and strategies can be reused across metaheuristics, which promotes code reuse. To validate the approach, a template for Particle Swarm Optimization (PSO) was implemented using the general abstractions provided by the framework. The template supports the instantiation of different variants of the PSO algorithm, a long list of configurable topologies, and several execution models (i.e. sequential, master-worker and island-based).
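The template/strategy idea can be illustrated with a minimal, purely sequential sketch in plain Scala. It is not the spark-eclib API: every name below (TemplateSketch, Particle, VelocityUpdate, InertiaWeight, minimize) is hypothetical and chosen only for illustration, and it assumes a global-best topology and the standard inertia-weight velocity update.

```scala
import scala.util.Random

// Hypothetical names throughout: none of these types are taken from spark-eclib.
object TemplateSketch {
  type Position = Vector[Double]

  final case class Particle(pos: Position, vel: Position, best: Position, bestFit: Double)

  // A strategy is a pluggable piece of behaviour that the template delegates to.
  trait VelocityUpdate {
    def apply(p: Particle, globalBest: Position, rnd: Random): Position
  }

  // One possible strategy: the standard inertia-weight PSO velocity update.
  final class InertiaWeight(w: Double, c1: Double, c2: Double) extends VelocityUpdate {
    def apply(p: Particle, globalBest: Position, rnd: Random): Position =
      p.pos.indices.toVector.map { i =>
        w * p.vel(i) +
          c1 * rnd.nextDouble() * (p.best(i) - p.pos(i)) +
          c2 * rnd.nextDouble() * (globalBest(i) - p.pos(i))
      }
  }

  // The template fixes the overall loop; the variant is selected by the strategy passed in.
  def minimize(f: Position => Double, dim: Int, swarmSize: Int, iterations: Int,
               update: VelocityUpdate, seed: Long): (Position, Double) = {
    val rnd = new Random(seed)
    // Random initial positions in [-5, 5), zero initial velocity.
    var swarm = Vector.fill(swarmSize) {
      val pos = Vector.fill(dim)(rnd.nextDouble() * 10.0 - 5.0)
      Particle(pos, Vector.fill(dim)(0.0), pos, f(pos))
    }
    var gBest = swarm.minBy(_.bestFit)
    for (_ <- 0 until iterations) {
      // Move every particle, keeping track of its personal best.
      swarm = swarm.map { p =>
        val vel = update(p, gBest.best, rnd)
        val pos = p.pos.zip(vel).map { case (x, v) => x + v }
        val fit = f(pos)
        if (fit < p.bestFit) Particle(pos, vel, pos, fit)
        else p.copy(pos = pos, vel = vel)
      }
      // The global best is refreshed once per iteration and never gets worse.
      val candidate = swarm.minBy(_.bestFit)
      if (candidate.bestFit < gBest.bestFit) gBest = candidate
    }
    (gBest.best, gBest.bestFit)
  }
}
```

A call such as `TemplateSketch.minimize(x => x.map(v => v * v).sum, 3, 10, 50, new TemplateSketch.InertiaWeight(0.7, 1.5, 1.5), 42L)` would run this sketch on a sphere function. Swapping in another `VelocityUpdate` implementation changes the PSO variant without touching the loop, which is the kind of reuse the paragraph above describes; the topologies and the master-worker and island-based execution models are outside the scope of this sketch.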

This repository contains a snapshot of the source code of the framework as described in 10.1016/j.swevo.2024.101483.

Citation

If you use spark-eclib, please cite our work using the following reference:

Xoán C. Pardo, Patricia González, Julio R. Banga, Ramón Doallo. Population based metaheuristics in Spark: towards a general framework using PSO as a case study. Swarm and Evolutionary Computation, 85 (2024), article 101483, 10.1016/j.swevo.2024.101483

Usage

To build the project, use the Maven command:

mvn clean package

The resulting fat jar eclib-0.0.1-test-jar-with-dependencies.jar will be placed in the target folder of the project.

The simplest command to submit a spark-eclib job to a Spark cluster is:

spark-submit --class gal.udc.gac.eclib.EclibTest \
  --master <master-url> \
  eclib-0.0.1-test-jar-with-dependencies.jar \
  <configuration_file>

Examples

Refer to the README.md files in the testbed\kubernetes and testbed\cluster directories.

Branches

Besides master, there are branches with different combinations of the optimizations implemented to reduce the number of Spark jobs per iteration. These branches were used to profile the performance of the parallel implementations with and without optimizations.

License

This code is open source software licensed under the GPLv3 License.

Contact

Xoán C. Pardo <[email protected]>
Computer Architecture Group / CITIC
Universidade da Coruña (UDC)
Spain

