Changes from 21 commits
39 commits
6066f3a
Add files via upload
Mohammed-Ryiad-Eiadeh Apr 21, 2023
e729e18
Update TransferFunction.java
Mohammed-Ryiad-Eiadeh Apr 22, 2023
e33157e
Update CuckooSearchOptimizer.java
Mohammed-Ryiad-Eiadeh Apr 22, 2023
580fb6a
Update Binarizing.java
Mohammed-Ryiad-Eiadeh Apr 22, 2023
02ddebb
Update CuckooSearchOptimizer.java
Mohammed-Ryiad-Eiadeh Apr 23, 2023
1079ee7
Update CuckooSearchOptimizer.java
Mohammed-Ryiad-Eiadeh Apr 24, 2023
1a323c9
Update CuckooSearchOptimizer.java
Mohammed-Ryiad-Eiadeh Apr 25, 2023
8228ffe
Update CuckooSearchOptimizer.java
Mohammed-Ryiad-Eiadeh Apr 27, 2023
4f7a25f
Update TransferFunction.java
Mohammed-Ryiad-Eiadeh Apr 27, 2023
44a33bd
Update Binarizing.java
Mohammed-Ryiad-Eiadeh Apr 27, 2023
5d58e5d
Add files via upload
Mohammed-Ryiad-Eiadeh Apr 28, 2023
39c340b
Update CuckooSearchOptimizer.java
Mohammed-Ryiad-Eiadeh Apr 28, 2023
2f85c7b
Update CuckooSearchOptimizer.java
Mohammed-Ryiad-Eiadeh Apr 29, 2023
6392dd1
Update FitnessFunction.java
Mohammed-Ryiad-Eiadeh Apr 29, 2023
e214c10
Update CuckooSearchOptimizer.java
Mohammed-Ryiad-Eiadeh Apr 30, 2023
f0a1300
Update FitnessFunction.java
Mohammed-Ryiad-Eiadeh Apr 30, 2023
2c9c47d
Update FitnessFunction.java
Mohammed-Ryiad-Eiadeh Apr 30, 2023
40da8a8
Update FitnessFunction.java
Mohammed-Ryiad-Eiadeh May 1, 2023
b381c2a
Update FitnessFunction.java
Mohammed-Ryiad-Eiadeh May 7, 2023
e05f30d
Update FitnessFunction.java
Mohammed-Ryiad-Eiadeh May 13, 2023
5aa563d
Update CuckooSearchOptimizer.java
Mohammed-Ryiad-Eiadeh May 13, 2023
c5d5f62
Add files via upload
Mohammed-Ryiad-Eiadeh May 29, 2023
99f83e6
Delete Classification/FeatureSelection/src/main/java/org/tribuo/class…
Mohammed-Ryiad-Eiadeh May 29, 2023
63e4aa0
Delete Binarizing.java
Mohammed-Ryiad-Eiadeh Jul 10, 2023
8c9d3e2
Update TransferFunction.java
Mohammed-Ryiad-Eiadeh Jul 10, 2023
6d2f940
Delete FitnessFunction.java
Mohammed-Ryiad-Eiadeh Jul 10, 2023
2df7c7e
Update CuckooSearchOptimizer.java
Mohammed-Ryiad-Eiadeh Jul 10, 2023
50be1fb
Update CuckooSearchOptimizer.java
Mohammed-Ryiad-Eiadeh Jul 10, 2023
96ecc6b
Update CuckooSearchOptimizer.java
Mohammed-Ryiad-Eiadeh Jul 10, 2023
21a9c3f
Update pom.xml
Mohammed-Ryiad-Eiadeh Jul 11, 2023
fe512ae
Update CuckooSearchOptimizer.java
Mohammed-Ryiad-Eiadeh Jul 11, 2023
d4a11db
Update CuckooSearchOptimizer.java
Mohammed-Ryiad-Eiadeh Jul 11, 2023
f45bde5
Update CuckooSearchOptimizer.java
Mohammed-Ryiad-Eiadeh Jul 12, 2023
7670a28
Update CuckooSearchOptimizer.java
Mohammed-Ryiad-Eiadeh Jul 16, 2023
0993bc0
Update CuckooSearchOptimizer.java
Mohammed-Ryiad-Eiadeh Jul 23, 2023
9038ef4
Update CuckooSearchOptimizer.java
Mohammed-Ryiad-Eiadeh Aug 8, 2023
76dc715
Update CuckooSearchOptimizer.java
Mohammed-Ryiad-Eiadeh Aug 9, 2023
28509c0
Update CuckooSearchOptimizer.java
Mohammed-Ryiad-Eiadeh Aug 9, 2023
2910d99
Update CuckooSearchOptimizer.java
Mohammed-Ryiad-Eiadeh Aug 10, 2023
@@ -0,0 +1,27 @@
package FS_Wrapper_Approaches.Discreeting;
Member

The base package name needs updating to org.tribuo.classification.fs.wrapper and then the files should be moved into the right directory. At the moment this won't compile because the directory and package names don't line up.

Member

Also all the files need the copyright and license header.


import static org.apache.commons.math3.special.Erf.erf;
Member

Tribuo no longer depends on Apache Commons Math 3, and this import isn't used.


/**
* This interface includes a static method that converts a continuous value to a binary one
*/
public interface Binarizing {
/**
* This method converts continuous values to binary ones
* @param TF is the type (id) of the transfer function
* @param Value is the continuous value to be converted
* @return the converted value based on the selected function
*/
static int discreteValue(TransferFunction TF, double Value) {
return switch (TF) {
case V1 -> Math.abs(erf(Math.sqrt(Math.PI) / 2 * Value)) >= 0.5 ? 1 : 0;
case V2 -> Math.abs(Math.tan(Value)) >= 0.5 ? 1 : 0;
case V3 -> Math.abs(Value / Math.abs(1 + Math.pow(Value, 2))) >= 0.5 ? 1 : 0;
case V4 -> Math.abs(2 / Math.PI * Math.atan(Math.PI / 2 * Value)) >= 0.5 ? 1 : 0;
case S1 -> 1 / (1 + Math.pow(Math.E, - 2 * Value)) >= 0.5 ? 1 : 0;
case S2 -> 1 / (1 + Math.pow(Math.E, - Value)) >= 0.5 ? 1 : 0;
case S3 -> 1 / (1 + Math.pow(Math.E, - Value / 2)) >= 0.5 ? 1 : 0;
case S4 -> 1 / (1 + Math.pow(Math.E, - Value / 3)) >= 0.5 ? 1 : 0;
};
}
}
@@ -0,0 +1,8 @@
package FS_Wrapper_Approaches.Discreeting;

/**
* Enumeration of the transfer function types used to convert continuous values to binary ones
*/
public enum TransferFunction {
V1, V2, V3, V4, S1, S2, S3, S4
Member

The implementation of discreteValue could be folded in here, making the enum contain a DoubleUnaryOperator which performs the necessary operation, and then it could have a discretise method which applies the function and thresholds the output. Then the Binarizing interface can be removed.
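
As a rough illustration of this suggestion, the sketch below stores each transfer function in the enum constant as a java.util.function.DoubleUnaryOperator and applies the 0.5 threshold in one place. It is an assumption about how the refactoring might look, not the code that was merged. Only a subset of the constants is shown (V1 is left out because the JDK has no erf), and the THRESHOLD field and the discretise method name are illustrative.

```java
import java.util.function.DoubleUnaryOperator;

/** Sketch only: each constant carries its own transfer function. */
enum TransferFunction {
    // V-shaped functions: magnitude of the transformed value.
    V3(x -> Math.abs(x / Math.abs(1 + x * x))),
    V4(x -> Math.abs(2 / Math.PI * Math.atan(Math.PI / 2 * x))),
    // S-shaped (sigmoid) functions with different slopes.
    S1(x -> 1 / (1 + Math.exp(-2 * x))),
    S2(x -> 1 / (1 + Math.exp(-x)));

    // Values at or above this threshold map to 1, everything else to 0.
    private static final double THRESHOLD = 0.5;

    private final DoubleUnaryOperator function;

    TransferFunction(DoubleUnaryOperator function) {
        this.function = function;
    }

    /** Applies the transfer function and thresholds the result to 0 or 1. */
    int discretise(double value) {
        return function.applyAsDouble(value) >= THRESHOLD ? 1 : 0;
    }
}
```

With this shape a caller would write something like TransferFunction.S2.discretise(x), and the switch in Binarizing would no longer be needed.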

}
@@ -0,0 +1,65 @@
package FS_Wrapper_Approaches.Evaluation;

import FS_Wrapper_Approaches.Optimizers.CuckooSearchOptimizer;
import com.oracle.labs.mlrg.olcut.util.Pair;
import org.tribuo.Dataset;
import org.tribuo.ImmutableFeatureMap;
import org.tribuo.Model;
import org.tribuo.SelectedFeatureSet;
import org.tribuo.classification.Label;
import org.tribuo.classification.evaluation.LabelEvaluation;
import org.tribuo.classification.evaluation.LabelEvaluator;
import org.tribuo.common.nearest.KNNClassifierOptions;
Member

We try to avoid surfacing the options classes in the library code as they are not under the same compatibility guarantees as the library code. It's probably better to import KNNTrainer directly.

import org.tribuo.dataset.SelectedFeatureDataset;
import org.tribuo.evaluation.CrossValidation;
import org.tribuo.provenance.FeatureSetProvenance;

import java.util.ArrayList;
import java.util.List;


/**
* This interface includes the evaluation function of each solution
*/
public interface FitnessFunction {
Member

This interface only contains static members. For the time being those methods can be moved onto CuckooSearchOptimizer and be private, then if we need to use them from other places they can be easily moved and made more accessible.

/**
* This method is used to compute the fitness score of each solution of the population
* @param optimizer The optimizer that is used for FS
* @param dataset The dataset to use
* @param Fmap The dataset feature map
* @param solution The current subset of features
* @return The fitness score of the given subset
*/
static <T extends FeatureSelector<Label>> double EvaluateSolution(T optimizer, Dataset<Label> dataset, ImmutableFeatureMap Fmap, int[] solution) {
Member

Tribuo consistently uses camelCase for method names, argument names, field names and variable names.

SelectedFeatureDataset<Label> selectedFeatureDataset = new SelectedFeatureDataset<>(dataset,getSFS(optimizer, dataset, Fmap, solution));
KNNClassifierOptions classifier = new KNNClassifierOptions();
CrossValidation<Label, LabelEvaluation> crossValidation = new CrossValidation<>(classifier.getTrainer(), selectedFeatureDataset, new LabelEvaluator(), 10);
Member

The trainer should probably be a parameter for this method. That does raise questions about the provenance & reproducibility, but provided it remains single threaded it should be ok.

double avgAccuracy = 0d;
for (Pair<LabelEvaluation, Model<Label>> ACC : crossValidation.evaluate())
Member

We always use curly braces even for single line for loops and if statements.

avgAccuracy += ACC.getA().accuracy();
avgAccuracy /= crossValidation.getK();

return avgAccuracy + 0.0001 * (1 - ((double) selectedFeatureDataset.getSelectedFeatures().size() / Fmap.size()));
}

/**
* This method is used to return the selected subset of features
* @param optimizer The optimizer that is used for FS
* @param dataset The dataset to use
* @param Fmap The dataset feature map
* @param solution The current subset of features
* @return The selected feature set
*/
static <T extends FeatureSelector<Label>> SelectedFeatureSet getSFS(T optimizer, Dataset<Label> dataset, ImmutableFeatureMap Fmap, int[] solution) {
List<String> names = new ArrayList<>();
List<Double> scores = new ArrayList<>();
for (int i = 0; i < solution.length; i++)
if (solution[i] == 1) {
names.add(Fmap.get(i).getName());
scores.add(1D);
}
FeatureSetProvenance provenance = new FeatureSetProvenance(SelectedFeatureSet.class.getName(), dataset.getProvenance(), optimizer.getProvenance());

return new SelectedFeatureSet(names, scores, optimizer.isOrdered(), provenance);
}
}
@@ -0,0 +1,157 @@
package FS_Wrapper_Approaches.Optimizers;

import FS_Wrapper_Approaches.Discreeting.Binarizing;
import FS_Wrapper_Approaches.Discreeting.TransferFunction;
import com.oracle.labs.mlrg.olcut.util.Pair;
import org.tribuo.*;
Member

All imports should be explicit, no star imports for packages.

import org.tribuo.classification.Label;
import org.tribuo.classification.evaluation.LabelEvaluation;
import org.tribuo.classification.evaluation.LabelEvaluator;
import org.tribuo.common.nearest.KNNClassifierOptions;
import org.tribuo.dataset.SelectedFeatureDataset;
import org.tribuo.evaluation.CrossValidation;
import org.tribuo.provenance.FeatureSelectorProvenance;
import org.tribuo.provenance.FeatureSetProvenance;
import org.tribuo.provenance.impl.FeatureSelectorProvenanceImpl;

import java.util.*;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

/**
* Select features based on Cuckoo Search algorithm with binary transfer functions, KNN classifier and 10-fold cross validation
* <p>
* see:
* <pre>
* Xin-She Yang and Suash Deb.
* "Cuckoo Search via Lévy Flights", 2010.
*
* L. A. M. Pereira et al.
* "A Binary Cuckoo Search and its Application for Feature Selection", 2014.
* </pre>
*/
public class CuckooSearchOptimizer implements FeatureSelector<Label> {
Member

This class should be final to preserve some flexibility for later.

private final TransferFunction transferFunction;
private final double stepSizeScaling;
private final double lambda;
private final double worstNestProbability;
private final double delta;
private final int populationSize;
private int [][] setOfSolutions;
private final int maxIteration;
Member

All the final variables here should not be final and need to be tagged @Config (with appropriate description fields) so they are automatically captured by the provenance system.


/**
* The default constructor for feature selection based on Cuckoo Search Algorithm
*/
public CuckooSearchOptimizer() {
this.transferFunction = TransferFunction.V2;
this.populationSize = 50;
this.stepSizeScaling = 2d;
this.lambda = 2d;
this.worstNestProbability = 0.1d;
this.delta = 1.5d;
this.maxIteration = 30;
}

/**
* Constructs the wrapper feature selection based on cuckoo search algorithm
* @param transferFunction The transfer function to convert continuous values to binary ones
* @param populationSize The number of solutions in the initial population
* @param maxIteration The number of iterations used to improve the population
*/
public CuckooSearchOptimizer(TransferFunction transferFunction, int populationSize, int maxIteration) {
this.transferFunction = transferFunction;
this.populationSize = populationSize;
this.stepSizeScaling = 2d;
this.lambda = 2d;
this.worstNestProbability = 0.1d;
this.delta = 1.5d;
this.maxIteration = maxIteration;
}

/**
* @param transferFunction The transfer function to convert continuous values to binary ones
* @param populationSize The number of solutions in the initial population
* @param stepSizeScaling The cuckoo step size
* @param lambda The lambda of the levy flight function
* @param worstNestProbability The fraction of the nests to be abandoned
* @param delta The delta that is used in the abandon nest function
* @param maxIteration The number of iterations used to improve the population
*/
public CuckooSearchOptimizer(TransferFunction transferFunction, int populationSize, double stepSizeScaling, double lambda, double worstNestProbability, double delta, int maxIteration) {
this.transferFunction = transferFunction;
this.populationSize = populationSize;
this.stepSizeScaling = stepSizeScaling;
this.lambda = lambda;
this.worstNestProbability = worstNestProbability;
this.delta = delta;
this.maxIteration = maxIteration;
}

/**
* @param totalNumberOfFeatures The number of features in the given dataset
* @return The population of subsets of selected features
*/
private int[][] GeneratePopulation(int totalNumberOfFeatures) {
Member

As mentioned elsewhere this method should accept a SplittableRandom rather than make a fresh RNG each time.

setOfSolutions = new int[this.populationSize][totalNumberOfFeatures];
for (int[] subSet : setOfSolutions)
System.arraycopy(new Random().ints(totalNumberOfFeatures, 0, 2).toArray(), 0, subSet, 0, setOfSolutions[0].length);
return setOfSolutions;
}

/**
* Does this feature selection algorithm return an ordered feature set?
*
* @return True if the set is ordered.
*/
@Override
public boolean isOrdered() {
return true;
}

/**
* Selects features according to this selection algorithm from the specified dataset.
* @param dataset The dataset to use.
* @return A selected feature set.
*/
@Override
public SelectedFeatureSet select(Dataset<Label> dataset) {
ImmutableFeatureMap FMap = new ImmutableFeatureMap(dataset.getFeatureMap());
setOfSolutions = GeneratePopulation(dataset.getFeatureMap().size());
Member

This should be a local variable rather than an instance variable, to preserve multithreading behaviour.

List<FeatureSet_FScore_Container> subSet_fScores = new ArrayList<>();
SelectedFeatureSet selectedFeatureSet = null;
// Update the solution based on the levy flight function
for (int i = 0; i < maxIteration; i++) {
IntStream.range(0, setOfSolutions.length).parallel().forEach(subSet -> {
Member

We don't use the common ForkJoinPool as Tribuo is a library rather than an application, so this should be executed inside a constructed FJP held as an instance variable. The number of threads for that FJP also needs to be a parameter. You can see how we do it in KMeansTrainer, or KNNModel.
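
A minimal JDK-only sketch of that pattern follows, assuming a hypothetical PooledEvaluatorSketch class with a placeholder fitness function. It is not the KMeansTrainer or KNNModel code. It relies on the widely used behaviour that a parallel stream started from inside a ForkJoinPool task runs on that pool, which is an implementation detail rather than a documented guarantee.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

/** Sketch only: evaluates candidates on a privately owned ForkJoinPool. */
final class PooledEvaluatorSketch {
    // Pool constructed by this instance instead of using the common pool.
    private final ForkJoinPool pool;

    PooledEvaluatorSketch(int numThreads) {
        this.pool = new ForkJoinPool(numThreads);
    }

    /** Scores every candidate in parallel; results keep the input order. */
    double[] scoreAll(int[][] candidates) {
        try {
            // Submitting the stream pipeline as a task makes it run on this pool.
            return pool.submit(() -> IntStream.range(0, candidates.length)
                    .parallel()
                    .mapToDouble(i -> score(candidates[i]))
                    .toArray()).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while scoring", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Scoring failed", e);
        }
    }

    // Placeholder fitness (assumption): the number of selected features.
    private static double score(int[] candidate) {
        return IntStream.of(candidate).sum();
    }

    /** Releases the worker threads once the evaluator is no longer needed. */
    void close() {
        pool.shutdown();
    }
}
```

The thread count is a constructor argument here so that the caller, not the library, decides how much parallelism to use.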

AtomicInteger currentIter = new AtomicInteger(subSet);
int[] evolvedSolution = Arrays.stream(setOfSolutions[subSet]).map(x -> Binarizing.discreteValue(transferFunction, x + stepSizeScaling * Math.pow(currentIter.get() + 1, -lambda))).toArray();
int[] randomCuckoo = setOfSolutions[new Random().nextInt(setOfSolutions.length)];
Member

new Random() is not allowed in Tribuo: it introduces non-determinism and prevents us from capturing accurate provenance information. All the uses of an RNG need to be refactored so that the class holds a single instance of SplittableRandom. In the select method that instance is split in a synchronized block into a local RNG, then provenance is captured and the invocation count is incremented (which will need explicit capturing in the provenance too). All other RNGs are then split from the local one in a deterministic way. You can see the right idiom in KMeansTrainer.
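
The idiom described above might look roughly like the following JDK-only sketch. SeededSelectorSketch and its methods are hypothetical names, the provenance capture is only marked by a comment, and this is not the KMeansTrainer code itself.

```java
import java.util.SplittableRandom;

/** Sketch only: one seeded RNG per instance, split deterministically per call. */
final class SeededSelectorSketch {
    // The single RNG owned by the instance, created from a user-supplied seed.
    private final SplittableRandom rng;
    // Number of select calls so far; this would also be recorded in provenance.
    private int invocationCount = 0;

    SeededSelectorSketch(long seed) {
        this.rng = new SplittableRandom(seed);
    }

    /** Builds a random binary population using only RNGs derived from the seed. */
    int[][] select(int populationSize, int numFeatures) {
        SplittableRandom localRng;
        synchronized (this) {
            localRng = rng.split();
            // Provenance would be captured here, before the counter changes.
            invocationCount++;
        }
        int[][] population = new int[populationSize][numFeatures];
        for (int i = 0; i < populationSize; i++) {
            // Split in a fixed order so each row always gets the same stream.
            SplittableRandom rowRng = localRng.split();
            for (int j = 0; j < numFeatures; j++) {
                population[i][j] = rowRng.nextInt(2);
            }
        }
        return population;
    }

    synchronized int getInvocationCount() {
        return invocationCount;
    }
}
```

Two instances built with the same seed produce the same population on their first call, which is the reproducibility property that scattered new Random() calls cannot give.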

if (FitnessFunction.EvaluateSolution(this, dataset, FMap, evolvedSolution) > FitnessFunction.EvaluateSolution(this, dataset, FMap, randomCuckoo))
System.arraycopy(evolvedSolution, 0, setOfSolutions[subSet], 0, evolvedSolution.length);
// Update the solution based on the abandon nest function
if (new Random().nextDouble() < worstNestProbability) {
int r1 = new Random().nextInt(setOfSolutions.length);
int r2 = new Random().nextInt(setOfSolutions.length);
for (var j = 0; j < setOfSolutions[subSet].length; j++)
evolvedSolution[j] = Binarizing.discreteValue(transferFunction, setOfSolutions[subSet][j] + delta * (setOfSolutions[r1][j] - setOfSolutions[r2][j]));
if (FitnessFunction.EvaluateSolution(this, dataset, FMap, evolvedSolution) > FitnessFunction.EvaluateSolution(this, dataset, FMap, setOfSolutions[subSet]))
System.arraycopy(evolvedSolution, 0, setOfSolutions[subSet], 0, evolvedSolution.length);
}
subSet_fScores.add(new FeatureSet_FScore_Container(setOfSolutions[subSet], FitnessFunction.EvaluateSolution(this, dataset, FMap, setOfSolutions[subSet])));
});
subSet_fScores.sort(Comparator.comparing(FeatureSet_FScore_Container::score).reversed());
selectedFeatureSet = FitnessFunction.getSFS(this, dataset, FMap, subSet_fScores.get(0).subSet);
Member

Does this need to create the SFS each iteration? It looks like this could be called once after the loop finishes.
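
One way to read this suggestion, sketched with plain JDK types: keep only the scores inside the loop, then pick the best solution and build the result once at the end. BestSolutionSketch, argMax and selectedNames are hypothetical helpers, and a list of feature names stands in for the SelectedFeatureSet.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: choose the best solution once, after all iterations finish. */
final class BestSolutionSketch {
    private BestSolutionSketch() { }

    /** Returns the index of the highest score; the earliest index wins ties. */
    static int argMax(double[] scores) {
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return best;
    }

    /** Maps a binary solution to the names of the features it selects. */
    static List<String> selectedNames(int[] solution, List<String> featureNames) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < solution.length; i++) {
            if (solution[i] == 1) {
                names.add(featureNames.get(i));
            }
        }
        return names;
    }
}
```

After the final iteration a caller would evaluate selectedNames(population[argMax(scores)], featureNames) a single time, rather than sorting and rebuilding the feature set on every pass.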

}
return selectedFeatureSet;
}

@Override
public FeatureSelectorProvenance getProvenance() {
return new FeatureSelectorProvenanceImpl(this);
}

/**
* This record is used to hold subset of features with its corresponding fitness score
*/
record FeatureSet_FScore_Container(int[] subSet, double score) { }
Member

CuckooSearchFeatureSet?

I used this record to store each solution with its corresponding fitness value, so that the solutions can be sorted by their fitness scores using a Comparator.

Member

Ah, sorry, I was suggesting this record should be renamed CuckooSearchFeatureSet as it's specific to this approach and it's currently got a very general name.

Ah, I am sorry, I misunderstood your note. You are right, I should change the name, and I am currently working on the FJP issue. Thanks, Dr. Adam, for your suggestions. They all make sense, and I know that I still need more time to keep learning Java (my best language).

}