Simple example of a regression model #33
Comments
Thanks a lot for the interest shown! I'll try to make an example for the housing data later today or tonight. There are probably a few missing components for regression (most notably a fitness function), but it should not be too much of a hassle to put something together (at least for someone who knows the package). |
Hi, this example is perhaps a bit overworked and could probably have been made simpler. I guess this library is overkill for the Boston housing dataset and the models returned will most likely be overfitted.
module FitBostonHousing
using NaiveGAflux
using MLDatasets: BostonHousing
using Random
using Statistics
function fit(x = BostonHousing.features(), y = BostonHousing.targets(); seed=1, popsize=128, nepochs=200)
Random.seed!(seed)
train,val= iterators(x,y)
nfeatures = size(x,1)
fitnessmetric = mse_vs_size(val)
population = initialpopulation(popsize, nfeatures, fitnessmetric)
evolution = evolution_strategy(popsize)
for epoch in 1:nepochs
@info "Begin epoch $epoch"
for (i, candidate) in enumerate(population)
Flux.train!(candidate, train)
@info "\tFitness candidate $i : $(fitness(candidate))"
end
population = evolve!(evolution, population)
end
return population
end
function initialpopulation(popsize, inputsize, fitnessmetric)
# Simple search space of 1-4 Dense layers with loguniform sizes (2, 4, ..., 64 neurons)
layerspace = VertexSpace(DenseSpace(2 .^ (1:6), [relu, elu, selu]))
initial_hidden = RepeatArchSpace(layerspace, 0:3)
# Output layer has fixed size and is shielded from mutation
outlayer = VertexSpace(Shielded(), DenseSpace(1, identity))
initial_space = ListArchSpace(initial_hidden, outlayer)
return [create_candidate(initial_space, inputsize, fitnessmetric) for _ in 1:popsize]
end
function create_candidate(searchspace, inputsize, fitnessmetric)
input = inputvertex("input", inputsize, FluxDense())
model = CompGraph(input, searchspace(input))
opt = Descent(0.01) # Due to small batch size
return CacheCandidate(CandidateModel(model, opt, Flux.mse, NanGuard(fitnessmetric)))
end
function evolution_strategy(popsize, nelites=2)
elite = EliteSelection(nelites)
mutate = mutation_strategy()
evolve = SusSelection(popsize - nelites, mutate)
combine = CombinedEvolution(elite, evolve)
return ResetAfterEvolution(combine)
end
function mutation_strategy()
acts = [relu, elu, selu]
# Shorthand for applying mutation m to each vertex with probability p
mp(p, m) = VertexMutation(MutationProbability(m, p))
# Mutate number of neurons +/- 10% with 10% probability
mut_size = mp(0.1, NeuronSelectMutation(NoutMutation(-0.1, 0.1)))
# Add a layer similar to the ones in the initial search space with 5% probability
add_layer = mp(0.05, AddVertexMutation(VertexSpace(DenseSpace(2 .^ (1:6), acts))))
# Remove a layer with 5% probability
rem_layer = mp(0.05, RemoveVertexMutation())
# Mutate the activation function with 5% probability
mut_actfun = mp(0.05, ActivationFunctionMutation(acts))
return EvolveCandidates(evolvemodel(MutationList(rem_layer, PostMutation(mut_size, NeuronSelect()), mut_actfun, add_layer)))
end
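# Standardize the features and split the data column-wise into train/validation batch iterators according to `split`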
function iterators(x,y, split = [0.9, 0.1], seed = 0)
s1,s2= cumsum(floor.(Int, split ./ sum(split) .* size(x,2)))
xs = standardize(x)
trainitr = iterator(slice((xs,y), 1:s1)...)
valitr = iterator(slice((xs,y), s1:s2)...)
return trainitr,valitr
end
standardize(x) = (x .- mean(x)) ./ std(x)
slice(data::Tuple, inds) = slice.(data, [inds])
slice(data::Vector, inds) = data[inds]
slice(data::Matrix, inds) = data[:, inds]
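# Pair feature and target mini-batches of size 8 so the iterator yields (x, y) tuples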
iterator(x,y) = zip(BatchIterator(x, 8), BatchIterator(y, 8))
struct MseFitness{T} <: AbstractFitness
dataset::T
end
function NaiveGAflux.fitness(s::MseFitness, f)
acc,cnt = 0, 0
for (x,y) in s.dataset
mse = Flux.mse(f(x), y)
acc += sum(mse)
cnt += length(mse)
end
# Small mse => high fitness
return cnt / max(acc, 0.1)
end
# Fitness function. Compare MSE on the data provided by itr for each model.
# If MSE (rounded to three digits) is equal between two candidates, take the one with fewer parameters
function mse_vs_size(itr)
msedigits = 3
msefitness = MseFitness(itr)
truncacc = MapFitness(x -> round(100*x, digits=msedigits), msefitness)
size = SizeFitness()
sizefit = MapFitness(x -> min(10.0^-(msedigits+1), 1 / 10000x), size) # 10000x because models are typically very small
return AggFitness(+, truncacc, sizefit)
end
end
To run:
julia> include("FitBostonHousing.jl")
julia> models = FitBostonHousing.fit(seed=0); |
Thanks! I'm gonna have to study your code. |
No problem, just ask if something is unclear about the code. I intend to wrap some of the more useful compositions in convenience methods once I have a better idea of what they are :) Btw, I have not tested whether it produces useful models or not, just that it runs and that the fitness seems to converge. As for your questions:
One motivation I had for making this package was that I kept reading everywhere that the architecture search space is so large that it makes no sense to try to search it, but I never saw anyone try. It is of course extremely large if you don't constrain it, but on the other hand, one of the most appealing things about neural networks is that they are not super sensitive to the architecture hyperparameters as long as you are in the right ballpark. I guess one can say I made it to satisfy my curiosity on that subject.
Just as a disclaimer: this package does not really allow for a completely unconstrained search space, but I think it allows for much larger degrees of freedom than I have seen in any of the Python packages (e.g. Auto Keras). It is of course fully possible to reimplement some paper showing good results with a certain search space using this package as a dependency, but it is not something I'm pursuing right now.
As for 2: I don't know much about FluxOptTools, but from looking at the readme I get the same impression as you: it is about using a more sophisticated type of optimizer for training the parameters of the model, not for finding the model architecture. |
@DrChainsaw I tried running your |
Yeah, I have made a few breaking changes since then. Here is an updated version which works with version 0.5.1. At the expense of brevity I added a few more operations. I want to state again for the record that this is most likely not the best way to fit the Boston housing dataset; it is just a demonstration of a fair portion of the knobs in NaiveGAflux :) Let me know if it happens to work well for you and I can probably add a more generic version to the package itself.
module FitBostonHousing
using NaiveGAflux
using MLDatasets: BostonHousing
using Random
using Statistics
function fit(x = BostonHousing.features(), y = BostonHousing.targets(); seed=1, popsize=128, nepochs=200)
Random.seed!(seed)
train,val= iterators(x,y)
nfeatures = size(x,1)
fitnessmetric = mse_vs_size(val)
population = initialpopulation(popsize, nfeatures, fitnessmetric)
evolution = evolution_strategy(popsize)
for epoch in 1:nepochs
@info "Begin epoch $epoch"
for (i, candidate) in enumerate(population)
Flux.train!(candidate, train)
#trainerror = sqrt(1 / fitness(MseFitness(train), NaiveGAflux.graph(candidate)))
# Measure validation error separately as candidate fitness has other components
valerror = sqrt(1 / fitness(MseFitness(val), NaiveGAflux.graph(candidate)))
@info "\tcandidate $i val error: $valerror"
end
if epoch != nepochs
population = evolve!(evolution, population)
end
end
return NaiveGAflux.graph.(population)
end
function initialpopulation(popsize, inputsize, fitnessmetric)
# Simple search space of 1-4 Dense layers with loguniform sizes (2, 4, ..., 64 neurons)
layerspace = VertexSpace(DenseSpace(2 .^ (1:6), [relu, elu, selu]))
initial_hidden = RepeatArchSpace(layerspace, 0:3)
# Output layer has fixed size and is shielded from mutation
outlayer = VertexSpace(Shielded(), DenseSpace(1, identity))
initial_space = ArchSpaceChain(initial_hidden, outlayer)
return [create_candidate(initial_space, inputsize, fitnessmetric) for _ in 1:popsize]
end
function create_candidate(searchspace, inputsize, fitnessmetric)
input = inputvertex("input", inputsize, FluxDense())
model = CompGraph(input, searchspace(input))
opt = ADAM(0.01) # Due to small batch size
return CacheCandidate(CandidateModel(model, opt, Flux.mse, NanGuard(fitnessmetric)))
end
function evolution_strategy(popsize, nelites=2)
# The nelites best candidates will not be changed, just trained some more
elite = EliteSelection(nelites)
# Select popsize-nelites candidates to crossover and mutate using tournament selection with tournament size 2
evolve = TournamentSelection(popsize - nelites, 2, 1.0, crossovermutate())
# This will just combine the elites and the evolved population back into one population with size popsize
combine = CombinedEvolution(elite, evolve)
return ResetAfterEvolution(combine)
end
function crossovermutate(;pcrossover=0.3, pmutate=0.9)
# Crossover of both models and the optimizer used to fit them
cross = candidatecrossover(pcrossover)
crossoverevo = PairCandidates(EvolveCandidates(cross))
# Mutation of both models and the optimizer used to fit them
mutate = candidatemutation(pmutate)
mutationevo = EvolveCandidates(mutate)
return EvolutionChain(crossoverevo, mutationevo)
end
# Crossover of models (graphs) and optimizers
candidatecrossover(p) = evolvemodel(MutationProbability(graphcrossover(), p), optcrossover())
# Mutation of models (graphs) and optimizers
candidatemutation(p) = evolvemodel(MutationProbability(graphmutation(), p), optmutation())
function optcrossover(poptswap=0.3, plrswap=0.4)
# Swap learning rates between two optimizers with probability plrswap
lrc = MutationProbability(LearningRateCrossover(), plrswap) |> OptimizerCrossover
# Swap two whole optimizers
oc = MutationProbability(OptimizerCrossover(), poptswap) |> OptimizerCrossover
return MutationChain(lrc, oc)
end
function optmutation(p=0.05)
# Mutate learning rate of an optimizer
lrm = LearningRateMutation()
# Change optimizer type and keep learning rate with probability p
om = MutationProbability(OptimizerMutation([Descent, Momentum, Nesterov, ADAM, NADAM, ADAGrad]), p)
return MutationChain(lrm, om)
end
function graphcrossover()
# Swap a sequence of layers from one graph to another
vertexswap = CrossoverSwap(0.05)
# For each layer in a graph swap it with another graph with a 5% probability
return VertexCrossover(MutationProbability(vertexswap, 0.05), 0.05)
end
function graphmutation()
acts = [relu, elu, selu]
# Shorthand for applying mutation m to each vertex with probability p
mp(p, m) = VertexMutation(MutationProbability(m, p))
# Mutate number of neurons +/- 10% with 10% probability
mut_size = mp(0.1, NeuronSelectMutation(NoutMutation(-0.1, 0.1)))
# Add a layer similar to the ones in the initial search space with 5% probability
add_layer = mp(0.05, AddVertexMutation(VertexSpace(DenseSpace(2 .^ (1:6), acts))))
# Remove a layer with 5% probability
rem_layer = mp(0.05, RemoveVertexMutation())
# Mutate the activation function with 5% probability
mut_actfun = mp(0.05, ActivationFunctionMutation(acts))
return MutationChain(rem_layer, PostMutation(mut_size, neuronselect), mut_actfun, add_layer)
end
# Should ideally be handled by MLDataUtils or similar package
function iterators(x,y, split = [0.8, 0.2], seed = 0)
s1,s2= cumsum(floor.(Int, split ./ sum(split) .* size(x,2)))
xs = standardize(x)
trainitr = iterator(slice((xs,y), 1:s1)...)
valitr = iterator(slice((xs,y), s1:s2)...)
return trainitr,valitr
end
standardize(x) = (x .- mean(x; dims=2)) ./ std(x; dims=2)
slice(data::Tuple, inds) = slice.(data, [inds])
slice(data::Vector, inds) = data[inds]
slice(data::Matrix, inds) = data[:, inds]
iterator(x,y) = zip(BatchIterator(x, 8), BatchIterator(y, 8))
struct MseFitness{T} <: AbstractFitness
dataset::T
end
function NaiveGAflux.fitness(s::MseFitness, f)
acc,cnt = 0, 0
for (x,y) in s.dataset
mse = (f(x) - y) .^ 2
acc += sum(mse)
cnt += length(mse)
end
# Small mse => high fitness
return cnt / max(acc, 0.1)
end
# Fitness function. Compare MSE on the data provided by itr for each model.
# If MSE (rounded to three digits) is equal between two candidates, take the one with fewer parameters
function mse_vs_size(itr)
msedigits = 3
# Training is very noisy. Use EWMA filtering to smooth it out
msefitness = EwmaFitness(0.1, MseFitness(itr))
truncacc = MapFitness(x -> round(100*x, digits=msedigits), msefitness)
size = SizeFitness()
# Size fitness will just return the number of parameters as the fitness value.
# We want a low number of parameters to be indicative of good fitness
sizefit = MapFitness(x -> min(10.0^-(msedigits+1), 1 / 10000x), size)
return AggFitness(+, truncacc, sizefit)
end
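# Worked example (added for illustration): truncacc changes in steps of 1e-3, while the size
# term 1/(10000*nparams) is at most 1e-4, so the size term can never overturn a difference in
# rounded MSE fitness. If two candidates both round to e.g. 2.513 but have 500 and 2000
# parameters, their size bonuses are 2e-7 and 5e-8 respectively, and the smaller model wins the tie.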
end
|
@DrChainsaw models = fit(seed=0,popsize=5, nepochs=2)
ŷ = predict(models[1], X_test)
score = mape(ŷ, y_test) |
Oops, forgot that part :)
julia> models = NaiveGAflux.graph.(models);
julia> using MLDatasets: BostonHousing
julia> x = FitBostonHousing.standardize(BostonHousing.features()); #This just happens to be what is done in code above. In a real application you might want to do this differently to avoid train->test leakage.
julia> models[1](x)
1×506 Array{Float32,2}:
27.3557 22.6819 30.6625 27.8783 27.6214 22.448 18.9768 … 18.0694 24.9557 22.2386 29.7275 27.891 22.0147
julia> mapreduce(model -> model(x), +, models[1:10]) ./ 10 # Why not use a few of the models in an ensemble
1×506 Array{Float32,2}:
24.0295 19.8521 26.1882 24.6596 24.2919 20.5495 16.7767 … 15.3688 20.7122 18.6924 23.9985 22.6324 18.4841
I have edited the example above to return the actual models instead of the population of candidates (where a candidate holds the model as well as the optimizer, loss and fitness function). |
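A rough sketch (not from the original thread) of how the MAPE score from the earlier snippet might then be computed; the mape helper is defined inline here as an assumption, since it is not provided by NaiveGAflux:
julia> using Statistics
julia> y = BostonHousing.targets(); # continuous targets, 1×506
julia> ŷ = models[1](x); # x standardized as above
julia> mape(ŷ, y) = mean(abs.((ŷ .- y) ./ y)) * 100; # hypothetical helper, not part of NaiveGAflux
julia> score = mape(ŷ, y) # percentage error over all 506 samples, including training data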
Thanks, it works now. I'm gonna have to tune it before it works well... |
Great! Please file another issue (or add to this one) if you find anything which could be improved. |
Hi and thank you for this package!
I'm having difficulty learning how to apply your package to a regression model with a continuous Y (outcome variable).
Is it possible to show a very simple example of how to use your model on the Boston housing data?
For reference I have parsimonious Flux code here.
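(For context, a parsimonious plain-Flux baseline for this dataset might look roughly like the sketch below; the layer sizes, optimizer, and number of training steps are illustrative assumptions, not the code linked above.)
using Flux
using MLDatasets: BostonHousing
using Statistics
x = BostonHousing.features() # 13×506 feature matrix
y = BostonHousing.targets() # 1×506 continuous targets
x = (x .- mean(x; dims=2)) ./ std(x; dims=2) # standardize each feature
model = Chain(Dense(13, 32, relu), Dense(32, 1)) # small fixed architecture
loss(x, y) = Flux.mse(model(x), y)
opt = ADAM(0.01)
Flux.train!(loss, Flux.params(model), Iterators.repeated((x, y), 500), opt) # 500 full-batch steps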