-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance disparity in expression operations #85
Comments
The 500 000 case finished in 4 seconds for GenericAffExpr and crashed my julia session for the cases with parameters. |
I think your results are getting mixed with when the GC runs. using JuMP
using ParameterJuMP
using Random
function model_with_params(m)
model = Model()
@variable(model, x[1:m])
@variable(model, y[1:m])
@variable(model, p[1:m] == 1.0, ParameterJuMP.Param())
@expression(model, expr[i=1:m], x[i] + y[i] + p[i] + rand())
col = rand(m)
@time @constraint(model, sum(col[i] * expr[i] for i in 1:m) == 0.0)
return
end
function model_without_params(m)
model = Model()
@variable(model, x[1:m])
@variable(model, y[1:m])
@variable(model, p[1:m] == 1.0)
@expression(model, expr[i=1:m], x[i] + y[i] + p[i] + rand())
col = rand(m)
@time @constraint(model, sum(col[i] * expr[i] for i in 1:m) == 0.0)
return
end
model_with_params(10)
model_without_params(10)
for i in [100, 1000, 10_000, 50_000, 100_000, 500_000]
println(i)
GC.gc()
model_without_params(i)
GC.gc()
model_with_params(i)
end Yields
So there is a ~2x penalty to using parameters. Which makes sense: every expression has two dictionaries. |
Thanks, that makes sense. |
@jd-lara can you try the following instead? Replace @constraint(m, sum(col[i]*expression_container[i, 1] for i in 1:size_1) == 0.0) with ex = @expression(m, sum(col[i]*expression_container[i, 1] for i in 1:size_1))
c = ScalarConstraint(ex, MOI.EqualTo(0.0))
add_constraint(m, c, "") |
@odow I have updated the test as follows: using JuMP
using ParameterJuMP
using Random
function _make_constraint(model::JuMP.Model, col::Vector{Float64}, expression_container::JuMP.Containers.DenseAxisArray{ParameterJuMP.ParametrizedGenericAffExpr{Float64, JuMP.VariableRef}}, size_1::Int)
ex = @expression(model, sum(col[i]*expression_container[i, 1] for i in 1:size_1))
c = ScalarConstraint(ex, MOI.EqualTo(0.0))
add_constraint(model, c, "")
end
function _make_constraint(model::JuMP.Model, col::Vector{Float64}, expression_container::JuMP.Containers.DenseAxisArray{JuMP.GenericAffExpr{Float64, JuMP.VariableRef}}, size_1::Int)
ex = @expression(model, sum(col[i]*expression_container[i, 1] for i in 1:size_1))
c = ScalarConstraint(ex, MOI.EqualTo(0.0))
add_constraint(model, c, "")
end
function model_with_params(size_1, size_2)
m = Model()
ParameterJuMP.enable_parameters(m)
set_names_1 = [randstring() for _ in 1:size_1]
set_names_2 = [randstring() for _ in 1:size_1]
time = 1:size_2
z = @variable(m, [v in set_names_1, t in time])
x = @variable(m, [v in set_names_2, t in time])
param_array = Matrix(undef, size_1, size_2)
for i in 1:size_1, t in 1:size_2
param_array[i, t] = JuMP.@variable(m, variable_type = ParameterJuMP.Param())
ParameterJuMP.set_value(param_array[i, t], rand())
end
expression_container = JuMP.Containers.DenseAxisArray{ParameterJuMP.ParametrizedGenericAffExpr{Float64, JuMP.VariableRef}}(undef, 1:size_1, 1:size_2)
for i in 1:size_1, t in 1:size_2
expression_container[i, t] = z[set_names_1[i], t] + 5.0*x[set_names_2[i], t] + param_array[i, t] + rand()
end
_, build_time, build_bytes_alloc, build_sec_in_gc =
@timed _make_constraint(m, rand(size_1,), expression_container, size_1)
return build_time, build_bytes_alloc, build_sec_in_gc
end
function model_without_params(size_1, size_2)
m2 = Model()
set_names_1 = [randstring() for _ in 1:size_1]
set_names_2 = [randstring() for _ in 1:size_1]
time = 1:size_2
z = @variable(m2, [v in set_names_1, t in time])
x = @variable(m2, [v in set_names_2, t in time])
expression_container = JuMP.Containers.DenseAxisArray{JuMP.GenericAffExpr{Float64, JuMP.VariableRef}}(undef, 1:size_1, 1:size_2)
for i in 1:size_1, t in 1:size_2
expression_container[i, t] = z[set_names_1[i], t] + 5.0*x[set_names_2[i], t] + rand()
end
_, build_time, build_bytes_alloc, build_sec_in_gc =
@timed _make_constraint(m2, rand(size_1,), expression_container, size_1)
return build_time, build_bytes_alloc, build_sec_in_gc
end
function model_with_mixed_params(size_1, size_2)
m = Model()
ParameterJuMP.enable_parameters(m)
set_names_1 = [randstring() for _ in 1:size_1]
set_names_2 = [randstring() for _ in 1:size_1]
time = 1:size_2
z = @variable(m, [v in set_names_1, t in time])
x = @variable(m, [v in set_names_2, t in time])
param_array = Matrix(undef, size_1, size_2)
for i in 1:size_1, t in 1:size_2
param_array[i, t] = JuMP.@variable(m, variable_type = ParameterJuMP.Param())
ParameterJuMP.set_value(param_array[i, t], rand())
end
expression_container = JuMP.Containers.DenseAxisArray{ParameterJuMP.ParametrizedGenericAffExpr{Float64, JuMP.VariableRef}}(undef, 1:size_1, 1:size_2)
param_ixs = shuffle(1:size_1)[1:Int(size_1/2)]
for i in 1:size_1, t in 1:size_2
if i in param_ixs
expression_container[i, t] = z[set_names_1[i], t] + 5.0*x[set_names_2[i], t] + param_array[i, t] + rand()
else
expression_container[i, t] = z[set_names_1[i], t] + 5.0*x[set_names_2[i], t] + rand()
end
end
_, build_time, build_bytes_alloc, build_sec_in_gc =
@timed _make_constraint(m, rand(size_1,), expression_container, size_1)
return build_time, build_bytes_alloc, build_sec_in_gc
end
# FIrst compile run
model_with_params(10, 48)
model_without_params(10, 48)
model_with_mixed_params(10, 48)
results = Dict()
results_params = Dict()
results_mixed = Dict()
for i in [100, 1000, 5000, 10_000, 50_000, 100_000]
println(i)
@show results[i] = model_without_params(i, 48)
GC.gc()
@show results_params[i] = model_with_params(i, 48)
GC.gc()
@show results_mixed = model_with_mixed_params(i, 48)
end Looks like up to 10000 expressions, the penalty is about 2x as expected and over 50000 the penalty is 4x or more 100
results[i] = model_without_params(i, 48) = (9.7006e-5, 30384, 0.0)
results_params[i] = model_with_params(i, 48) = (0.000171875, 76912, 0.0)
results_mixed = model_with_mixed_params(i, 48) = (0.000131348, 68880, 0.0)
1000
results[i] = model_without_params(i, 48) = (0.000831869, 205200, 0.0)
results_params[i] = model_with_params(i, 48) = (0.001245529, 569072, 0.0)
results_mixed = model_with_mixed_params(i, 48) = (0.001065928, 470624, 0.0)
5000
results[i] = model_without_params(i, 48) = (0.003808976, 1281296, 0.0)
results_params[i] = model_with_params(i, 48) = (0.008973684, 3581152, 0.0)
results_mixed = model_with_mixed_params(i, 48) = (0.005882702, 2980096, 0.0)
10000
results[i] = model_without_params(i, 48) = (0.008080068, 2557840, 0.0)
results_params[i] = model_with_params(i, 48) = (0.01753953, 7151472, 0.0)
results_mixed = model_with_mixed_params(i, 48) = (0.01423789, 6212016, 0.0)
50000
results[i] = model_without_params(i, 48) = (0.050672775, 10944336, 0.0)
results_params[i] = model_with_params(i, 48) = (0.228111582, 31853632, 0.116043022)
results_mixed = model_with_mixed_params(i, 48) = (0.085427278, 25748320, 0.0)
100000
results[i] = model_without_params(i, 48) = (0.117079187, 19783200, 0.0)
results_params[i] = model_with_params(i, 48) = (0.428364249, 55302560, 0.13931007)
results_mixed = model_with_mixed_params(i, 48) = (0.668327524, 49384144, 0.377572102) |
That's all just GC differences though. If you subtract the GC time, you get 2.5x again: julia> (0.428364249 - 0.13931007) / 0.117079187
2.4688775725782923
julia> (0.228111582 - 0.116043022) / 0.050672775
2.211612843385822
julia> (0.668327524 - 0.377572102) / 0.117079187
2.4834082764855543 |
So the question is how to reduce the GC times. |
It'd be good to confirm why the 2x exists first. Is it creating the extra dictionary? Extra key lookups? Resizing the two dictionaries? One option that might be faster is to refactor everything into a single dictionary with That might also help GC by creating less dictionaries. |
I dont understand why having two dicts would be much worse than having a single larger one. |
I have been doing |
Have you tried: https://github.com/timholy/ProfileView.jl ? |
Okay. I have a MWE julia> using JuMP, ParameterJuMP
julia> function foo(M, N; param)
model = Model()
@variable(model, x[1:N])
if param
@variable(model, p[1:N], ParameterJuMP.Param())
else
@variable(model, p[1:N])
end
@expression(model, y[i=1:N], x[i] + p[i])
for j in 1:M
@constraint(model, sum(j * i * y[i] for i in 1:N) == 0)
end
end
foo (generic function with 2 methods)
julia> M, N = 1000, 100
(1000, 100)
julia> GC.gc(); @time foo(M, N; param = true);
0.219966 seconds (432.51 k allocations: 60.510 MiB, 4.07% gc time, 72.76% compilation time)
julia> GC.gc(); @time foo(N, N; param = false);
0.023459 seconds (19.14 k allocations: 3.442 MiB, 87.16% compilation time) There's a big difference in allocations there. |
I think I understand the problem julia> using JuMP, ParameterJuMP
julia> model = Model()
A JuMP Model
Feasibility problem with:
Variables: 0
Model mode: AUTOMATIC
CachingOptimizer state: NO_OPTIMIZER
Solver name: No optimizer attached.
julia> @variable(model, x[1:2])
2-element Vector{VariableRef}:
x[1]
x[2]
julia> @variable(model, p, ParameterJuMP.Param())
p
julia> f(x) = ScalarConstraint(x, MOI.EqualTo(0.0))
f (generic function with 1 method)
julia> A = x[1] + x[2]
x[1] + x[2]
julia> B = x[1] + p
x[1] + p
julia> @benchmark f($A)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 1.665 ns (0.00% GC)
median time: 1.845 ns (0.00% GC)
mean time: 1.813 ns (0.00% GC)
maximum time: 20.717 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
julia> @benchmark f($B)
BenchmarkTools.Trial:
memory estimate: 1.56 KiB
allocs estimate: 19
--------------
minimum time: 669.399 ns (0.00% GC)
median time: 696.278 ns (0.00% GC)
mean time: 917.536 ns (18.24% GC)
maximum time: 81.787 μs (98.82% GC)
--------------
samples: 10000
evals/sample: 153 This is Julia's heap/stack optimization in practice. The heap allocations explain why there is much greater GC in the parameter version, and why creating a parameter thing is slower, even though there isn't a good explanation. |
we saw something similar with structs in PowerSystems that julia spent quite some time moving them between the stack and the heap, so we made them mutable and prevent that. @daniel-thom, do you recall how did we measure that? |
|
As a reference here is the result of @code_llvm for each case in the example above julia> @code_llvm f(A)
; @ REPL[8]:1 within `f'
define void @julia_f_3849({ {}*, [1 x double] }* noalias nocapture sret %0, [1 x {}*]* noalias nocapture %1, {}* nonnull align 8 dereferenceable(16) %2) {
top:
%3 = getelementptr inbounds [1 x {}*], [1 x {}*]* %1, i64 0, i64 0
store {}* %2, {}** %3, align 8
%.repack = getelementptr inbounds { {}*, [1 x double] }, { {}*, [1 x double] }* %0, i64 0, i32 0
store {}* %2, {}** %.repack, align 8
%4 = getelementptr inbounds { {}*, [1 x double] }, { {}*, [1 x double] }* %0, i64 0, i32 1, i64 0
store double 0.000000e+00, double* %4, align 8
ret void
} julia> @code_llvm f(B)
; @ REPL[8]:1 within `f'
define void @julia_f_3834({ {}*, [1 x double] }* noalias nocapture sret %0, [1 x {}*]* noalias nocapture %1, {}* nonnull align 8 dereferenceable(16) %2) {
top:
%3 = alloca [2 x {}*], align 8
%.sub = getelementptr inbounds [2 x {}*], [2 x {}*]* %3, i64 0, i64 0
; ┌ @ /Users/jdlara/cache/JuMP.jl/src/constraints.jl:437 within `ScalarConstraint' @ /Users/jdlara/cache/JuMP.jl/src/constraints.jl:437
store {}* inttoptr (i64 5878957680 to {}*), {}** %.sub, align 8
%4 = getelementptr inbounds [2 x {}*], [2 x {}*]* %3, i64 0, i64 1
store {}* %2, {}** %4, align 8
%5 = call nonnull {}* @j1_convert_3836({}* inttoptr (i64 4637115040 to {}*), {}** nonnull %.sub, i32 2)
; └
%6 = getelementptr inbounds [1 x {}*], [1 x {}*]* %1, i64 0, i64 0
store {}* %5, {}** %6, align 8
%.repack = getelementptr inbounds { {}*, [1 x double] }, { {}*, [1 x double] }* %0, i64 0, i32 0
store {}* %5, {}** %.repack, align 8
%7 = getelementptr inbounds { {}*, [1 x double] }, { {}*, [1 x double] }* %0, i64 0, i32 1, i64 0
store double 0.000000e+00, double* %7, align 8
ret void
} |
@odow I can't seem to reproduce the issue outside of JuMP using BenchmarkTools
using DataStructures
struct fooOD{T, U, V}
a::OrderedDict{T, V}
b::OrderedDict{U, V}
end
struct barOD{T, V}
a::OrderedDict{T, V}
end
struct FooBar{F,S}
a::F
b::S
end
f(x) = FooBar(x, 1.0)
test_foo_od = fooOD(OrderedDict("A" => 10.0, "B" => 11.0), OrderedDict(3 => 13.0, 4 => 19.0))
test_bar_od = barOD(OrderedDict(1 => 10.0, 2 => 11.0, 3 => 13.0, 4 => 19.0))
@benchmark f($test_bar_od)
@benchmark f($test_foo_od)
bar: one dict
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 1.664 ns (0.00% GC)
median time: 2.064 ns (0.00% GC)
mean time: 2.210 ns (0.00% GC)
maximum time: 53.048 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
foo: two dicts
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 1.662 ns (0.00% GC)
median time: 2.051 ns (0.00% GC)
mean time: 2.281 ns (0.00% GC)
maximum time: 133.248 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000 |
This is a second attempt to reproduce without any success. using BenchmarkTools
using DataStructures
mutable struct model_
field1::Dict
end
model_foobar = model_(Dict("key" => 12.3))
abstract type keys end
struct Key1 <: keys
index::Int
model::model_
end
struct Key2 <: keys
index::Int
model::model_
end
struct FooBar{F,S}
a::F
b::S
end
mutable struct fooOD{T, U, V}
a::OrderedDict{T, V}
b::OrderedDict{U, V}
end
mutable struct barOD{T, V}
a::OrderedDict{T, V}
end
test_foo_od = fooOD(OrderedDict(Key1(1, model_foobar) => 10.0, Key1(2, model_foobar) => 11.0), OrderedDict(Key2(1, model_foobar) => 13.0, Key2(2, model_foobar) => 19.0))
test_bar_od = barOD(OrderedDict(Key1(1, model_foobar) => 10.0, Key1(2, model_foobar) => 11.0, Key1(3, model_foobar) => 13.0, Key1(4, model_foobar) => 19.0))
f(x) = FooBar(x, 1.0)
@benchmark f($test_bar_od)
@benchmark f($test_foo_od)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 1.757 ns (0.00% GC)
median time: 1.871 ns (0.00% GC)
mean time: 1.994 ns (0.00% GC)
maximum time: 33.388 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 1.357 ns (0.00% GC)
median time: 1.413 ns (0.00% GC)
mean time: 1.533 ns (0.00% GC)
maximum time: 60.137 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000 |
Yeah. What's weird is that the following with julia> using JuMP, ParameterJuMP, BenchmarkTools
julia> model = Model()
A JuMP Model
Feasibility problem with:
Variables: 0
Model mode: AUTOMATIC
CachingOptimizer state: NO_OPTIMIZER
Solver name: No optimizer attached.
julia> x = VariableRef(model, MOI.VariableIndex(1))
noname
julia> p = ParameterJuMP.ParameterRef(1, model);
julia> struct Foo{A,B}
a::A
b::B
end
julia> A = 2 * x;
julia> B = 2 * p;
julia> foo(x) = Foo(x, 0.0)
foo (generic function with 1 method)
julia> @benchmark foo($A)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 1.617 ns (0.00% GC)
median time: 1.873 ns (0.00% GC)
mean time: 1.929 ns (0.00% GC)
maximum time: 31.496 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
julia> @benchmark foo($B)
BenchmarkTools.Trial:
memory estimate: 1.27 KiB
allocs estimate: 16
--------------
minimum time: 502.902 ns (0.00% GC)
median time: 520.249 ns (0.00% GC)
mean time: 653.548 ns (15.73% GC)
maximum time: 30.005 μs (97.00% GC)
--------------
samples: 10000
evals/sample: 193
julia> using JuMP, ParameterJuMP, BenchmarkTools
julia> model = Model()
A JuMP Model
Feasibility problem with:
Variables: 0
Model mode: AUTOMATIC
CachingOptimizer state: NO_OPTIMIZER
Solver name: No optimizer attached.
julia> x = VariableRef(model, MOI.VariableIndex(1))
noname
julia> struct MyVar <: JuMP.AbstractVariableRef
x::Int64
model::Model
end
julia> p = MyVar(1, model);
julia> struct Foo{A,B}
a::A
b::B
end
julia> A = 2 * x;
julia> B = 2 * p;
julia> foo(x) = Foo(x, 0.0)
foo (generic function with 1 method)
julia> @benchmark foo($A)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 1.617 ns (0.00% GC)
median time: 1.877 ns (0.00% GC)
mean time: 1.904 ns (0.00% GC)
maximum time: 20.480 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
julia> @benchmark foo($B)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 1.619 ns (0.00% GC)
median time: 1.870 ns (0.00% GC)
mean time: 1.903 ns (0.00% GC)
maximum time: 22.370 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000 |
@odow's last example suggests that we can try commenting out stuff from the package to find the culprit. |
from the same for @odow pasted Just the creation of a |
using JuMP
using ParameterJuMP
using Profile
using PProf
using BenchmarkTools
model = Model()
x = VariableRef(model, MOI.VariableIndex(1))
p = ParameterJuMP.ParameterRef(1, model)
struct Foo{A,B}
a::A
b::B
end
A1 = 2 * x
B1 = 2 * p
A2 = JuMP._build_aff_expr(0.0, 2.0, x)
B2 = JuMP._build_aff_expr(0.0, 2.0, p)
foo(x) = Foo(x, 0.0)
@benchmark foo($A1)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 1.393 ns (0.00% GC)
median time: 1.401 ns (0.00% GC)
mean time: 1.473 ns (0.00% GC)
maximum time: 42.435 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
@benchmark foo($B1)
BenchmarkTools.Trial:
memory estimate: 1.27 KiB
allocs estimate: 16
--------------
minimum time: 537.564 ns (0.00% GC)
median time: 562.479 ns (0.00% GC)
mean time: 755.226 ns (19.27% GC)
maximum time: 43.067 μs (98.49% GC)
--------------
samples: 10000
evals/sample: 188
@benchmark foo($A2)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 1.473 ns (0.00% GC)
median time: 1.519 ns (0.00% GC)
mean time: 1.697 ns (0.00% GC)
maximum time: 55.618 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
@benchmark foo($B2)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 1.474 ns (0.00% GC)
median time: 1.642 ns (0.00% GC)
mean time: 1.744 ns (0.00% GC)
maximum time: 38.107 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
@joaquimg seems that in fact there is something off with the code for PAE's |
|
Ah. I'm an idiot. I assumed that julia> typeof(A1)
AffExpr (alias for GenericAffExpr{Float64, VariableRef})
julia> typeof(B1)
ParameterJuMP.DoubleGenericAffExpr{Int64, VariableRef, ParameterRef} This is just the difference between It seems reasonable that you could put the first on the stack and the second on the heap. |
The rationale of creating directly a |
Few other things to consider are: 2 - Replace: mutable struct DoubleGenericAffExpr{C,V,P} <: JuMP.AbstractJuMPScalar
v::JuMP.GenericAffExpr{C,V}
p::JuMP.GenericAffExpr{C,P}
end by: mutable struct GenericAffExpr{CoefType,VarType,ParType} <: AbstractJuMPScalar
constant::CoefType
v_terms::OrderedDict{VarType,CoefType}
p_terms::OrderedDict{ParType,CoefType}
end |
This might work. Option 2 is unlikely to make a difference. If Julia won't put a |
@odow in this example with 2 ordered dicts it worked. |
This is the reason for suggestion 2 of my post. |
I started implementing suggestion 2. Progress has been slow. |
I made a test of the performance impact of having Parameters in an affine expression for a different number of expressions in the vector product operation.
The test compares 3 possible cases.
ParameterJuMP.ParametrizedGenericAffExpr{Float64, JuMP.VariableRef}
JuMP.GenericAffExpr{Float64, JuMP.VariableRef}
JuMP.GenericAffExpr{Float64, JuMP.VariableRef}
andJuMP.GenericAffExpr{Float64, JuMP.VariableRef}
Here is some reproducible code:
Results:
The text was updated successfully, but these errors were encountered: