Merge pull request #203 from alyst/refactor_semspec
Refactor SEM Specification
Maximilian-Stefan-Ernst authored May 31, 2024
2 parents cdc4415 + b2012b0 commit 55f9087
Showing 60 changed files with 1,328 additions and 1,331 deletions.
4 changes: 3 additions & 1 deletion Project.toml
Original file line number Diff line number Diff line change
@@ -22,10 +22,11 @@ Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
StenoGraphs = "78862bba-adae-4a83-bb4d-33c106177f81"
Symbolics = "0c5d862f-8b57-4792-8d23-62f2024744c7"
SymbolicUtils = "d1185830-fcd6-423d-90d6-eec64667417b"

[compat]
julia = "1.9, 1.10"
StenoGraphs = "0.2"
StenoGraphs = "0.2, 0.3"
DataFrames = "1"
Distributions = "0.25"
FiniteDiff = "2"
@@ -36,6 +37,7 @@ Optim = "1"
PrettyTables = "2"
StatsBase = "0.33, 0.34"
Symbolics = "4, 5"
SymbolicUtils = "1.4 - 1.5"

[extras]
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
4 changes: 2 additions & 2 deletions docs/src/developer/imply.md
@@ -30,10 +30,10 @@ To make stored computations available to loss functions, simply write a function

Additionally, you can specify methods for `gradient` and `hessian` as well as the combinations described in [Custom loss functions](@ref).

The last thing needed to make it work is a method for `n_par` that takes your imply type and returns the number of parameters of the model:
The last thing needed to make it work is a method for `nparams` that takes your imply type and returns the number of parameters of the model:

```julia
n_par(imply::MyImply) = ...
nparams(imply::MyImply) = ...
```

Just as described in [Custom loss functions](@ref), you may define a constructor. Typically, this will depend on the `specification = ...` argument that can be a `ParameterTable` or a `RAMMatrices` object.
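The `nparams` method described above can be sketched on a toy imply type. This is a hedged illustration only: `MyImply` and its fields are made up for demonstration and are not part of the package API.

```julia
# Hypothetical sketch: `MyImply` and its fields are illustrative,
# not the package's actual imply types.
struct MyImply
    Σ::Matrix{Float64}  # model-implied covariance, filled during fitting
    n::Int              # number of free parameters
end

# the method the docs ask for: report the number of model parameters
nparams(imply::MyImply) = imply.n

nparams(MyImply(zeros(3, 3), 5))  # → 5
```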
9 changes: 5 additions & 4 deletions docs/src/developer/loss.md
@@ -60,11 +60,12 @@ graph = @StenoGraph begin
end
partable = ParameterTable(
graph,
latent_vars = latent_vars,
observed_vars = observed_vars,
graph = graph)
observed_vars = observed_vars
)
parameter_indices = get_identifier_indices([:a, :b, :c], partable)
parameter_indices = param_indices([:a, :b, :c], partable)
myridge = Ridge(0.01, parameter_indices)
model = SemFiniteDiff(
@@ -269,4 +270,4 @@ model_ml = SemFiniteDiff(
model_fit = sem_fit(model_ml)
```

If you want to differentiate your own loss functions via automatic differentiation, check out the [AutoDiffSEM](https://github.com/StructuralEquationModels/AutoDiffSEM) package (spoiler allert: it's really easy).
If you want to differentiate your own loss functions via automatic differentiation, check out the [AutoDiffSEM](https://github.com/StructuralEquationModels/AutoDiffSEM) package.
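For intuition, the quantity a ridge penalty over selected parameter indices adds to the objective can be written out in plain Julia. This is a sketch of the idea only; the function name is ours, not the package's `Ridge` loss type.

```julia
# Illustrative only: the penalty term α * Σᵢ θᵢ² over the chosen indices.
# `ridge_objective` is a made-up name, not part of the package API.
ridge_objective(θ, indices, α) = α * sum(abs2, θ[indices])

θ = [1.0, 2.0, 3.0, 4.0]
ridge_objective(θ, [2, 4], 0.01)  # ≈ 0.01 * (4 + 16) = 0.2
```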
3 changes: 2 additions & 1 deletion docs/src/developer/sem.md
@@ -11,7 +11,8 @@ struct SemFiniteDiff{
observed::O
imply::I
loss::L
optimizer::Dend
optimizer::D
end
```

Additionally, we need to define a method to compute at least the objective value, and if you want to use gradient-based optimizers (which you most probably will), we also need to define a method to compute the gradient. For example, the respective fallback methods for all `AbstractSemSingle` models are defined as
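The finite-difference idea behind a type like `SemFiniteDiff` can be illustrated with a standalone central-difference gradient. This is a sketch of the numerical technique, not the package's implementation (which uses FiniteDiff.jl).

```julia
# Sketch: approximate ∇f(θ) by central differences,
# gᵢ ≈ (f(θ + h eᵢ) - f(θ - h eᵢ)) / 2h.
function fd_gradient(f, θ; h = 1e-6)
    g = similar(θ)
    for i in eachindex(θ)
        θp = copy(θ); θp[i] += h
        θm = copy(θ); θm[i] -= h
        g[i] = (f(θp) - f(θm)) / (2h)
    end
    return g
end

f(θ) = sum(abs2, θ)          # toy objective with known gradient 2θ
fd_gradient(f, [1.0, 2.0])   # ≈ [2.0, 4.0]
```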
5 changes: 3 additions & 2 deletions docs/src/performance/simulation.md
@@ -43,9 +43,10 @@ graph = @StenoGraph begin
end
partable = ParameterTable(
graph,
latent_vars = latent_vars,
observed_vars = observed_vars,
graph = graph)
observed_vars = observed_vars
)
```

```@example swap_observed
2 changes: 1 addition & 1 deletion docs/src/performance/sorting.md
@@ -13,7 +13,7 @@ To automatically reorder your variables in a way that makes this optimization po
We use it as

```julia
sort!(parameter_table)
sort_vars!(parameter_table)

model = Sem(
specification = parameter_table,
6 changes: 3 additions & 3 deletions docs/src/tutorials/collection/multigroup.md
@@ -61,8 +61,8 @@ You can then use the resulting graph to specify an `EnsembleParameterTable`
```@example mg; ansicolor = true
groups = [:Pasteur, :Grant_White]
partable = EnsembleParameterTable(;
graph = graph,
partable = EnsembleParameterTable(
graph,
observed_vars = observed_vars,
latent_vars = latent_vars,
groups = groups)
@@ -71,7 +71,7 @@ partable = EnsembleParameterTable(;
The parameter table can be used to create a `Dict` of `RAMMatrices` with keys equal to the group names and the group-specific `RAMMatrices` as values:

```@example mg; ansicolor = true
specification = RAMMatrices(partable)
specification = convert(Dict{Symbol, RAMMatrices}, partable)
```

That is, you can access the group-specific `RAMMatrices` as `specification[:group_name]`.
35 changes: 21 additions & 14 deletions docs/src/tutorials/constraints/constraints.md
@@ -16,13 +16,13 @@ graph = @StenoGraph begin
# loadings
ind60 → fixed(1)*x1 + x2 + x3
dem60 → fixed(1)*y1 + y2 + y3 + y4
dem60 → fixed(1)*y1 + label(:λ₂)*y2 + label(:λ₃)*y3 + y4
dem65 → fixed(1)*y5 + y6 + y7 + y8
# latent regressions
ind60 → dem60
dem60 → dem65
ind60 → dem65
ind60 → label(:λₗ)*dem65
# variances
_(observed_vars) ↔ _(observed_vars)
@@ -31,15 +31,15 @@ graph = @StenoGraph begin
# covariances
y1 ↔ y5
y2 ↔ y4 + y6
y3 ↔ y7
y8 ↔ y4 + y6
y3 ↔ label(:y3y7)*y7
y8 ↔ label(:y8y4)*y4 + y6
end
partable = ParameterTable(
graph,
latent_vars = latent_vars,
observed_vars = observed_vars,
graph = graph)
observed_vars = observed_vars)
data = example_data("political_democracy")
@@ -64,17 +64,19 @@ Let's introduce some constraints:

(Of course, those constraints only serve an illustrative purpose.)

We first need to get the indices of the respective parameters that are involved in the constraints. We can look up their labels in the output above, and retrieve their indices as
We first need to get the indices of the respective parameters that are involved in the constraints.
We can look up their labels in the output above, and retrieve their indices as

```@example constraints
parameter_indices = get_identifier_indices([:θ_29, :θ_30, :θ_3, :θ_4, :θ_11], model)
parind = param_indices(model)
parind[:y3y7] # 29
```

The bound constraint is easy to specify: Just give a vector of upper or lower bounds that contains the bound for each parameter. In our example, only parameter number 11 has an upper bound, and the number of total parameters is `n_par(model) = 31`, so we define
The bound constraint is easy to specify: just give a vector of upper or lower bounds that contains the bound for each parameter. In our example, only the parameter labeled `:λₗ` has an upper bound, and the total number of parameters is `nparams(model) = 31`, so we define

```@example constraints
upper_bounds = fill(Inf, 31)
upper_bounds[11] = 0.5
upper_bounds[parind[:λₗ]] = 0.5
```

The equality and inequality constraints have to be reformulated to be of the form `x = 0` or `x ≤ 0`:
@@ -84,6 +86,8 @@ The equailty and inequality constraints have to be reformulated to be of the for
Now they can be defined as functions of the parameter vector:

```@example constraints
parind[:y3y7] # 29
parind[:y8y4] # 30
# θ[29] + θ[30] - 1 = 0.0
function eq_constraint(θ, gradient)
if length(gradient) > 0
@@ -94,6 +98,8 @@ function eq_constraint(θ, gradient)
return θ[29] + θ[30] - 1
end
parind[:λ₂] # 3
parind[:λ₃] # 4
# θ[3] - θ[4] - 0.1 ≤ 0
function ineq_constraint(θ, gradient)
if length(gradient) > 0
@@ -109,7 +115,7 @@ If the algorithm needs gradients at an iteration, it will pass the vector `gradi
With `if length(gradient) > 0` we check if the algorithm needs gradients, and if it does, we fill the `gradient` vector with the gradients
of the constraint w.r.t. the parameters.

In NLopt, vector-valued constraints are also possible, but we refer to the documentation fot that.
In NLopt, vector-valued constraints are also possible, but we refer to the documentation for that.

### Fit the model

@@ -153,10 +159,11 @@ As you can see, the optimizer converged (`:XTOL_REACHED`) and investigating the

```@example constraints
update_partable!(
partable,
model_fit_constrained,
partable,
:estimate_constr,
params(model_fit_constrained),
solution(model_fit_constrained),
:estimate_constr)
)
sem_summary(partable)
```
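The NLopt-style constraint pattern above can be exercised standalone. This sketch mirrors the `eq_constraint` from the diff (indices 29 and 30 as in the example); the parameter vector here is made up for demonstration.

```julia
# Standalone version of the constraint pattern shown above: the function
# returns the constraint value and, when the optimizer asks (nonempty
# `gradient`), fills in ∂c/∂θ. Indices 29/30 match the doc example.
function eq_constraint(θ, gradient)
    if length(gradient) > 0
        fill!(gradient, 0.0)
        gradient[29] = 1.0
        gradient[30] = 1.0
    end
    return θ[29] + θ[30] - 1
end

θ = zeros(31); θ[29] = 0.5; θ[30] = 0.5
g = zeros(31)
eq_constraint(θ, g)  # → 0.0, and g[29] == g[30] == 1.0
```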
4 changes: 2 additions & 2 deletions docs/src/tutorials/construction/build_by_parts.md
@@ -39,9 +39,9 @@ graph = @StenoGraph begin
end
partable = ParameterTable(
graph,
latent_vars = latent_vars,
observed_vars = observed_vars,
graph = graph)
observed_vars = observed_vars)
```

Now, we construct the different parts:
2 changes: 1 addition & 1 deletion docs/src/tutorials/construction/outer_constructor.md
@@ -74,7 +74,7 @@ model = Sem(
specification = partable,
data = data,
imply = RAMSymbolic,
loss = SemWLS
loss = SemWLS,
wls_weight_matrix = W
)

4 changes: 2 additions & 2 deletions docs/src/tutorials/first_model.md
@@ -83,9 +83,9 @@ We then use this graph to define a `ParameterTable` object

```@example high_level; ansicolor = true
partable = ParameterTable(
graph,
latent_vars = latent_vars,
observed_vars = observed_vars,
graph = graph)
observed_vars = observed_vars)
```

load the example data
10 changes: 5 additions & 5 deletions docs/src/tutorials/inspection/inspection.md
@@ -31,9 +31,9 @@ graph = @StenoGraph begin
end
partable = ParameterTable(
graph,
latent_vars = latent_vars,
observed_vars = observed_vars,
graph = graph)
observed_vars = observed_vars)
data = example_data("political_democracy")
@@ -87,8 +87,8 @@ We can also update the `ParameterTable` object with other information via [`upda
se_bs = se_bootstrap(model_fit; n_boot = 20)
se_he = se_hessian(model_fit)
update_partable!(partable, model_fit, se_he, :se_hessian)
update_partable!(partable, model_fit, se_bs, :se_bootstrap)
update_partable!(partable, :se_hessian, params(model_fit), se_he)
update_partable!(partable, :se_bootstrap, params(model_fit), se_bs)
sem_summary(partable)
```
@@ -130,7 +130,7 @@ df
minus2ll
n_man
n_obs
n_par
nparams
p_value
RMSEA
```
8 changes: 4 additions & 4 deletions docs/src/tutorials/meanstructure.md
@@ -39,9 +39,9 @@ graph = @StenoGraph begin
end
partable = ParameterTable(
graph,
latent_vars = latent_vars,
observed_vars = observed_vars,
graph = graph)
observed_vars = observed_vars)
```

```julia
@@ -77,9 +77,9 @@ graph = @StenoGraph begin
end

partable = ParameterTable(
graph,
latent_vars = latent_vars,
observed_vars = observed_vars,
graph = graph)
observed_vars = observed_vars)
```

that is, all observed variable means are estimated freely.
11 changes: 6 additions & 5 deletions docs/src/tutorials/regularization/regularization.md
@@ -86,9 +86,10 @@ graph = @StenoGraph begin
end
partable = ParameterTable(
graph,
latent_vars = latent_vars,
observed_vars = observed_vars,
graph = graph)
observed_vars = observed_vars
)
data = example_data("political_democracy")
@@ -101,7 +102,7 @@ model = Sem(
We labeled the covariances between the items because we want to regularize those:

```@example reg
ind = get_identifier_indices([:cov_15, :cov_24, :cov_26, :cov_37, :cov_48, :cov_68], model)
ind = param_indices([:cov_15, :cov_24, :cov_26, :cov_37, :cov_48, :cov_68], model)
```

In the following, we fit the same model with lasso regularization of those covariances.
@@ -145,7 +146,7 @@ fit = sem_fit(model)
update_estimate!(partable, fit)
update_partable!(partable, fit_lasso, solution(fit_lasso), :estimate_lasso)
update_partable!(partable, :estimate_lasso, params(fit_lasso), solution(fit_lasso))
sem_summary(partable)
```
@@ -179,7 +180,7 @@ fit_mixed = sem_fit(model_mixed)
Let's again compare the different results:

```@example reg
update_partable!(partable, fit_mixed, solution(fit_mixed), :estimate_mixed)
update_partable!(partable, :estimate_mixed, params(fit_mixed), solution(fit_mixed))
sem_summary(partable)
```
8 changes: 4 additions & 4 deletions docs/src/tutorials/specification/graph_interface.md
@@ -16,9 +16,9 @@ observed_vars = ...
latent_vars = ...

partable = ParameterTable(
graph,
latent_vars = latent_vars,
observed_vars = observed_vars,
graph = graph)
observed_vars = observed_vars)

model = Sem(
specification = partable,
@@ -65,9 +65,9 @@ As you saw above and in the [A first model](@ref) example, the graph object need

```julia
partable = ParameterTable(
graph,
latent_vars = latent_vars,
observed_vars = observed_vars,
graph = graph)
observed_vars = observed_vars)
```

The `ParameterTable` constructor also needs you to specify a vector of observed and latent variables; in the example above this would correspond to
4 changes: 2 additions & 2 deletions docs/src/tutorials/specification/ram_matrices.md
@@ -59,7 +59,7 @@ spec = RAMMatrices(;
A = A,
S = S,
F = F,
parameters = θ,
params = θ,
colnames = [:x1, :x2, :x3, :y1, :y2, :y3, :y4, :y5, :y6, :y7, :y8, :ind60, :dem60, :dem65]
)

@@ -90,7 +90,7 @@ spec = RAMMatrices(;
A = A,
S = S,
F = F,
parameters = θ,
params = θ,
colnames = [:x1, :x2, :x3, :y1, :y2, :y3, :y4, :y5, :y6, :y7, :y8, :ind60, :dem60, :dem65]
)
```
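For background on what the `A`, `S`, and `F` matrices above encode: in RAM notation the model-implied covariance is Σ = F(I − A)⁻¹S(I − A)⁻ᵀFᵀ. A toy sketch with made-up numbers (two observed variables, one latent factor):

```julia
using LinearAlgebra

# Toy RAM example (illustrative numbers, not from the tutorial model):
# Σ = F * inv(I - A) * S * inv(I - A)' * F'
A = [0.0 0.0 1.0;    # x1 ← factor (loading fixed to 1)
     0.0 0.0 0.8;    # x2 ← factor
     0.0 0.0 0.0]
S = [0.3 0.0 0.0;    # residual variances of x1, x2 …
     0.0 0.4 0.0;
     0.0 0.0 1.0]    # … and the latent variance
F = [1.0 0.0 0.0;    # filter matrix: keep the two observed variables
     0.0 1.0 0.0]

IA = inv(I - A)
Σ = F * IA * S * IA' * F'   # 2×2 implied covariance of x1, x2
```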
4 changes: 2 additions & 2 deletions docs/src/tutorials/specification/specification.md
@@ -18,9 +18,9 @@ graph = @StenoGraph begin
end

partable = ParameterTable(
graph,
latent_vars = latent_vars,
observed_vars = observed_vars,
graph = graph)
observed_vars = observed_vars)

model = Sem(
specification = partable,