Multiple refactorings reformatted #193

Draft · wants to merge 66 commits into base: devel

Commits (66):
0c84903
RAM ctor: use random parameters instead of NaNs
Apr 1, 2024
3b2901c
move check_acyclic() to abstract.jl
alyst Jun 13, 2024
f0338bd
WIP SemImplyState
Mar 17, 2024
c3b61a4
move params() to common.jl
alyst Aug 11, 2024
f7a6e61
AbstractSem: improve imply/observed API redirect
alyst Dec 22, 2024
289e3c2
SemObservedMissing: refactor
alyst Dec 24, 2024
c49339e
minus2ll(): cleanup method signatures
alyst Mar 18, 2024
9d4490a
fix chi2
alyst Jun 13, 2024
d3d96c7
fix RMSEA
Mar 19, 2024
c514cd8
FIML: update
alyst Jan 9, 2025
dbda7a8
declare cov matrices symmetric
alyst Jun 13, 2024
9e58074
EM: optimizations
Mar 20, 2024
1fb8396
start_simple(SemEnsemble): simplify
Mar 20, 2024
a9bde45
RAM: reuse sigma array
Mar 23, 2024
1529df5
RAM: optional sparse Sigma matrix
Apr 1, 2024
a610587
RAM: declare (I-A)^-1 up/low tri too
Mar 23, 2024
cd7ab28
ML: refactor to minimize allocs
alyst Apr 23, 2024
da4bd18
add PackageExtensionCompat
Mar 12, 2024
f92a4d3
variance_params(SEMSpec)
Mar 26, 2024
86ca820
predict_latent_vars()
alyst Aug 1, 2024
204c9bb
lavaan_model()
Apr 1, 2024
944c8a6
EM: move code refs to docstring
Apr 10, 2024
e778934
EM MVN: decouple from SemObsMissing
alyst Dec 22, 2024
e240268
test/fiml: set EM MVN rtol=1e-10
alyst Apr 14, 2024
2e2460e
SemObsMissing: fix obs_mean() test
alyst Aug 11, 2024
9d01396
MissingPattern: transpose data
Apr 17, 2024
2e70d1d
EM MVN: report rel_error if not converged
Apr 17, 2024
dfb9bda
EM: max_nsamples_em opt to limit samples used
alyst Jun 13, 2024
44a8b9c
EM: optimize mean handling
alyst Aug 11, 2024
6a783fd
p_values(): use ccdf()
May 28, 2024
35699a1
test_grad/hess(): check that alt calls give same results
May 29, 2024
6c0d5bf
RAMMatrices formatting fixes
alyst Jun 13, 2024
7751d97
political democracy formatting fixes
alyst Jun 13, 2024
4709f33
start_simple(): code cleanup
alyst Aug 11, 2024
fe7c9a5
start_simple(): start vals for lat and obs means
Jul 9, 2024
59bc445
observed_vars(RAMMatrices; order): rows/cols order
alyst Dec 22, 2024
6cc1c1a
observed_var_indices(::RAMMatrices; order=:columns)
Sep 22, 2024
61b3386
move sparse mtx utils to new file
alyst Dec 22, 2024
fab3c9c
EM: min_eigval kw for regularization
alyst Dec 22, 2024
b432452
fix sem_summary() ws
alyst Dec 22, 2024
be0724b
fix ParTable ws
alyst Dec 22, 2024
a94ed68
fix batch_sym_inv_updates() ws
alyst Dec 22, 2024
580ae05
fix RAM (generic) ws
alyst Dec 22, 2024
29b0152
fix test/multigroup ws
alyst Dec 22, 2024
13b1559
fix test/multigroup ws
alyst Dec 22, 2024
b3c0d8b
fix EnsParTable ws
alyst Dec 22, 2024
2078723
reorder_observed_vars!(spec) method
alyst Dec 24, 2024
43dc340
vech() and vechinds() functions
alyst Dec 24, 2024
43b1314
SemImply/SemLossFun: drop meanstructure kwarg
May 30, 2024
554e47c
RAMMatrices(): ctor to replace params
May 27, 2024
a071d43
RAMSymbolic: rename _func to _eval!
alyst Dec 24, 2024
604dfdf
observed vars check
May 9, 2024
39655e0
imply -> implied, SemImply -> SemImplied
alyst Dec 23, 2024
a8b6e34
imply -> implied: file renames
alyst Dec 23, 2024
7f8a4aa
md ws fixes
alyst Dec 23, 2024
169f3ad
use `@printf` to limit signif digits printed
alyst Dec 24, 2024
1becd3d
refactor Sem, SemEnsemble, SemLoss
alyst Dec 24, 2024
fcb0b8c
refactor Sem, SemEnsemble, SemLoss
alyst Dec 24, 2024
6f75919
ProxAlgo: fix doc typo
alyst Dec 24, 2024
61dacb0
test/Proximal: move usings to the central file
alyst Dec 24, 2024
74ad749
ML/FIML: workaround generic_matmul issue
alyst Dec 24, 2024
ffd2f0e
tests: move usings in the top file
alyst Dec 24, 2024
7d94993
remove multigroup2 tests
alyst Dec 24, 2024
875e0e5
tests: revert kwless ctors
alyst Dec 24, 2024
b3082b6
tests/data_inp_formats: refactor
alyst Dec 24, 2024
739a359
BlackBoxOptim.jl backend support
alyst Dec 23, 2024
5 changes: 5 additions & 0 deletions Project.toml
@@ -13,8 +13,10 @@ LineSearches = "d3d80556-e9d4-5f37-9878-2ab0fcc64255"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
NLSolversBase = "d41bc354-129a-5804-8e4c-c37616107c6c"
Optim = "429524aa-4258-5aef-a3af-852621145aeb"
PackageExtensionCompat = "65ce6f38-6b18-4e1d-a461-8949797d7930"
Pkg = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f"
PrettyTables = "08abe8d2-0d0c-5749-adfa-8a2ac140af0d"
Printf = "de0858da-6303-5e67-8744-51eddeeeb8d7"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
SparseArrays = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"
Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
@@ -46,11 +48,14 @@ Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
test = ["Test"]

[weakdeps]
BlackBoxOptim = "a134a8b2-14d6-55f6-9291-3336d3ab0209"
NLopt = "76087f3c-5699-56af-9a33-bf431cd00edd"
Optimisers = "3bd65402-5787-11e9-1adc-39752487f4e2"
ProximalAlgorithms = "140ffc9f-1907-541a-a177-7475e0a401e9"
ProximalCore = "dc4f5ac2-75d1-4f31-931e-60435d74994b"
ProximalOperators = "a725b495-10eb-56fe-b38b-717eba820537"

[extensions]
SEMNLOptExt = "NLopt"
SEMProximalOptExt = ["ProximalCore", "ProximalAlgorithms", "ProximalOperators"]
SEMBlackBoxOptimExt = ["BlackBoxOptim", "Optimisers"]
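
For context, these weak dependencies mean the new extension only loads once its trigger packages are imported. A minimal sketch of verifying that, using the standard `Base.get_extension` mechanism (the variable name and `@assert` are just for illustration):

```julia
using StructuralEquationModels
using BlackBoxOptim, Optimisers  # importing both triggers SEMBlackBoxOptimExt

# get_extension returns the extension module once it is loaded, `nothing` otherwise
ext = Base.get_extension(StructuralEquationModels, :SEMBlackBoxOptimExt)
@assert ext !== nothing
```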
2 changes: 1 addition & 1 deletion docs/make.jl
@@ -32,7 +32,7 @@ makedocs(
"Developer documentation" => [
"Extending the package" => "developer/extending.md",
"Custom loss functions" => "developer/loss.md",
"Custom imply types" => "developer/imply.md",
"Custom implied types" => "developer/implied.md",
"Custom optimizer types" => "developer/optimizer.md",
"Custom observed types" => "developer/observed.md",
"Custom model types" => "developer/sem.md",
38 changes: 19 additions & 19 deletions docs/src/developer/imply.md → docs/src/developer/implied.md
@@ -1,11 +1,11 @@
# Custom imply types
# Custom implied types

We recommend reading [Custom loss functions](@ref) first, as the overall implementation is the same; we describe it here more briefly.

Imply types are of subtype `SemImply`. To implement your own imply type, you should define a struct
Implied types are of subtype `SemImplied`. To implement your own implied type, you should define a struct

```julia
struct MyImply <: SemImply
struct MyImplied <: SemImplied
...
end
```
@@ -15,37 +15,37 @@ and at least a method to compute the objective
```julia
import StructuralEquationModels: objective!

function objective!(imply::MyImply, par, model::AbstractSemSingle)
function objective!(implied::MyImplied, par, model::AbstractSemSingle)
...
return nothing
end
```

This method should compute and store things you want to make available to the loss functions, and returns `nothing`. For example, as we have seen in [Second example - maximum likelihood](@ref), the `RAM` imply type computes the model-implied covariance matrix and makes it available via `Σ(imply)`.
To make stored computations available to loss functions, simply write a function - for example, for the `RAM` imply type we defined
This method should compute and store things you want to make available to the loss functions, and return `nothing`. For example, as we have seen in [Second example - maximum likelihood](@ref), the `RAM` implied type computes the model-implied covariance matrix and makes it available via `Σ(implied)`.
To make stored computations available to loss functions, simply write a function - for example, for the `RAM` implied type we defined

```julia
Σ(imply::RAM) = imply.Σ
Σ(implied::RAM) = implied.Σ
```

Additionally, you can specify methods for `gradient` and `hessian` as well as the combinations described in [Custom loss functions](@ref).
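
For instance, a hedged sketch of a gradient method (what you compute and cache is up to your type; the body here is illustrative):

```julia
import StructuralEquationModels: gradient!

function gradient!(implied::MyImplied, par, model::AbstractSemSingle)
    # compute and store quantities (e.g. derivatives of Σ) that the loss
    # functions will reuse for their own gradients
    return nothing
end
```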

The last thing nedded to make it work is a method for `nparams` that takes your imply type and returns the number of parameters of the model:
The last thing needed to make it work is a method for `nparams` that takes your implied type and returns the number of parameters of the model:

```julia
nparams(imply::MyImply) = ...
nparams(implied::MyImplied) = ...
```

Just as described in [Custom loss functions](@ref), you may define a constructor. Typically, this will depend on the `specification = ...` argument that can be a `ParameterTable` or a `RAMMatrices` object.

We implement an `ImplyEmpty` type in our package that does nothing but serving as an imply field in case you are using a loss function that does not need any imply type at all. You may use it as a template for defining your own imply type, as it also shows how to handle the specification objects:
We implement an `ImpliedEmpty` type in our package that does nothing but serve as an `implied` field in case you are using a loss function that does not need any implied type at all. You may use it as a template for defining your own implied type, as it also shows how to handle the specification objects:

```julia
############################################################################
### Types
############################################################################

struct ImplyEmpty{V, V2} <: SemImply
struct ImpliedEmpty{V, V2} <: SemImplied
identifier::V2
n_par::V
end
@@ -54,7 +54,7 @@ end
### Constructors
############################################################################

function ImplyEmpty(;
function ImpliedEmpty(;
specification,
kwargs...)

@@ -63,25 +63,25 @@ function ImplyEmpty(;

n_par = length(ram_matrices.parameters)

return ImplyEmpty(identifier, n_par)
return ImpliedEmpty(identifier, n_par)
end

############################################################################
### methods
############################################################################

objective!(imply::ImplyEmpty, par, model) = nothing
gradient!(imply::ImplyEmpty, par, model) = nothing
hessian!(imply::ImplyEmpty, par, model) = nothing
objective!(implied::ImpliedEmpty, par, model) = nothing
gradient!(implied::ImpliedEmpty, par, model) = nothing
hessian!(implied::ImpliedEmpty, par, model) = nothing

############################################################################
### Recommended methods
############################################################################

identifier(imply::ImplyEmpty) = imply.identifier
n_par(imply::ImplyEmpty) = imply.n_par
identifier(implied::ImpliedEmpty) = implied.identifier
n_par(implied::ImpliedEmpty) = implied.n_par

update_observed(imply::ImplyEmpty, observed::SemObserved; kwargs...) = imply
update_observed(implied::ImpliedEmpty, observed::SemObserved; kwargs...) = implied
```

As you see, similar to [Custom loss functions](@ref) we implement a method for `update_observed`. Additionally, you should store the `identifier` from the specification object and write a method for `identifier`, as this will make it possible to access parameter indices by label.
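
A hedged usage sketch (the keyword names follow this PR's renaming, and the regularization-only loss is hypothetical):

```julia
# hypothetical: MyRegularizerLoss never touches the implied part,
# so ImpliedEmpty suffices
model = Sem(
    specification = partable,  # your ParameterTable or RAMMatrices
    implied = ImpliedEmpty,
    loss = MyRegularizerLoss,
)
```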
10 changes: 5 additions & 5 deletions docs/src/developer/loss.md
@@ -171,7 +171,7 @@ function MyLoss(;arg1 = ..., arg2, kwargs...)
end
```

All keyword arguments that a user passes to the Sem constructor are passed to your loss function. In addition, all previously constructed parts of the model (imply and observed part) are passed as keyword arguments as well as the number of parameters `n_par = ...`, so your constructor may depend on those. For example, the constructor for `SemML` in our package depends on the additional argument `meanstructure` as well as the observed part of the model to pre-allocate arrays of the same size as the observed covariance matrix and the observed mean vector:
All keyword arguments that a user passes to the Sem constructor are passed to your loss function. In addition, all previously constructed parts of the model (the implied and observed parts) are passed as keyword arguments, as well as the number of parameters `n_par = ...`, so your constructor may depend on those. For example, the constructor for `SemML` in our package depends on the additional argument `meanstructure` as well as the observed part of the model to pre-allocate arrays of the same size as the observed covariance matrix and the observed mean vector:

```julia
function SemML(;observed, meanstructure = false, approx_H = false, kwargs...)
@@ -221,9 +221,9 @@ To keep it simple, we only cover models without a meanstructure. The maximum lik
F_{ML} = \log \det \Sigma_i + \mathrm{tr}\left(\Sigma_{i}^{-1} \Sigma_o \right)
```

where ``\Sigma_i`` is the model implied covariance matrix and ``\Sigma_o`` is the observed covariance matrix. We can query the model implied covariance matrix from the `imply` par of our model, and the observed covariance matrix from the `observed` path of our model.
where ``\Sigma_i`` is the model implied covariance matrix and ``\Sigma_o`` is the observed covariance matrix. We can query the model implied covariance matrix from the `implied` part of our model, and the observed covariance matrix from the `observed` part of our model.

To get information on what we can access from a certain `imply` or `observed` type, we can check it`s documentation an the pages [API - model parts](@ref) or via the help mode of the REPL:
To get information on what we can access from a certain `implied` or `observed` type, we can check its documentation on the pages [API - model parts](@ref) or via the help mode of the REPL:

```julia
julia>?
@@ -233,7 +233,7 @@
help?> SemObservedCommon
```

We see that the model implied covariance matrix can be assessed as `Σ(imply)` and the observed covariance matrix as `obs_cov(observed)`.
We see that the model implied covariance matrix can be accessed as `Σ(implied)` and the observed covariance matrix as `obs_cov(observed)`.

With this information, we can implement the maximum likelihood objective as

@@ -245,7 +245,7 @@ import StructuralEquationModels: Σ, obs_cov, objective!

function objective!(semml::MaximumLikelihood, parameters, model::AbstractSem)
# access the model implied and observed covariance matrices
Σᵢ = Σ(imply(model))
Σᵢ = Σ(implied(model))
Σₒ = obs_cov(observed(model))
# compute the objective
if isposdef(Symmetric(Σᵢ)) # is the model implied covariance matrix positive definite?
2 changes: 1 addition & 1 deletion docs/src/developer/observed.md
@@ -28,7 +28,7 @@ nsamples(observed::MyObserved) = ...
nobserved_vars(observed::MyObserved) = ...
```

As always, you can add additional methods for properties that imply types and loss function want to access, for example (from the `SemObservedCommon` implementation):
As always, you can add additional methods for properties that implied types and loss functions want to access, for example (from the `SemObservedCommon` implementation):

```julia
obs_cov(observed::SemObservedCommon) = observed.obs_cov
```
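
To round this out, a minimal concrete sketch of the required methods (the struct and its field are illustrative, not the package's actual implementation):

```julia
struct MyObserved <: SemObserved
    data::Matrix{Float64}  # rows = samples, columns = observed variables
end

nsamples(observed::MyObserved) = size(observed.data, 1)
nobserved_vars(observed::MyObserved) = size(observed.data, 2)
```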
14 changes: 7 additions & 7 deletions docs/src/developer/sem.md
@@ -1,15 +1,15 @@
# Custom model types

The abstract supertype for all models is `AbstractSem`, which has two subtypes, `AbstractSemSingle{O, I, L, D}` and `AbstractSemCollection`. Currently, there are 2 subtypes of `AbstractSemSingle`: `Sem`, `SemFiniteDiff`. All subtypes of `AbstractSemSingle` should have at least observed, imply, loss and optimizer fields, and share their types (`{O, I, L, D}`) with the parametric abstract supertype. For example, the `SemFiniteDiff` type is implemented as
The abstract supertype for all models is `AbstractSem`, which has two subtypes, `AbstractSemSingle{O, I, L, D}` and `AbstractSemCollection`. Currently, there are two subtypes of `AbstractSemSingle`: `Sem` and `SemFiniteDiff`. All subtypes of `AbstractSemSingle` should have at least observed, implied, loss and optimizer fields, and share their types (`{O, I, L, D}`) with the parametric abstract supertype. For example, the `SemFiniteDiff` type is implemented as

```julia
struct SemFiniteDiff{
O <: SemObserved,
I <: SemImply,
L <: SemLoss,
O <: SemObserved,
I <: SemImplied,
L <: SemLoss,
D <: SemOptimizer} <: AbstractSemSingle{O, I, L, D}
observed::O
imply::I
implied::I
loss::L
optimizer::D
end
@@ -19,13 +19,13 @@ Additionally, we need to define a method to compute at least the objective value

```julia
function objective!(model::AbstractSemSingle, parameters)
objective!(imply(model), parameters, model)
objective!(implied(model), parameters, model)
return objective!(loss(model), parameters, model)
end

function gradient!(gradient, model::AbstractSemSingle, parameters)
fill!(gradient, zero(eltype(gradient)))
gradient!(imply(model), parameters, model)
gradient!(implied(model), parameters, model)
gradient!(gradient, loss(model), parameters, model)
end
```
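
By analogy, a hessian method would follow the same pattern; a hedged sketch, not necessarily the package's exact implementation:

```julia
function hessian!(hessian, model::AbstractSemSingle, parameters)
    fill!(hessian, zero(eltype(hessian)))
    hessian!(implied(model), parameters, model)
    hessian!(hessian, loss(model), parameters, model)
end
```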
2 changes: 1 addition & 1 deletion docs/src/internals/files.md
@@ -10,7 +10,7 @@ All source code is in the `"src"` folder:
- `"StructuralEquationModels.jl"` defines the module and the exported objects
- `"types.jl"` defines all abstract types and the basic type hierarchy
- `"objective_gradient_hessian.jl"` contains methods for computing objective, gradient and hessian values for different model types as well as generic fallback methods
- The four folders `"observed"`, `"imply"`, `"loss"` and `"diff"` contain implementations of specific subtypes (for example, the `"loss"` folder contains a file `"ML.jl"` that implements the `SemML` loss function).
- The four folders `"observed"`, `"implied"`, `"loss"` and `"diff"` contain implementations of specific subtypes (for example, the `"loss"` folder contains a file `"ML.jl"` that implements the `SemML` loss function).
- `"optimizer"` contains connections to different optimization backends (aka methods for `sem_fit`)
- `"optim.jl"`: connection to the `Optim.jl` package
- `"NLopt.jl"`: connection to the `NLopt.jl` package
2 changes: 1 addition & 1 deletion docs/src/internals/types.md
@@ -8,6 +8,6 @@ The type hierarchy is implemented in `"src/types.jl"`.
- `SemFiniteDiff`: models whose gradients and/or hessians should be computed via finite difference approximation
- `AbstractSemCollection <: AbstractSem` is an abstract supertype of all models that contain multiple `AbstractSem` submodels

Every `AbstractSemSingle` has to have `SemObserved`, `SemImply`, `SemLoss` and `SemOptimizer` fields (and can have additional fields).
Every `AbstractSemSingle` has to have `SemObserved`, `SemImplied`, `SemLoss` and `SemOptimizer` fields (and can have additional fields).

`SemLoss` is a container for multiple `SemLossFunctions`.
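
For orientation, the hierarchy described above can be sketched as follows (a paraphrase, not the verbatim contents of `"src/types.jl"`):

```julia
abstract type AbstractSem end

# single models carry their observed/implied/loss/optimizer types as parameters
abstract type AbstractSemSingle{O, I, L, D} <: AbstractSem end

# collections contain multiple AbstractSem submodels
abstract type AbstractSemCollection <: AbstractSem end
```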
2 changes: 1 addition & 1 deletion docs/src/performance/symbolic.md
@@ -13,6 +13,6 @@ If the model is acyclic, we can compute

```math
(I - A)^{-1} = I + A + A^2 + \ldots + A^n
```

for some ``n < \infty``.
Typically, the ``S`` and ``A`` matrices are sparse. In our package, we offer symbolic precomputation of ``\Sigma``, ``\nabla\Sigma`` and even ``\nabla^2\Sigma`` for acyclic models to optimally exploit this sparsity. To use this feature, simply use the `RAMSymbolic` imply type for your model.
Typically, the ``S`` and ``A`` matrices are sparse. In our package, we offer symbolic precomputation of ``\Sigma``, ``\nabla\Sigma`` and even ``\nabla^2\Sigma`` for acyclic models to optimally exploit this sparsity. To use this feature, simply use the `RAMSymbolic` implied type for your model.

This can decrease model fitting time, but will also increase model building time (as we have to carry out the symbolic computations and compile specialised functions). As a result, this is probably not beneficial if you only fit a single model, but can lead to great improvements if you fit the same model to multiple datasets (e.g. to compute bootstrap standard errors).
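
A hedged sketch of opting in (assuming the model constructor's keyword follows this PR's `implied` renaming):

```julia
# same model, but Σ, ∇Σ (and optionally ∇²Σ) are precomputed symbolically
model = Sem(
    specification = partable,  # your ParameterTable or RAMMatrices
    data = data,
    implied = RAMSymbolic,
)
```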