Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor SEM Specification #203

Merged

Conversation

alyst
Copy link
Contributor

@alyst alyst commented May 3, 2024

This is a part of #193 that mostly refactors the code for SEM specification types (RAMMatrices, ParameterTable, StenoGraph and EnsembleParameterTable):

  • introduces an abstract SemSpecification type, which is a parent to all the classes that specify SEM
  • make SEM constructors use Julia type dispatch (which constructor gets called depend on the type of the positional arguments; keyword arguments types are not used by the type-based dispatch) -- i.e. for StenoGraph-based constructor the graph kw argument made positional.
  • replaced no-op constructors (e.g. RAMMatrices(a::RAMMatrices) = a) with convert(RAMMatrices, a::RAMMatrices) = a -- that's how Base.convert() is designed to work in Julia.
  • in ParameterTable: switched from partable.variables[:observed_vars] to partable.observed_vars
  • renamed parameter_type to relation in the ParameterTable columns (part of Variables/Parameters/Observations terminology & API Cleanup #199)
  • renamed identifier to param in the ParameterTable columns (part of Variables/Parameters/Observations terminology & API Cleanup #199)
  • replaced identifier(), get_par_npar_identifier() methods with params() call which returns a vector of SEM parameter IDs (part of Variables/Parameters/Observations terminology & API Cleanup #199)
  • renamed n_par() into nparams() (part of Variables/Parameters/Observations terminology & API Cleanup #199)
  • added params field to ParameterTable to explicitly define the order of parameters -- simplifies the logic of SEM ensembles and checking/updating parameters
  • added code for checking the correctness of parameter and variables (dimension matching, uniqueness) at the time of SEM construction
  • param_values/lavaan_param_values() -- moved the code for extracting the parameter values from the parameter tables from tests into the SEM API.
  • renamed Base.sort!(::ParameterTable) to sort_vars!() as it was not clear what this function sorts

@alyst alyst force-pushed the refactor_semspec branch from ae89fc7 to 66b47d6 Compare May 4, 2024 02:43
Copy link

codecov bot commented May 4, 2024

Codecov Report

Attention: Patch coverage is 75.05721% with 109 lines in your changes are missing coverage. Please review.

Project coverage is 69.64%. Comparing base (c1e7a69) to head (b2012b0).
Report is 1 commits behind head on devel.

Files Patch % Lines
src/frontend/fit/summary.jl 0.00% 39 Missing ⚠️
src/frontend/specification/ParameterTable.jl 82.50% 21 Missing ⚠️
...c/frontend/specification/EnsembleParameterTable.jl 63.33% 11 Missing ⚠️
src/frontend/specification/RAMMatrices.jl 89.89% 10 Missing ⚠️
src/objective_gradient_hessian.jl 72.72% 9 Missing ⚠️
src/types.jl 73.33% 4 Missing ⚠️
...c/additional_functions/start_val/start_partable.jl 0.00% 3 Missing ⚠️
src/frontend/specification/StenoGraphs.jl 91.66% 3 Missing ⚠️
src/additional_functions/helper.jl 81.81% 2 Missing ⚠️
src/loss/ML/FIML.jl 77.77% 2 Missing ⚠️
... and 4 more
Additional details and impacted files
@@            Coverage Diff             @@
##            devel     #203      +/-   ##
==========================================
- Coverage   69.66%   69.64%   -0.03%     
==========================================
  Files          52       51       -1     
  Lines        2423     2421       -2     
==========================================
- Hits         1688     1686       -2     
  Misses        735      735              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@alyst alyst force-pushed the refactor_semspec branch from 66b47d6 to 700ccc1 Compare May 4, 2024 03:20
@alyst alyst closed this May 11, 2024
@alyst alyst reopened this May 11, 2024
@alyst alyst force-pushed the refactor_semspec branch from 9330cc4 to d38081c Compare May 11, 2024 20:22
@alyst
Copy link
Contributor Author

alyst commented May 11, 2024

It looks like SymbolicsUtils 1.6 broke sparsehessian, so I downgraded it to 1.5.
Also, allowed StenoGraphs 0.3.

@alyst alyst force-pushed the refactor_semspec branch from d38081c to 7a33231 Compare May 11, 2024 23:35
Copy link
Collaborator

@Maximilian-Stefan-Ernst Maximilian-Stefan-Ernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally done with the review^^ Very nice changes, the only things to do before merging are

  • resolve my comments
  • fix docs
  • decide on params or parameters

I can have a look at what's preventing the docs from building. As we discussed in the issue, I would prefer parameters over params, and then have parameters() and nparameters() accordingly. @aaronpeikert maybe you can give a preference, and then we decide by majority^^ (Altough there are many occurences, finding and replacing all instances of param with parameter should do the trick.)

@aaronpeikert
Copy link
Collaborator

I prefer parameters. But its more like a 60/40 preference.

@alyst
Copy link
Contributor Author

alyst commented May 17, 2024

@Maximilian-Stefan-Ernst Thank you for the review, I'll address your comments in a next few days.

Regarding the params vs parameters. I prefer params, but I can change it to parameters if the stakeholders decide so :)
Just to summarize my approach:

  • the core idea is to have sort-of "Huffman coding" for the terms in the API: frequently used concepts should use shorter, unambiguous, intuitive terms ("vars", "params", "cov");
    less frequently used terms, or the ones that don't have an intuitive short version, could be longer (objective, gradient, hessian); also object type names could be longer because they are less frequently used in the code.
    I think it makes the code more readable -- code lines are shorter and faster to read; the urge to introduce local variables just for the sake of avoiding using full names (e.g. npar = nparameters(model)) is more subdued :)
  • that is in line with Julia naming conventions (struct, Dict, Ref, Val, eval, undef, Stats, sortperm, cov), there are precedents in the other popular packages, e.g. HTTP.getparam()
  • it has to be consistent with the other terms: if it's decided on parameters, it probably also means switching to variables, covariation etc

@Maximilian-Stefan-Ernst
Copy link
Collaborator

Docs are fixed.

@Maximilian-Stefan-Ernst
Copy link
Collaborator

W.r.t. the params vs parameters debate - let's settle on params then, I am also more like a 60/40 for parameters, but since you have a clear preference for params we can settle on it by weighted majority vote^^.

Fun fact: I tried to see what Julia Base is using; params matches 19 files, parameters 42 - so Base settled on using both^^.

alyst and others added 24 commits May 27, 2024 15:00
for clarity and aligning to Julia naming conventions
Co-authored-by: Maximilian-Stefan-Ernst <[email protected]>
* do tests inside
* use param_values()/lavaan_param_values()
so the downstream code doesn't rely on the order of tuple elements
causes problems with sparsehessian().
It is a temporary fix until the compatibility issues are resolved in Symbolics.jl
Co-authored-by: Maximilian-Stefan-Ernst <[email protected]>
Co-authored-by: Maximilian-Stefan-Ernst <[email protected]>
@alyst alyst force-pushed the refactor_semspec branch from 6f6fb84 to b2012b0 Compare May 27, 2024 22:02
@alyst
Copy link
Contributor Author

alyst commented May 27, 2024

@Maximilian-Stefan-Ernst I've revised your param_indices() addition.
param_indices(semobj) is a useful one-liner method, which I actually wanted to add myself.
However, I removed param_indices(semobj, params) before it becomes available:

  • it is potentially confusing that the return type of methods with the same name is different
  • internally, param_indices(semobj, params) constructs the parameters dictionary at each call, which may get misused in certain contexts for large models (e.g. getting parameter indices in a loop). I think it is just a tiny bit inconvenient if instead the user has to call param_indices(semobj) and then use it as a dictionary, but it is a bit less likely to be misused.

Eventually, SEM specification objects could cache param_indices map, and the removed method could be readded. OTOH I find using param_indices(sem) Dict directly is quite straightforward.

@alyst
Copy link
Contributor Author

alyst commented May 30, 2024

@Maximilian-Stefan-Ernst Let me know if there's anything else to tweak for this PR. Otherwise I am ready to rebase #193 once this one gets merged and extract the next round of changes :)

@Maximilian-Stefan-Ernst Maximilian-Stefan-Ernst merged commit 55f9087 into StructuralEquationModels:devel May 31, 2024
5 checks passed
@Maximilian-Stefan-Ernst
Copy link
Collaborator

Sorry, just too much other stuff to do :)

@alyst alyst deleted the refactor_semspec branch May 31, 2024 10:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants