Version 5.6
This is an important release that has a critical bugfix as well as useful improvements.
Bugfixes
- Fixed a critical bug in the computation of standardized mean differences (SMDs). The denominator for subgroup SMDs now uses the population standard deviations, not standard deviations computed over the subgroups themselves.
- Added converters to the notebook header to allow correct treatment of candidate IDs with leading zeros.
- Modified the test utility functions to catch discrepancies caused by missing leading zeros.
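
To illustrate the SMD fix above, here is a minimal sketch of a subgroup SMD whose denominator comes from the full population. The exact formula used by the tool is not stated in these notes; the sketch assumes the SMD is the difference between mean system and mean human scores within the subgroup, divided by the pooled population standard deviation. The function name `subgroup_smd` and its arguments are hypothetical.

```python
from statistics import mean, stdev


def subgroup_smd(human, system, in_group):
    """Hypothetical helper: SMD for one subgroup, standardized by population SDs.

    Assumed formula (not taken from the release notes):
        (mean(system_sub) - mean(human_sub)) / sqrt((sd_h**2 + sd_s**2) / 2)
    where sd_h and sd_s are computed over ALL responses, not the subgroup.
    """
    # Keep only the scores of responses that belong to the subgroup.
    sub_human = [h for h, flag in zip(human, in_group) if flag]
    sub_system = [s for s, flag in zip(system, in_group) if flag]

    # Numerator: mean difference inside the subgroup.
    mean_diff = mean(sub_system) - mean(sub_human)

    # Denominator: pooled standard deviation over the whole population.
    pooled_sd = ((stdev(human) ** 2 + stdev(system) ** 2) / 2) ** 0.5
    return mean_diff / pooled_sd
```

Using the subgroup's own standard deviations in the denominator would make SMDs from subgroups with different spreads hard to compare, which is presumably why the population values are preferred.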
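
The leading-zero fixes above can be illustrated with the `converters` argument of `pandas.read_csv`. This is a sketch only, not the code in the notebook header: the column name `candidate` and the sample data are assumptions made for the example.

```python
import io

import pandas as pd

# Sample data with candidate IDs that start with zeros (made up for this example).
csv_text = "candidate,score\n00123,3\n04567,4\n"

# Default parsing infers an integer column, so "00123" becomes 123.
df_default = pd.read_csv(io.StringIO(csv_text))

# Forcing the ID column through str keeps the IDs exactly as written.
df_fixed = pd.read_csv(io.StringIO(csv_text), converters={"candidate": str})
```

Reading IDs as strings also keeps joins between files consistent, since `"00123"` and `123` would otherwise fail to match.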
Improvements
- The tables generated by `rsmsummarize` are now saved in the same way as for other tools. `rsmsummarize` now shows a table with standardized coefficients for all models.
- The predictions for the post-processed training set are now also saved.
- Added a new notebook that shows differential feature functioning (DFF) plots by subgroup. To use it, add `dff_by_group` to the `general_section` configuration option.
- The features that have not been used in the model are now excluded from the datasets before they are sent to SKLL for prediction. This makes the prediction step much faster for large datasets.
- When testing whether the standard deviation of a feature in the training set is zero, we previously used a tolerance of 1e-06. This is too coarse for features with very low values (these can result from an inverse transform of acoustic likelihoods, which are logs of very small values). The tolerance has been tightened to 1e-07.
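
The effect of the tolerance change in the zero-variance check can be sketched as follows. This is not the tool's actual code; the function name `is_effectively_zero` is hypothetical, and the check is assumed to be an absolute-tolerance comparison against zero.

```python
import math


def is_effectively_zero(std_dev, tolerance=1e-07):
    """Hypothetical check: treat std_dev as zero if it is within tolerance of 0."""
    return math.isclose(std_dev, 0.0, abs_tol=tolerance)


# A feature with a very small but real spread (value chosen for illustration).
small_std = 5e-07

flagged_before = is_effectively_zero(small_std, tolerance=1e-06)  # old tolerance
flagged_now = is_effectively_zero(small_std)                      # new tolerance
```

Under the old tolerance such a feature would be treated as constant, while under the new one it is kept as a feature with non-zero variance.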
Other Minor Changes
- Update the utility script `update_skll_model.py` to allow it to be used with other tools.
- Update tests and documentation.