v1.0.0
Release Summary
We retrained Pythia using additional data and now fully support morphological data 🎉
Our new set of training data consists of:
- 3250 empirical DNA and Protein datasets obtained from TreeBase (same as in version 0.0.1)
- 538 additional empirical DNA and Protein datasets obtained via our RAxML-Grove
- 474 additional morphological datasets obtained from TreeBase
- = 4262 datasets in total
The resulting predictor has about the same accuracy as the previous predictor, with a slight improvement of the mean absolute percentage error:
- Mean absolute error: 0.09
- Mean absolute percentage error: 2.5%
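For readers unfamiliar with the two metrics, here is a minimal sketch of their textbook definitions in plain Python. The release notes do not state how the reported values were computed, so the function names and the handling of zero-valued targets below are assumptions for illustration, not Pythia's evaluation code.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_percentage_error(y_true, y_pred):
    """Average absolute difference relative to the true value, in percent.

    The metric is undefined for a true value of 0; raising an error in
    that case is an assumption made for this sketch.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a true value is 0")
    relative_errors = [abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(relative_errors) / len(relative_errors)


# Made-up difficulty values, for illustration only.
true_difficulty = [0.2, 0.5, 0.8]
predicted_difficulty = [0.25, 0.45, 0.8]
mae = mean_absolute_error(true_difficulty, predicted_difficulty)
mape = mean_absolute_percentage_error(true_difficulty, predicted_difficulty)
```

The absolute error is on the same scale as the predicted difficulty, while the percentage error weights each deviation by the size of the true value.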
We are now using LightGBM’s boosted trees instead of scikit-learn’s random forest.
- Pythia 1.0.0 is backwards compatible with the scikit-learn random forest predictor of Pythia version 0.0.1. This predictor is still available in predictors/predictor_sklearn_rf_v0.0.1.pckl.
Breaking Changes
- The default predictor changed to the new LightGBM predictor (predictors/predictor_lgb_v1.0.0.pckl). Since this predictor was retrained using additional data, predictions from this version will likely differ from those of previous versions. This introduces an additional dependency: LightGBM.
- Identical sequences in the MSA:
  - by default, Pythia refuses to predict the difficulty for MSAs that contain identical sequences
  - new --removeDuplicates option: if the MSA contains duplicate sequences, Pythia stores a reduced alignment and predicts the difficulty for this reduced alignment
- The exceptions in msa.py changed: instead of ValueError, Pythia now raises a custom PyPythiaException.
- We changed the DataType type definition to an Enum instead of a string; see custom_types.py for more details.
- We renamed the predictor_path parameter in predictor.DifficultyPredictor to predictor_handle.
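To make the new handling of identical sequences concrete, here is a minimal sketch of the described behavior: refuse by default, reduce the alignment on request. Everything in it is hypothetical. `DuplicateSequencesError`, `reduce_alignment`, and the `remove_duplicates` flag are stand-ins invented for this illustration and are not Pythia's actual classes or functions. Keeping the first taxon for each distinct sequence is also an assumption.

```python
class DuplicateSequencesError(Exception):
    """Hypothetical stand-in for a custom exception; not PyPythia's class."""


def reduce_alignment(sequences, remove_duplicates=False):
    """Return the alignment without duplicate sequences.

    sequences: dict mapping taxon name -> aligned sequence string.
    By default, duplicates cause an error; with remove_duplicates=True,
    only the first taxon seen for each distinct sequence is kept.
    """
    first_seen = {}   # sequence -> name of the first taxon carrying it
    reduced = {}      # the alignment without duplicates
    duplicates = []   # (duplicate taxon, taxon it duplicates)

    for name, seq in sequences.items():
        if seq in first_seen:
            duplicates.append((name, first_seen[seq]))
        else:
            first_seen[seq] = name
            reduced[name] = seq

    if duplicates and not remove_duplicates:
        dup, original = duplicates[0]
        raise DuplicateSequencesError(
            f"{len(duplicates)} duplicate sequence(s) found, "
            f"e.g. {dup} is identical to {original}"
        )
    return reduced


# Toy alignment: taxon_b duplicates taxon_a.
msa = {"taxon_a": "ACGT-A", "taxon_b": "ACGT-A", "taxon_c": "ACGTTA"}
reduced = reduce_alignment(msa, remove_duplicates=True)
```

Calling `reduce_alignment(msa)` without the flag raises the stand-in exception, which mirrors the default refusal described above.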
Minor Changes
- Improved logging for the command line interface
- New --quiet mode to suppress intermediate information
- predictor.DifficultyPredictor now accepts a set of features in its constructor, allowing predictions with experimental difficulty predictors that were trained using a different set of features than our PyPythia predictor
Full Changelog: 0.0.1...1.0.0