v1.0.0

@tschuelia tschuelia released this 10 Oct 15:02
· 72 commits to master since this release

Release Summary

We retrained Pythia using additional data and now include full support for morphological data 🎉
Our new set of training data consists of:

  • 3250 empirical DNA and Protein datasets obtained from TreeBase (same as in version 0.0.1)
  • 538 additional empirical DNA and Protein datasets obtained from our RAxML-Grove database
  • 474 additional morphological datasets obtained from TreeBase
  • = 4262 datasets in total

The resulting predictor has about the same accuracy as the previous predictor, with a slight improvement in the mean absolute percentage error:

  • Mean absolute error: 0.09
  • Mean absolute percentage error: 2.5%
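For readers unfamiliar with the two metrics, here is a minimal sketch of how they are commonly defined. The function names and the example values are illustrative assumptions and are not taken from the Pythia code base or its evaluation data.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_percentage_error(y_true, y_pred):
    """Average absolute difference relative to the true value, in percent."""
    if any(t == 0 for t in y_true):
        # The percentage error is undefined when a true value is zero.
        raise ValueError("MAPE is undefined for true values equal to zero")
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up difficulties, for illustration only.
true_difficulty = [0.2, 0.5, 0.8]
predicted_difficulty = [0.25, 0.45, 0.9]

print(f"MAE:  {mean_absolute_error(true_difficulty, predicted_difficulty):.3f}")
print(f"MAPE: {mean_absolute_percentage_error(true_difficulty, predicted_difficulty):.1f}%")
```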

We are now using LightGBM’s boosted trees instead of scikit-learn’s random forest:

  • Pythia 1.0.0 is backwards compatible with the scikit-learn random forest predictor of Pythia version 0.0.1. This predictor is still available in predictors/predictor_sklearn_rf_v0.0.1.pckl

Breaking Changes

  • The default predictor changed to the new LightGBM predictor (predictors/predictor_lgb_v1.0.0.pckl), which introduces an additional dependency: LightGBM. Since this predictor was retrained using additional data, predictions from this version will likely differ from those of previous versions.
  • Identical sequences in the MSA:
    • by default: Pythia refuses to predict the difficulty for MSAs that contain identical sequences
    • new --removeDuplicates option: if the MSA contains duplicate sequences, Pythia stores a reduced alignment and predicts the difficulty for this reduced alignment
  • The exceptions in msa.py changed: instead of ValueError, Pythia now raises a custom PyPythiaException.
  • We changed the DataType type definition to an Enum instead of a string, see custom_types.py for more details.
  • We renamed the predictor_path parameter in predictor.DifficultyPredictor to predictor_handle.
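The handling of identical sequences described in the list above can be sketched as follows. This is a simplified illustration, not Pythia's implementation: the functions remove_duplicates and prepare_msa, the remove_dups parameter, and the dictionary representation of an MSA are hypothetical, and the PyPythiaException class defined here is only a stand-in for the custom exception of the same name.

```python
class PyPythiaException(Exception):
    """Stand-in for Pythia's custom exception (assumption: a plain Exception subclass)."""


def remove_duplicates(sequences):
    """Keep the first taxon for each distinct sequence, preserving input order."""
    seen = set()
    reduced = {}
    for name, seq in sequences.items():
        if seq not in seen:
            seen.add(seq)
            reduced[name] = seq
    return reduced


def prepare_msa(sequences, remove_dups=False):
    """Return an alignment without identical sequences.

    Mirrors the two behaviours from the release notes: refuse by default,
    or fall back to a reduced alignment when duplicates may be removed.
    """
    reduced = remove_duplicates(sequences)
    if len(reduced) == len(sequences):
        return sequences  # no duplicates, nothing to do
    if not remove_dups:
        raise PyPythiaException("MSA contains identical sequences")
    return reduced


msa = {"taxon1": "ACGT", "taxon2": "ACGT", "taxon3": "ACGA"}
print(prepare_msa(msa, remove_dups=True))  # taxon2 is dropped as a duplicate of taxon1
```

Code that previously caught ValueError around MSA handling would need to catch PyPythiaException instead.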

Minor Changes

  • Improved logging for command line interface
  • New --quiet mode to suppress intermediate information
  • predictor.DifficultyPredictor now accepts a set of features in its constructor, allowing predictions with experimental difficulty predictors that were trained using a different set of features than the one PyPythia uses

Full Changelog: 0.0.1...1.0.0