A specialised Python library for Automated Machine Learning (AutoML) of Longitudinal machine learning classification tasks built upon GAMA
- Bye Bye PDM!: We are now leveraging UV from Astral (alongside Ruff)!
- Documentation: For a deep dive into Auto-Sklong, check out our official docs.
- PyPI: The library's latest version is published on PyPI here.
Auto-Scikit-Longitudinal, also called Auto-Sklong, is an automated machine learning (AutoML) library designed to analyse longitudinal data (classification tasks only, as of today) using various search methods, namely Bayesian Optimisation (via SMAC3), as well as Asynchronous Successive Halving, Evolutionary Algorithms, and Random Search (via the General Automated Machine Learning Assistant, GAMA).
Auto-Sklong, built upon GAMA, offers a brand-new search space to tackle longitudinal machine learning classification problems, with a user-friendly interface similar to the popular Scikit paradigm.
For further information, please visit the official documentation.
To install Auto-Sklong, take these two easy steps:
- Install the latest version of Auto-Sklong:
pip install Auto-Sklong
You can also install a different version of the library by specifying the version number, e.g. pip install Auto-Sklong==0.0.1. Refer to the Release Notes.
- [MANDATORY] Update the required dependencies (Why? See here)
Auto-Sklong incorporates, via Sklong, a modified version of Scikit-Learn called Scikit-Lexicographical-Trees, which can be found at this PyPI link.
This revised version guarantees compatibility with the unique features of Scikit-Longitudinal.
Nevertheless, conflicts may occur with other dependencies in Auto-Sklong that also require Scikit-Learn.
Follow these steps to prevent any issues when running your project.
Simple Setup: Command Line Installation
Say you want to try Auto-Sklong in a very simple environment, such as one without a proper pyproject.toml file (Poetry, PDM, etc.).
Run the following command:
pip uninstall scikit-learn scikit-lexicographical-trees && pip install scikit-lexicographical-trees
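After running the command, it can help to confirm which distributions are actually installed. The following is a minimal sketch using only the standard library; the helper name `installed_version` is ours, and the two distribution names are taken from the command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Report the state of both distributions involved in the conflict.
for name in ("scikit-learn", "scikit-lexicographical-trees"):
    found = installed_version(name)
    print(f"{name}: {found if found else 'not installed'}")
```

If the setup went as intended, one would expect `scikit-learn` to be reported as not installed and `scikit-lexicographical-trees` to show a version.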
Project Setup: Using `UV`
Imagine you are managing your project with UV, a powerful and flexible project management tool. Below is an example configuration for integrating UV in your pyproject.toml file.
To ensure smooth operation and avoid dependency conflicts, you can override specific dependencies like Scikit-Learn. Add the following configuration to your pyproject.toml:
[tool.uv]
package = true
override-dependencies = [
"scikit-learn ; sys_platform == 'never'",
]
This setup ensures that UV will manage your project's dependencies efficiently, while avoiding conflicts with Scikit-Learn.
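The marker `sys_platform == 'never'` is what makes the override effective: under PEP 508, `sys_platform` takes the value of Python's `sys.platform`, which is never the literal string `never`, so the `scikit-learn` requirement is never selected. A minimal sketch of that comparison follows; the helper `marker_matches` is hypothetical and mimics only this single check, not UV's resolver.

```python
import sys


def marker_matches(expected_platform):
    """Mimic the PEP 508 comparison `sys_platform == '<expected_platform>'`."""
    return sys.platform == expected_platform


# sys.platform is a value such as 'linux', 'darwin', or 'win32'.
print(marker_matches("never"))       # False
print(marker_matches(sys.platform))  # True
```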
Project Setup: Using `PDM`
Imagine your project is managed by PDM or another package manager. The example below uses PDM; the process is similar for Poetry.
To prevent dependency conflicts, you can exclude Scikit-Learn by adding the following configuration to your pyproject.toml file.
[tool.pdm.resolution]
excludes = ["scikit-learn"]
This exclusion ensures that Scikit-Lexicographical-Trees (used in place of Scikit-learn) works seamlessly within your project.
We enhanced @PGijsbers' open-source GAMA initiative by introducing a brand-new search space designed specifically for tackling longitudinal classification problems. This search space is powered by our custom library, Scikit-Longitudinal (Sklong), enabling Combined Algorithm Selection and Hyperparameter Optimization (CASH Optimization).
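To make the idea of CASH concrete, here is a toy sketch of random search over a joint space of algorithms and their hyperparameters. Everything in it is hypothetical: the algorithm names, the hyperparameter ranges, and the `toy_score` function stand in for Auto-Sklong's real search space and cross-validated evaluation, which are not reproduced here.

```python
import random

# Hypothetical joint search space: each algorithm has its own hyperparameters.
SEARCH_SPACE = {
    "decision_tree": {"max_depth": [2, 4, 8]},
    "random_forest": {"n_estimators": [50, 100], "max_depth": [4, 8]},
}


def sample_configuration(rng):
    """Pick an algorithm, then one value for each of its hyperparameters."""
    algorithm = rng.choice(sorted(SEARCH_SPACE))
    params = {
        name: rng.choice(values)
        for name, values in SEARCH_SPACE[algorithm].items()
    }
    return algorithm, params


def toy_score(algorithm, params):
    """Stand-in for a cross-validated score; not a real model evaluation."""
    bonus = 0.1 if algorithm == "random_forest" else 0.0
    return 0.5 + bonus + 0.01 * params["max_depth"]


def random_search(n_trials, seed=0):
    """Return the best (score, algorithm, params) seen over n_trials samples."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        algorithm, params = sample_configuration(rng)
        score = toy_score(algorithm, params)
        if best is None or score > best[0]:
            best = (score, algorithm, params)
    return best


best_score, best_algorithm, best_params = random_search(n_trials=20)
print(best_algorithm, best_params, round(best_score, 3))
```

The point of the sketch is that the algorithm choice and its hyperparameters are sampled and scored together as one candidate, rather than tuned in two separate stages.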
Unlike GAMA or other existing AutoML libraries, Auto-Sklong offers out-of-the-box support for longitudinal classification tasks, a capability not previously available.
To better understand our proposed search space, refer to the visualisation below (read from left to right, each step being one new component to a final pipeline candidate configuration):
While GAMA offers some configurability for search spaces, we improved its functionality to better suit our needs. You can find the details of our contributions in the following pull requests:
- ConfigSpace Technology Integration for Enhanced GAMA Configuration and Management
- Search Methods Enhancements to Avoid Duplicate Evaluated Pipelines
- SMAC3 Bayesian Optimisation Integration
For developers looking to contribute, please refer to the Contributing section of GAMA here and Scikit-Longitudinal here.
Auto-Sklong is compatible with the following operating systems:
- macOS (careful: if you have an M-series chip, you may need to force your environment to run under Intel x86_64 rather than Apple Silicon)
- Linux
- Windows: we recommend running the library within a Docker container under a Linux distribution.
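As a quick way to see which of the cases above applies to a given machine, the following sketch inspects the operating system and CPU architecture with the standard library. The helper `platform_advice` and its wording are hypothetical; it only restates the compatibility notes listed above.

```python
import platform


def platform_advice(system, machine):
    """Map an OS name and CPU architecture to the compatibility notes above."""
    if system == "Linux":
        return "supported"
    if system == "Darwin":
        # Native Python on Apple Silicon typically reports 'arm64'.
        if machine == "arm64":
            return "supported, but an x86_64 (Intel) environment may be required"
        return "supported"
    if system == "Windows":
        return "use a Linux Docker container"
    return "untested"


print(platform_advice(platform.system(), platform.machine()))
```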
To perform AutoML on your longitudinal analysis with Auto-Sklong, follow these two easy steps:
- First, load and prepare your dataset using the LongitudinalDataset class of Sklong.
- Second, use the GamaLongitudinalClassifier class of Auto-Sklong. After instantiating it, set its hyperparameters or keep the defaults; you can then call the familiar fit, predict, and predict_proba methods in the same way as in Scikit-learn, as shown in the example below. It will then automatically search for the best model and hyperparameters for your dataset.
Refer to the documentation for more information on the GamaLongitudinalClassifier class.
from sklearn.metrics import classification_report
from scikit_longitudinal.data_preparation import LongitudinalDataset
from gama.GamaLongitudinalClassifier import GamaLongitudinalClassifier
# Load your longitudinal dataset
dataset = LongitudinalDataset('./stroke.csv')
dataset.load_data_target_train_test_split(
    target_column="class_stroke_wave_4",
)
# Pre-set or manually set your temporal dependencies
dataset.setup_features_group(input_data="elsa")
# Instantiate the AutoML system
automl = GamaLongitudinalClassifier(
    features_group=dataset.features_group(),
    non_longitudinal_features=dataset.non_longitudinal_features(),
    feature_list_names=dataset.data.columns.tolist(),
)
# Run the AutoML system to find the best model and hyperparameters
automl.fit(dataset.X_train, dataset.y_train)
# Predictions and prediction probabilities
label_predictions = automl.predict(dataset.X_test)
probability_predictions = automl.predict_proba(dataset.X_test)
# Classification report
print(classification_report(dataset.y_test, label_predictions))
# Export a reproducible script of the champion model
automl.export_script()
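When no pre-set such as "elsa" fits your data, the temporal dependencies have to be described by hand. The sketch below assumes (this is our assumption, to be checked against the Sklong documentation) that a features group is a list of lists of column indices, one inner list per longitudinal attribute, ordered by wave. The column-naming convention `<attribute>_wave_<n>` and the helper `build_feature_groups` are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical naming convention: "<attribute>_wave_<n>".
WAVE_PATTERN = re.compile(r"^(?P<base>.+)_wave_(?P<wave>\d+)$")


def build_feature_groups(columns):
    """Group column indices by attribute name, ordered by wave number.

    Returns (feature_groups, non_longitudinal_indices).
    """
    waves_by_attribute = defaultdict(list)
    non_longitudinal = []
    for index, name in enumerate(columns):
        match = WAVE_PATTERN.match(name)
        if match:
            waves_by_attribute[match["base"]].append((int(match["wave"]), index))
        else:
            non_longitudinal.append(index)
    feature_groups = [
        [index for _, index in sorted(waves)]
        for waves in waves_by_attribute.values()
    ]
    return feature_groups, non_longitudinal


columns = ["age", "smoke_wave_1", "bmi_wave_1", "smoke_wave_2", "bmi_wave_2", "sex"]
groups, static = build_feature_groups(columns)
print(groups)  # [[1, 3], [2, 4]]
print(static)  # [0, 5]
```

In a real dataset the target column would need to be dropped before grouping, since a name like class_stroke_wave_4 would otherwise match the pattern.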
The Auto-Sklong paper has been accepted to the 2024 edition of the International Conference on Bioinformatics and Biomedicine (BIBM). We are awaiting the release of the proceedings.
In the meantime, to cite the repository, use the "How to cite?" button in the top right corner of the repository, or open the following citation file: CITATION.cff.