
Developer's Guide

If you are looking to contribute to TemporAI, this guide is a good starting point.

Contributing

✅ Please read the Contributing Guide first.

Project layout

Project root

.
├── .github/: GitHub workflows (CI/CD)
├── docs/: Sphinx documentation content
├── src/: Source code
├── tests/: Tests (pytest)
├── tutorials/: Jupyter notebook tutorials
└── <root files>: Configuration files, README, and LICENSE

Source code

The source code is located under src/tempor; its organization is shown below.

.
└── tempor/
    ├── automl: AutoML tools
    ├── benchmarks: Benchmarking tools
    ├── config: Library config
    ├── core: Core code, such as global utils
    ├── data: Data format
    ├── datasources: Data sources for provided datasets
    ├── exc: Exceptions
    ├── log: Logger
    ├── models: Model components
    ├── plugins: Plugins/
    │   ├── core: The core plugin mechanism, base classes
    │   ├── pipeline: Pipeline mechanism
    │   ├── prediction: Prediction plugins
    │   ├── preprocessing: Preprocessing plugins
    │   ├── time_to_event: Time-to-event plugins
    │   └── treatments: Treatment effects plugins
    └── utils: Utilities/
        └── serialization: Serialization tools

Tests

Tests are located in the tests directory and follow the directory structure of the source code.

Config files

The project contains the following config files:

  • .coveragerc: coverage config.
  • .isort.cfg: isort config.
  • .pre-commit-config.yaml: pre-commit config.
  • .readthedocs.yml: ReadTheDocs config.
  • codecov.yml: codecov config.
  • mypy.ini: mypy config.
  • pyproject.toml: miscellaneous setup and tools configuration in the pyproject.toml form.
  • pytest.ini: pytest config.
  • setup.cfg: miscellaneous setup and tools configuration in the setuptools setup.cfg config.
  • tox.ini: tox config.

Data format

Please familiarize yourself with the data format we use.

  1. Go through the data tutorials.
  2. See the following parts of the full API (module) reference:
    1. Data samples reference,
    2. Dataset reference,
    3. Dataloader reference.

User guide

Usage

It is recommended to go through the usage tutorials before contributing.

Extending

🔥 The extending guide provides a great starting point for developing new methods.

Base Classes

The sklearn-inspired fit/transform/predict API is used for the methods. This section briefly describes the base classes used to achieve this. Full details are available in the API documentation linked for each class.

:::{caution}
TemporAI is in alpha, and the details of the API may change.
:::

Core base classes

BaseEstimator

All method plugins derive from BaseEstimator:

BaseEstimator_class_diagram{w=500px}

Methods worth noting are: fit (fits the model), hyperparameter_space (an abstract method returning the default hyperparameter space), and sample_hyperparameters (samples from that space). The _fit method is the abstract method that concrete plugins implement; it is called by fit.
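The fit/_fit split is an instance of the template-method pattern. The following is a minimal, library-free sketch of that pattern; the class names Estimator and MeanModel are illustrative only, not the actual TemporAI API:

```python
import abc


class Estimator(abc.ABC):
    """Illustrative stand-in for BaseEstimator (not the real class)."""

    def __init__(self):
        self.is_fitted = False

    def fit(self, data):
        # Public API: shared bookkeeping, then delegate to the plugin's _fit.
        self._fit(data)
        self.is_fitted = True
        return self

    @abc.abstractmethod
    def _fit(self, data):
        """Concrete plugins implement the actual fitting logic here."""

    @staticmethod
    @abc.abstractmethod
    def hyperparameter_space():
        """Return the default hyperparameter space."""


class MeanModel(Estimator):
    """Toy concrete plugin: fitting just records the mean of the data."""

    def _fit(self, data):
        self.mean_ = sum(data) / len(data)

    @staticmethod
    def hyperparameter_space():
        return {}  # No tunable hyperparameters in this toy example.


model = MeanModel().fit([1.0, 2.0, 3.0])
```

Calling fit on the concrete class runs the shared bookkeeping and the plugin's _fit in one step; the caller never invokes _fit directly.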

Note also that BaseEstimator inherits from the Plugin interface, which facilitates loading of the methods by the PluginLoader.

The ParamsDefinition attribute facilitates accessing the class parameters via self.params in derived classes.
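As a rough analogy for how a params definition can back a self.params attribute, consider a plain dataclass; the field names n_iter and lr here are hypothetical, not real TemporAI parameters:

```python
from dataclasses import dataclass


@dataclass
class MyMethodParams:
    """Illustrative params definition (field names are hypothetical)."""
    n_iter: int = 100
    lr: float = 0.01


# A plugin declaring such a definition could then read validated values
# via self.params.n_iter and self.params.lr inside its methods.
params = MyMethodParams(n_iter=50)
```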

BasePredictor

All predictive method plugins derive from BasePredictor:

BasePredictor_class_diagram{w=550px}

This base class adds prediction-specific methods (analogously to _fit/fit):

  • predict (public API) and _predict (for concrete implementation by each plugin): return predicted values.
  • predict_proba and _predict_proba: return predicted probabilities, classification setting only.
  • predict_counterfactuals and _predict_counterfactuals: return counterfactual predictions, treatment effects setting only.

Some of these methods are defined further in task-specific base classes.
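The predict/_predict pairing mirrors the fit/_fit pattern above. A minimal, library-free sketch of that shape follows; Predictor and ThresholdClassifier are illustrative names, not the actual TemporAI classes:

```python
import abc


class Predictor(abc.ABC):
    """Illustrative stand-in for BasePredictor (not the real class)."""

    def predict(self, data):
        # Public API: a real base class would validate inputs and
        # check the fitted state before delegating.
        return self._predict(data)

    @abc.abstractmethod
    def _predict(self, data):
        """Concrete plugins implement the actual prediction logic here."""


class ThresholdClassifier(Predictor):
    """Toy classifier: thresholds raw scores into 0/1 labels."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def _predict(self, data):
        return [1 if x >= self.threshold else 0 for x in data]

    def predict_proba(self, data):
        # Classification setting only: clamp scores into [0, 1].
        return [min(max(x, 0.0), 1.0) for x in data]


clf = ThresholdClassifier()
labels = clf.predict([0.2, 0.9])
probs = clf.predict_proba([0.2, 1.4])
```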

BaseTransformer

All preprocessing data transformation method plugins derive from BaseTransformer:

BaseTransformer_class_diagram{w=550px}

This base class provides the transform (public API) and _transform (for concrete implementation by each plugin) methods that take in, and return a transformed version of, a Dataset.
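The transform/_transform pairing follows the same shape. A minimal sketch, using plain lists in place of a Dataset and hypothetical class names (Transformer, MinMaxScaler):

```python
class Transformer:
    """Illustrative stand-in for BaseTransformer (not the real class)."""

    def transform(self, dataset):
        # Public API wraps the plugin-specific _transform.
        return self._transform(dataset)


class MinMaxScaler(Transformer):
    """Toy transformer: rescales values into [0, 1]."""

    def fit(self, dataset):
        self.lo_, self.hi_ = min(dataset), max(dataset)
        return self

    def _transform(self, dataset):
        span = self.hi_ - self.lo_
        return [(x - self.lo_) / span for x in dataset]


scaled = MinMaxScaler().fit([2.0, 4.0, 6.0]).transform([2.0, 4.0, 6.0])
```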

Task-specific base classes

BaseScaler, BaseImputer

These are the base classes for scaling ("preprocessing.scaling") and imputation ("preprocessing.imputation"). These currently do not differ from BaseTransformer.

BaseTimeToEventAnalysis

BaseTimeToEventAnalysis is the base class for time-to-event (survival) analysis. Note that the predict method requires a horizons argument to specify time points for risk estimation.

BaseTimeToEventAnalysis_class_diagram{w=750px}

This class expects TimeToEventAnalysisDataset.
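To illustrate the call shape implied by the horizons argument, here is a library-free sketch; SurvivalModel and its dummy risk values are purely illustrative, not the TemporAI implementation:

```python
class SurvivalModel:
    """Toy stand-in: returns one risk estimate per sample per horizon."""

    def predict(self, data, horizons):
        # Dummy values: risk grows linearly with the time horizon.
        return [[0.1 * h for h in horizons] for _ in data]


# Two samples, risk estimated at three time points.
risks = SurvivalModel().predict(
    data=[["patient_a"], ["patient_b"]],
    horizons=[1, 3, 5],
)
```

The output has shape (n_samples, n_horizons): each row holds one sample's risk estimates across the requested time points.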

BaseOneOffTreatmentEffects

BaseOneOffTreatmentEffects is the base class for one-off treatment effects plugins. predict_counterfactuals returns StaticSamples.

BaseOneOffTreatmentEffects_class_diagram{w=600px}

This class expects OneOffTreatmentEffectsDataset.

BaseTemporalTreatmentEffects

BaseTemporalTreatmentEffects is the base class for temporal treatment effects plugins. predict_counterfactuals returns TimeSeriesSamples.

BaseTemporalTreatmentEffects_class_diagram{w=600px}

This class expects TemporalTreatmentEffectsDataset.

BaseOneOff{Classifier,Regressor}

These are the base classes for the one-off prediction setting, where the targets are static values.

These classes expect OneOffPredictionDataset.

BaseTemporal{Classifier,Regressor}

These are the base classes for the temporal prediction setting, where the targets are time series.

These classes expect TemporalPredictionDataset.

Pipelines

The Pipeline functionality is provided by the tempor.plugins.pipeline module, see especially:

  • PipelineBase base class (defines pipeline interface),
  • PipelineMeta metaclass (dynamically generates pipelines),
  • pipeline function (creates pipeline from its definition).
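Conceptually, a pipeline chains transformers and ends with a predictor. A minimal, library-free sketch of that idea (Pipeline, AddOne, and Sign are illustrative names, not the TemporAI PipelineMeta machinery):

```python
class Pipeline:
    """Toy pipeline: transformers followed by a final predictor."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, data):
        for step in self.steps[:-1]:
            data = step.fit(data).transform(data)
        self.steps[-1].fit(data)
        return self

    def predict(self, data):
        for step in self.steps[:-1]:
            data = step.transform(data)
        return self.steps[-1].predict(data)


class AddOne:
    """Toy transformer: shifts every value up by one."""
    def fit(self, data):
        return self

    def transform(self, data):
        return [x + 1 for x in data]


class Sign:
    """Toy predictor: labels positive values 1, others 0."""
    def fit(self, data):
        return self

    def predict(self, data):
        return [1 if x > 0 else 0 for x in data]


pipe = Pipeline([AddOne(), Sign()]).fit([-2, 0, 2])
preds = pipe.predict([-2, 0, 2])
```

The real mechanism generates pipeline classes dynamically from a definition rather than taking instantiated steps, but the fit-then-chain control flow is the same idea.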

Models

The tempor.models namespace contains various underlying model components (currently mostly torch modules), such as MLP, general-purpose TimeSeriesModel, NeuralODE, etc. Feel free to use or build upon these in your methods.

Data Formats

For implementing custom data formats (experimental), see the "Custom Data Format" tutorial in the extending guide.