Releases: webis-de/small-text
v1.0.0
This is the first stable release 🎉! The release mainly consists of code cleanup, documentation, and repository organization.
- Datasets:
  - `SklearnDataset` now checks if the dimensions of features and labels match.
- Query Strategies:
  - ExpectedGradientLengthMaxWord: Cleaned up code and added checks to detect invalid configurations.
- Documentation:
  - The HTML documentation uses the full screen width.
- Repository:
  - This repository can now be referenced using the respective Zenodo DOI.
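
To illustrate the kind of dimension check described for SklearnDataset, here is a minimal standalone sketch. It does not use small-text itself: the helper name `check_dimensions` and the error message are assumptions made for this example, and the library's actual check may differ.

```python
def check_dimensions(x, y):
    """Raise ValueError if the number of feature rows and labels differ.

    Hypothetical helper for illustration only; not part of small-text.
    """
    if len(x) != len(y):
        raise ValueError(
            f'Feature and label dimensions do not match: {len(x)} != {len(y)}'
        )


features = [[0.1, 0.3], [0.7, 0.2], [0.5, 0.5]]
labels = [0, 1, 1]

# Three feature rows and three labels: the check passes silently.
check_dimensions(features, labels)
```

Failing early at construction time, rather than later during training or querying, makes a mismatch much easier to trace back to its source.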
v1.0.0b4
This release adds two new query strategies, improves the `Dataset` interface, and introduces optional dependencies.
Added
- General:
  - We now have a concept for optional dependencies which allows components to rely on soft dependencies, i.e. Python dependencies which can be installed on demand (and only when certain functionality is needed).
- Datasets:
  - The `Dataset` interface now has a `clone()` method that creates an identical copy of the respective dataset.
- Query Strategies:
  - New strategies: DiscriminativeActiveLearning and SEALS.
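
The idea of soft dependencies can be sketched in plain Python. The following is only an illustration of the general pattern (check for a package when the functionality is actually called, not at import time). The names `requires` and `MissingOptionalDependencyError` are hypothetical and are not taken from small-text.

```python
import importlib.util
from functools import wraps


class MissingOptionalDependencyError(ImportError):
    """Raised when a soft dependency is needed but not installed (hypothetical)."""


def requires(module_name):
    """Hypothetical decorator: defer the dependency check until the call."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # find_spec returns None for a top-level module that is not installed.
            if importlib.util.find_spec(module_name) is None:
                raise MissingOptionalDependencyError(
                    f"'{module_name}' is required for {func.__name__}(); "
                    f"install it on demand to use this functionality."
                )
            return func(*args, **kwargs)
        return wrapper
    return decorator


@requires('json')  # stdlib module, so this check always succeeds
def dump(obj):
    import json
    return json.dumps(obj)


@requires('surely_not_installed_pkg')  # placeholder name for a missing package
def needs_extra():
    return 'only reachable when the package is installed'
```

With this pattern, importing the module never fails because of a missing extra; only calling `needs_extra()` raises the error.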
Changed
- Datasets:
  - Separated the previous `DatasetView` implementation into interface (`DatasetView`) and implementation (`SklearnDatasetView`).
  - Added `clone()` method which creates an identical copy of the dataset.
- Query Strategies:
  - `EmbeddingBasedQueryStrategy` now only embeds instances that are either in the labeled or in the unlabeled pool (and no longer the entire dataset).
- Code examples:
  - Code structure was unified.
  - Number of iterations can now be passed via a CLI argument.
- `small_text.integrations.pytorch.utils.data`:
  - Method `get_class_weights()` now scales the resulting multi-class weights so that the smallest class weight is equal to `1.0`.
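
The rescaling described for get_class_weights() can be illustrated with a small standalone sketch. The inverse-frequency weighting used as the starting point is an assumption for this example (the release note only specifies the final scaling step), and `scaled_class_weights` is a hypothetical name, not the library function.

```python
from collections import Counter


def scaled_class_weights(labels, num_classes):
    """Hypothetical sketch: class weights rescaled so the smallest equals 1.0."""
    counts = Counter(labels)
    if any(counts[c] == 0 for c in range(num_classes)):
        raise ValueError('every class must occur at least once in this sketch')

    # Assumed starting point: inverse class frequency.
    n = len(labels)
    raw = [n / counts[c] for c in range(num_classes)]

    # The step from the release note: divide by the smallest weight.
    smallest = min(raw)
    return [w / smallest for w in raw]


# Class 0 occurs four times, class 1 twice, class 2 once.
weights = scaled_class_weights([0, 0, 0, 0, 1, 1, 2], num_classes=3)
print(weights)  # [1.0, 2.0, 4.0]
```

After scaling, the most frequent class has weight 1.0 and rarer classes are weighted relative to it, which keeps the ratios between the weights unchanged.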
v1.0.0b3
This release adds a new query strategy, improves the docs, and cleans up the interfaces in preparation for v1.0.0.
Added
- Added new query strategy: ContrastiveActiveLearning.
- Added Reproducibility Notes.
Changed
- Cleaned up and unified argument naming: The naming of variables related to datasets and indices has been improved and unified. The naming of datasets had been inconsistent, and the previous `x_` notation for indices was a relic of earlier versions of this library and did not reflect the underlying object anymore.
  - `PoolBasedActiveLearner`:
    - attribute `x_indices_labeled` was renamed to `indices_labeled`
    - attribute `x_indices_ignored` was unified to `indices_ignored`
    - attribute `queried_indices` was unified to `indices_queried`
    - attribute `_x_index_to_position` was renamed to `_index_to_position`
    - arguments `x_indices_initial`, `x_indices_ignored`, and `x_indices_validation` were renamed to `indices_initial`, `indices_ignored`, and `indices_validation`. This affects most methods of the `PoolBasedActiveLearner`.
  - `QueryStrategy`:
    - old: `query(self, clf, x, x_indices_unlabeled, x_indices_labeled, y, n=10)`
    - new: `query(self, clf, dataset, indices_unlabeled, indices_labeled, y, n=10)`
  - `StoppingCriterion`:
    - old: `stop(self, active_learner=None, predictions=None, proba=None, x_indices_stopping=None)`
    - new: `stop(self, active_learner=None, predictions=None, proba=None, indices_stopping=None)`
- Renamed environment variable which sets the small-text temp folder from `ALL_TMP` to `SMALL_TEXT_TEMP`.
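
For custom query strategies, the renamed `query()` arguments look roughly as follows. This is a standalone sketch that only mirrors the new signature from the list above. The class `RandomSamplingSketch` is hypothetical and does not subclass anything from small-text, and returning the selected indices as a plain sorted list is an assumption of this example.

```python
import random


class RandomSamplingSketch:
    """Standalone illustration of the new query() signature (hypothetical class)."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)

    def query(self, clf, dataset, indices_unlabeled, indices_labeled, y, n=10):
        # clf, dataset, indices_labeled, and y are unused by random sampling,
        # but they are kept to match the signature.
        if n > len(indices_unlabeled):
            raise ValueError('n exceeds the size of the unlabeled pool')
        return sorted(self._rng.sample(list(indices_unlabeled), n))


dataset = ['doc a', 'doc b', 'doc c', 'doc d', 'doc e']
indices_labeled = [0, 3]
indices_unlabeled = [1, 2, 4]
y = [1, 0]  # labels belonging to indices_labeled

strategy = RandomSamplingSketch(seed=42)
queried = strategy.query(None, dataset, indices_unlabeled, indices_labeled, y, n=2)
```

Code written against the old signature needs to rename `x` to `dataset` and drop the `x_` prefix from the index arguments, both in the method definition and in any keyword-argument calls.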
v1.0.0b2
This release fixes some broken links caused by the recent change in the naming of git tags (1.0.0a8 -> v1.0.0b1).
Fixed
- Fix links to the documentation in README.md and notebooks.
v1.0.0b1
First beta release with multi-label functionality and stopping criteria. Added/revised large parts of the documentation.
Added
- Added a changelog.
- All provided classifiers are now capable of multi-label classification.
Changed
- Documentation has been overhauled considerably.
- `PoolBasedActiveLearner`: Renamed `incremental_training` kwarg to `reuse_model`.
- `SklearnClassifier`: Changed `__init__(clf)` to `__init__(model, num_classes, multi_label=False)`.
- `SklearnClassifierFactory`: Changed `__init__(clf_template, kwargs={})` to `__init__(base_estimator, num_classes, kwargs={})`.
- Refactored `KimCNNClassifier` and `TransformerBasedClassification`.
Removed
- Removed `device` kwarg from `PytorchDataset.__init__()`, `PytorchTextClassificationDataset.__init__()` and `TransformersDataset.__init__()`.