Releases: webis-de/small-text
v1.4.1
Bugfix release.
Fixed
- Fixed an out of bounds error that occurred when DiscriminativeActiveLearning queries all remaining unlabeled data.
- Fixed typos/wording in PoolBasedActiveLearner docstrings.
- Pinned SetFit version in notebook example. (#64)
- Fixed an out of bounds error that could occur in SetFitClassification on both 32-bit systems and Windows. (#66)
- Fixed errors in notebook examples that occurred with more recent seaborn / matplotlib versions.
Changed
- Documentation: added links to bibliography. (#65)
v1.4.0
Fixes SetFit seed control and adds the AnchorSubsampling query strategy.
Added
- New query strategy: AnchorSubsampling.
Fixed
- Changed how the seed is controlled in SetFitClassification: previously, the seed was fixed unless explicitly set via the respective trainer keyword argument.
Changed
- Documentation: Added a section where compatible transformer models are listed.
- Documentation: Updated showcase section.
v1.3.3
v1.3.2
v1.3.1
v1.3.0
SetFitClassification now also supports dropout sampling (like KimCNNClassifier and TransformerBasedClassification).
Added
- Added dropout sampling to SetFitClassification.
Fixed
- Fixed broken link in README.md.
- Fixed typo in README.md. (#26)
Changed
Stopping Criteria
- The ClassificationChange stopping criterion now supports multi-label classification.
Documentation
- Updated the active learning setup figure.
- The documentation of integrations has been reorganized.
v1.2.0
This release adds a SetFit classifier, the BALD query strategy, and two new example notebooks.
Added
Active Learning
- PoolBasedActiveLearner now handles keyword arguments passed to the classifier's fit() during the update() step.
- New strategy: BALD.
- SubsamplingQueryStrategy now uses the remaining unlabeled pool when more samples are requested than are available.
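The fallback described in the last item can be pictured with a small sketch. This is not the SubsamplingQueryStrategy implementation; the function name `subsample_pool` and its parameters are hypothetical and only illustrate the idea of returning the whole remaining pool when the requested subsample is larger than what is left.

```python
import random


def subsample_pool(unlabeled_indices, subsample_size, seed=None):
    """Pick candidate indices for a base query strategy (hypothetical helper).

    If the requested subsample is at least as large as the remaining pool,
    the whole pool is returned instead of raising an error.
    """
    unlabeled_indices = list(unlabeled_indices)
    if subsample_size >= len(unlabeled_indices):
        # Fewer samples left than requested: fall back to everything that remains.
        return unlabeled_indices
    rng = random.Random(seed)
    return rng.sample(unlabeled_indices, subsample_size)


# Only three unlabeled samples remain, but ten are requested.
remaining = [3, 7, 11]
print(subsample_pool(remaining, subsample_size=10))

# With a large pool, a true random subsample of the requested size is drawn.
print(len(subsample_pool(range(100), subsample_size=10, seed=0)))
```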
Classification
- Added new classifier: SetFitClassification which wraps huggingface/setfit.
Examples
- Revised both existing notebook examples.
- Added a notebook example for active learning with SetFit classifiers.
- Added a notebook example for cold start initialization with SetFit classifiers.
Documentation
- A showcase section has been added to the documentation.
Fixed
- Distances in lightweight_coreset were not correctly projected onto the [0, 1] interval (but ranking was unaffected).
Changed
- Coreset implementations now use the distance-based (as opposed to the similarity-based) formulation.
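Why a scaling problem in the distances can leave the ranking untouched may be easier to see with a toy example. The sketch below uses min-max rescaling as an assumed stand-in; the release notes do not say which normalization lightweight_coreset applies, and both helper names are hypothetical. Any strictly increasing rescaling preserves the order of the distances, so only the values change, not which samples rank highest.

```python
def rescale_to_unit_interval(distances):
    """Min-max rescale a list of distances onto [0, 1] (assumed normalization)."""
    lo, hi = min(distances), max(distances)
    if hi == lo:
        # All distances are equal: map everything to 0 to avoid division by zero.
        return [0.0 for _ in distances]
    return [(d - lo) / (hi - lo) for d in distances]


def ranking(values):
    """Indices of `values`, ordered from smallest to largest value."""
    return sorted(range(len(values)), key=lambda i: values[i])


raw = [0.4, 2.5, 1.1, 0.0]          # some values lie outside [0, 1]
scaled = rescale_to_unit_interval(raw)

print(scaled)
print(ranking(raw) == ranking(scaled))
```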
v1.1.1
v1.1.0
This release adds a conda package and more convenient imports, and improves many aspects of the classification functionality. Moreover, one new query strategy and three stopping criteria have been added.
Added
General
- Small-Text package is now available via conda-forge.
- Imports have been reorganized. You can import all public classes and methods from the top-level package (small_text): from small_text import PoolBasedActiveLearner
Classification
- All classifiers now support weighting of training samples.
- Early stopping has been reworked, improved, and documented (#18).
- Model selection has been reworked and documented.
- [!] KimCNNClassifier.__init__(): The default value of the (now deprecated) keyword argument early_stopping_acc has been changed from 0.98 to -1 in order to match TransformerBasedClassification.
- [!] Removed weight renormalization after gradient clipping.
Datasets
- The target_labels keyword argument in __init__() will now raise a warning if not passed.
- Added from_arrays() to SklearnDataset, PytorchTextClassificationDataset, and TransformersDataset to construct datasets more conveniently.
Query Strategies
- New multi-label strategy: CategoryVectorInconsistencyAndRanking.
Stopping Criteria
- New stopping criteria: ClassificationChange, OverallUncertainty, and MaxIterations.
Deprecated
- small_text.integrations.pytorch.utils.misc.default_tensor_type() is deprecated without replacement (#2).
- TransformerBasedClassification and KimCNNClassifier: The keyword arguments for early stopping (early_stopping / early_stopping_no_improvement, early_stopping_acc) that are passed to __init__() are now deprecated. Use the early_stopping keyword argument in the fit() method instead (#18).
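The shape of this migration (constructor argument deprecated, fit-time argument preferred) can be sketched with a toy class. ToyClassifier below is hypothetical and not part of small-text; a plain integer patience stands in for whatever early stopping object the library actually expects, and the warning text is invented for illustration.

```python
import warnings


class ToyClassifier:
    """Hypothetical stand-in that mimics the deprecation pattern only."""

    def __init__(self, early_stopping_no_improvement=None):
        if early_stopping_no_improvement is not None:
            # Old style: early stopping configured at construction time.
            warnings.warn(
                "Passing early stopping arguments to __init__() is deprecated; "
                "pass early_stopping to fit() instead.",
                DeprecationWarning,
                stacklevel=2,
            )
        self._legacy_patience = early_stopping_no_improvement

    def fit(self, train_set, early_stopping=None):
        # New style: the fit-time argument takes precedence over the legacy one.
        if early_stopping is not None:
            self.patience_ = early_stopping
        else:
            self.patience_ = self._legacy_patience
        return self


# Preferred usage after the change: configure early stopping per fit() call.
clf = ToyClassifier().fit(train_set=[], early_stopping=3)
print(clf.patience_)
```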
Fixed
Classification
- KimCNNClassifier.fit() and TransformerBasedClassification.fit() now correctly process the scheduler keyword argument (#16).
Removed
- Removed the strict check that every target label has to occur in the training data.
(This is intended for multi-label settings with many labels; apart from that it is still recommended to make sure that all labels occur.)
v1.0.1
Minor bug fix release.
Fixed
- Links to notebooks and code examples will now always point to the latest release instead of the latest main branch.