Releases: tensordot/syntaxdot
Torch 2.0.0, biaffine parser improvements
Changed
- Change parser dependency relation prediction to use a biaffine layer rather than a pairwise biaffine layer. This simplifies some code and can be slightly faster (see the sketch after this list).
- Normalize the distillation hidden layer loss using the squared L2 norm.
- Update to libtorch 2.0.0 and tch 0.11.0.
- Update to clap 4.
- Update to sentencepiece 0.11.
- Absorb `ohnomore` into SyntaxDot.
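As a rough illustration of the biaffine layer mentioned in the first item, the sketch below scores all head-dependent pairs with a single bilinear form using `tch`. The function name, the single weight matrix `u`, and the omitted bias terms are assumptions for illustration, not SyntaxDot's actual implementation:

```rust
// Hypothetical biaffine scoring sketch with tch; not SyntaxDot's code.
use tch::{Device, Kind, Tensor};

/// Scores every head-dependent pair with one bilinear form:
/// scores[b][i][j] = dep[b][i] * u * head[b][j]^T (bias terms omitted).
fn biaffine_scores(dep: &Tensor, head: &Tensor, u: &Tensor) -> Tensor {
    // dep, head: [batch, seq_len, dims]; u: [dims, dims].
    // Two batched matrix multiplications produce all pair scores at once.
    dep.matmul(u).matmul(&head.transpose(-2, -1))
}

fn main() {
    // Batch of 2 sentences, 5 tokens each, 8-dimensional representations.
    let dep = Tensor::randn(&[2, 5, 8], (Kind::Float, Device::Cpu));
    let head = Tensor::randn(&[2, 5, 8], (Kind::Float, Device::Cpu));
    let u = Tensor::randn(&[8, 8], (Kind::Float, Device::Cpu));

    let scores = biaffine_scores(&dep, &head, &u);
    assert_eq!(scores.size(), vec![2, 5, 5]);
}
```

Scoring all pairs with two batched matrix multiplications avoids building explicit per-pair representations, which is one way such a change can simplify code and save a bit of time; the exact formulation in SyntaxDot may differ.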
Fixed
- Use the correct ID for unknown pieces in `XlmRobertaTokenizer`.
- Linux AArch64 builds.
Release 0.4.1
Fixed
- Update to rand 0.8 in the `syntaxdot` crate. This avoids a dependency on both rand 0.7 and rand 0.8.
Release 0.4.0
Added
- Add support for parallelizing annotation at the batch level. SyntaxDot has so far used PyTorch inter/intraop parallelization. This change adds support for parallelization at the batch level. Annotation-level parallelization can be configured with the `annotation-threads` command-line option of `syntaxdot annotate` (see the sketch after this list).
- Add ReLU (`relu`) as an option for the non-linearity in the feed-forward transformer layers. This is much faster on systems where no vectorized version of the normal distribution CDF is available (currently Apple M1).
- The non-linearity that is used in the biaffine feed-forward layers is now configurable. For example:

  ```toml
  [biaffine]
  activation = "relu"
  ```

  When this option is absent, the GELU activation (`gelu`) is used as the default.
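To illustrate the batch-level parallelism from the first item, the sketch below annotates batches on a small thread pool using rayon. Whether SyntaxDot uses rayon internally is an assumption, and `annotate_batch` and the pool size of 4 are placeholders for whatever the `annotation-threads` option configures:

```rust
// Illustrative batch-level parallelism with rayon; not SyntaxDot's actual code.
use rayon::prelude::*;
use rayon::ThreadPoolBuilder;

// Stand-in for annotating one batch of sentences.
fn annotate_batch(batch: &[String]) -> Vec<String> {
    batch.iter().map(|s| format!("{s}\tANNOTATED")).collect()
}

fn main() {
    // Roughly what an annotation-threads setting of 4 would control:
    // how many batches are annotated concurrently.
    ThreadPoolBuilder::new().num_threads(4).build_global().unwrap();

    let batches: Vec<Vec<String>> = (0..8)
        .map(|i| vec![format!("sentence {i}")])
        .collect();

    // Each batch is annotated on a worker thread of the pool.
    let annotated: Vec<Vec<String>> = batches
        .par_iter()
        .map(|batch| annotate_batch(batch))
        .collect();

    assert_eq!(annotated.len(), batches.len());
}
```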
Changed
- The license of SyntaxDot has changed from the Blue Oak Model License 1.0 to the MIT License or Apache License version 2.0 (at your option).
- SyntaxDot now uses dynamic batch sizes. Before this change, the batch size (`--batch-size`) was specified as the number of sentences per batch. Since sentences are sorted by length before batching, annotation is performed on batches with roughly equisized sequences. However, later batches required more computation per batch due to longer sequence lengths.

  This change replaces the `--batch-size` option by the `--max-batch-pieces` option. This option specifies the number of word/sentence pieces that a batch should contain. SyntaxDot annotation creates batches that contain at most that number of pieces. The only exception is single sentences that are longer than the maximum number of batch pieces. (A sketch of this batching strategy follows this list.)

  With this change, annotating each batch is approximately the same amount of work. This leads to an approximately 10% increase in performance.

  Since the batch size is not fixed anymore, the readahead (`--readahead`) is now specified in number of sentences.
- Update to libtorch 1.9.0 and tch 0.5.0.
- Change the default number of inter/intraop threads to 1 and use 4 threads for annotation-level parallelization. This has shown to be faster for all models, both on AMD Ryzen and Apple M1.
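The dynamic batching described in the second item can be sketched as greedily grouping length-sorted sentences until a piece budget is reached. This is an illustrative sketch; `batch_by_pieces` is a hypothetical helper, not SyntaxDot's implementation:

```rust
// Greedy grouping of sentences into batches by piece count; illustrative only.
// `piece_counts` holds the number of pieces per sentence and is assumed to be
// sorted by length already.
fn batch_by_pieces(piece_counts: &[usize], max_batch_pieces: usize) -> Vec<Vec<usize>> {
    let mut batches = Vec::new();
    let mut batch = Vec::new();
    let mut pieces_in_batch = 0;

    for (sentence_idx, &pieces) in piece_counts.iter().enumerate() {
        // Close the current batch when adding this sentence would exceed the
        // budget. A single over-long sentence still gets a batch of its own.
        if !batch.is_empty() && pieces_in_batch + pieces > max_batch_pieces {
            batches.push(std::mem::take(&mut batch));
            pieces_in_batch = 0;
        }

        batch.push(sentence_idx);
        pieces_in_batch += pieces;
    }

    if !batch.is_empty() {
        batches.push(batch);
    }

    batches
}

fn main() {
    // With a budget of 10 pieces, the sentences below form three batches:
    // [0, 1], [2], and [3] (the last sentence alone exceeds the budget).
    let batches = batch_by_pieces(&[4, 5, 8, 12], 10);
    assert_eq!(batches, vec![vec![0, 1], vec![2], vec![3]]);
}
```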
0.3.1
Release 0.3.0
You can also download ready-to-use models.
Added
- Support for biaffine dependency parsing (Dozat & Manning, 2016). Biaffine parsing is enabled through the `biaffine` configuration option.
- Support for pooling the pieces of a token by taking the mean of the pieces. This type of pooling is enabled by setting the `model.pooler` option to `mean`. The old behavior of discarding continuation pieces is used when this option is set to `discard`.
- Add the `keep-best` option to the `finetune` and `distill` subcommands. With this option, only the parameter files for the N best epochs/steps are retained during distillation.
- Support for hidden layer distillation loss. This loss uses the mean squared error between the teacher's hidden layer representations and the student's representations for faster convergence (a sketch follows this list).
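A minimal sketch of such a hidden layer loss with `tch` is shown below. It assumes the teacher and student hidden representations have the same shape; the function name is hypothetical, and SyntaxDot's actual loss, including any normalization, may differ:

```rust
// Hypothetical hidden layer distillation loss; not SyntaxDot's implementation.
use tch::{Device, Kind, Reduction, Tensor};

/// Mean squared error between teacher and student hidden representations.
fn hidden_layer_loss(teacher_hidden: &Tensor, student_hidden: &Tensor) -> Tensor {
    student_hidden.mse_loss(teacher_hidden, Reduction::Mean)
}

fn main() {
    // Batch of 2 sentences, 10 pieces, 768-dimensional hidden states.
    let teacher = Tensor::randn(&[2, 10, 768], (Kind::Float, Device::Cpu));
    let student = Tensor::randn(&[2, 10, 768], (Kind::Float, Device::Cpu));

    let loss = hidden_layer_loss(&teacher, &student);
    println!("hidden layer loss: {}", loss.double_value(&[]));
}
```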
Changed
- Update to libtorch 1.8.0 and tch 0.4.0.
- Pretrained models are now loaded from the libtorch OutputArchive format, rather than the HDF5 format. This removes HDF5 as a dependency.
- Properly prefix embeddings with `embeddings` rather than `encoder` in BERT/RoBERTa models. Warning: this breaks compatibility with BERT and RoBERTa models from prior versions of SyntaxDot and sticker2, which should be retrained.
- Implementations of `Tokenizer` are now required to put a piece that marks the beginning of a sentence before the first token piece. `BertTokenizer` was the only tokenizer that did not fulfill this requirement. `BertTokenizer` is updated to insert the `[CLS]` piece as a beginning-of-sentence marker. Warning: this breaks existing models with `tokenizer = "bert"`, which should be retrained.
- Replace calls to the Rust Torch crate (`tch`) by fallible counterparts. This makes exceptions thrown by Torch far easier to read (see the example after this list).
- Uses of the `eprintln!` macro are replaced by logging using `log` and `env_logger`. The verbosity of the logs can be controlled with the `RUST_LOG` environment variable (e.g. `RUST_LOG=info`). A minimal setup is shown after this list.
- Replace `tfrecord` by our own minimalist TensorBoard summary writing, removing 92 dependencies.
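To illustrate the fallible calls: `tch` exposes `f_`-prefixed variants of its tensor operations that return a `Result` instead of panicking. The helper below is a hypothetical example, not code from SyntaxDot:

```rust
// Hypothetical use of tch's fallible (`f_`-prefixed) operations.
use tch::{Device, Kind, TchError, Tensor};

/// Multiplies `a` with the transpose of `b`, surfacing Torch errors
/// (e.g. shape mismatches) as readable `TchError`s instead of panics.
fn scores(a: &Tensor, b: &Tensor) -> Result<Tensor, TchError> {
    a.f_matmul(&b.f_transpose(-2, -1)?)
}

fn main() -> Result<(), TchError> {
    let a = Tensor::randn(&[4, 8], (Kind::Float, Device::Cpu));
    let b = Tensor::randn(&[6, 8], (Kind::Float, Device::Cpu));

    // [4, 8] x [8, 6] -> [4, 6].
    let s = scores(&a, &b)?;
    assert_eq!(s.size(), vec![4, 6]);

    Ok(())
}
```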
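And a minimal example of the `log`/`env_logger` setup; the log message is made up for illustration:

```rust
// Minimal log + env_logger setup; run with e.g. `RUST_LOG=info`.
use log::info;

fn main() {
    // env_logger reads the RUST_LOG environment variable to set verbosity.
    env_logger::init();
    info!("loading pretrained model");
}
```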
Removed
- Support for hard loss is removed from the distillation subcommand. Hard loss never worked well compared to soft loss.
Fixed
- Fix an off-by-one slicing error in `SequenceClassifiers::top_k`.
Release 0.3.0-beta.2
Third beta of 0.3.0.
Release 0.3.0-beta.1
Second beta for 0.3.0.
0.2.2
0.2.1
0.2.0
- Add the SqueezeBERT model (Iandola et al., 2020). The SqueezeBERT model replaces the matrix multiplications in the self-attention mechanism and feed-forward layers by grouped convolutions. This results in a smaller number of parameters and better computational performance.
- Add the SqueezeAlbert model. This model combines SqueezeBERT (Iandola et al., 2020) and ALBERT (Lan et al., 2020).
- `distill`: add the `attention-loss` option. Enabling this option adds the mean squared error (MSE) of the teacher and student attentions to the loss. This can speed up convergence, because the student learns to attend to the same pieces as the teacher. Attention loss can only be computed when the teacher and student have the same sequence lengths. In practice, this means that they should use the same piece tokenizers.
- Switch to the AdamW optimizer provided by `libtorch`. The tch binding now has support for the AdamW optimizer and for parameter groups. Consequently, we do not need our own AdamW optimizer implementation anymore. Switching to the Torch optimizer also speeds up training a bit.
- Move the subword tokenizers into a separate `syntaxdot-tokenizers` crate.
- Update to `libtorch` 1.7.0.
- Remove the `server` subcommand. The new REST server is a better replacement, which supports proper error handling, etc.