Benchmarking different textcat architectures on different datasets.
The `project.yml` defines the data assets required by the project, as well as
the available commands and workflows. For details, see the spaCy projects
documentation.
The following commands are defined by the project. They can be executed using
`spacy project run [name]`. Commands are only re-run if their inputs have
changed.
Command | Description |
---|---|
`install` | Install dependencies. |
`data` | Extract the datasets from their archives. |
`train` | Run customized training runs: 3 textcat architectures trained on 2 datasets. |
`summarize` | Summarize the results from the runs and print the best & last scores for each run. |
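The README does not show how `summarize` computes its output. As a rough illustration of what "best & last scores for each run" means, here is a minimal sketch. The function name `summarize_runs`, the input shape (run name mapped to scores in chronological order), the assumption that higher scores are better, and the sample numbers are all hypothetical and not taken from the project.

```python
from typing import Dict, List, Tuple


def summarize_runs(
    scores_by_run: Dict[str, List[float]]
) -> Dict[str, Tuple[float, float]]:
    """Map each run name to its (best, last) score.

    Assumes scores are listed in chronological order and that a
    higher score is better. Both are assumptions of this sketch.
    """
    summary = {}
    for run_name, scores in scores_by_run.items():
        if not scores:
            continue  # skip runs that logged no scores
        summary[run_name] = (max(scores), scores[-1])
    return summary


# Made-up scores, purely for illustration (not results from this project).
example = {
    "run_a": [0.71, 0.84, 0.82],
    "run_b": [0.65, 0.70, 0.74],
}
for name, (best, last) in summarize_runs(example).items():
    print(f"{name}: best={best:.2f} last={last:.2f}")
```

Reporting both values makes it easy to spot runs whose final score has dropped below their peak.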
The following workflows are defined by the project. They can be executed using
`spacy project run [name]` and will run the specified commands in order.
Commands are only re-run if their inputs have changed.
Workflow | Steps |
---|---|
`all` | `data` → `train` → `summarize` |
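The idea of re-running a command only when its inputs have changed can be illustrated with content hashes. The sketch below is not spaCy's implementation. The helper names `file_digest` and `needs_rerun`, the in-memory `seen` dictionary, and the use of SHA-256 are assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def needs_rerun(
    name: str, inputs: List[Path], seen: Dict[str, Dict[str, str]]
) -> bool:
    """Return True if the command never ran or any input hash changed.

    Stores the current hashes in `seen` as a side effect, so the next
    call compares against this one.
    """
    current = {str(p): file_digest(p) for p in inputs}
    changed = seen.get(name) != current
    seen[name] = current
    return changed


# Self-contained demo on a temporary file (hypothetical input).
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "input.txt"
    seen: Dict[str, Dict[str, str]] = {}

    data.write_text("v1")
    first = needs_rerun("train", [data], seen)   # never ran before
    second = needs_rerun("train", [data], seen)  # input unchanged
    data.write_text("v2")
    third = needs_rerun("train", [data], seen)   # input content changed

print(first, second, third)  # True False True
```

In a workflow such as `all`, a runner built on this check would go through the steps in order and skip any step for which `needs_rerun` returns `False`.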
The following assets are defined by the project. They can be fetched by running
`spacy project assets` in the project directory.
File | Source | Description |
---|---|---|
`assets/aclImdb_v1.tar.gz` | URL | Movie Review Dataset by Maas et al., ACL 2011. |
`assets/dbpedia_csv.tgz` | URL | DBPedia ontology with 14 nonoverlapping classes by Zhang et al., 2015. |
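Both assets appear to be gzipped tarballs, judging by their `.tar.gz` and `.tgz` extensions. The README does not show the script behind the `data` command, so the following is only a sketch of how such archives could be unpacked with Python's standard `tarfile` module. The helper name `extract_archive` and the demo file names are hypothetical.

```python
import tarfile
import tempfile
from pathlib import Path
from typing import List


def extract_archive(archive: Path, dest: Path) -> List[str]:
    """Extract a gzipped tarball into `dest` and return its member names.

    Only use this on archives you trust: extracting untrusted tarballs
    can write files outside `dest`, depending on the Python version.
    """
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(dest)
    return names


# Self-contained demo: build a tiny archive in a temp dir, then extract it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "hello.txt").write_text("hi")
    with tarfile.open(root / "demo.tgz", "w:gz") as tar:
        tar.add(root / "hello.txt", arcname="demo/hello.txt")
    members = extract_archive(root / "demo.tgz", root / "out")
    extracted_text = (root / "out" / "demo" / "hello.txt").read_text()

print(members, extracted_text)
```

For the real assets, the call might look like `extract_archive(Path("assets/aclImdb_v1.tar.gz"), Path("assets"))`, assuming the archive has already been fetched. The destination directory here is a guess.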