textcat_architectures

🪐 spaCy Project: Textcat performance benchmarks

Benchmarking different textcat architectures on different datasets.

📋 project.yml

The project.yml defines the data assets required by the project, as well as the available commands and workflows. For details, see the spaCy projects documentation.

⏯ Commands

The following commands are defined by the project. They can be executed using spacy project run [name]. Commands are only re-run if their inputs have changed.

| Command | Description |
| --- | --- |
| install | Install dependencies. |
| data | Extract the datasets from their archives. |
| train | Run customized training runs: 3 textcat architectures trained on 2 datasets. |
| summarize | Summarize the results from the runs and print the best and last scores for each run. |
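
As a rough illustration of what "best and last scores for each run" means, the sketch below reduces a list of per-evaluation scores to a (best, last) pair for each run. It is not the project's summarize script: the function name `summarize_runs`, the run names, and the score values are all hypothetical and exist only for this example.

```python
from typing import Dict, List, Tuple


def summarize_runs(scores_by_run: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Return (best, last) for every run that logged at least one score."""
    summary: Dict[str, Tuple[float, float]] = {}
    for run_name, scores in scores_by_run.items():
        if not scores:
            # A run without any logged score has no best or last value.
            continue
        summary[run_name] = (max(scores), scores[-1])
    return summary


# Hypothetical run names and scores, in the order they were logged.
example_scores = {
    "run_a": [0.81, 0.86, 0.85],
    "run_b": [0.90, 0.95, 0.97],
}

for run_name, (best, last) in summarize_runs(example_scores).items():
    print(f"{run_name}: best={best:.2f} last={last:.2f}")
```

Reporting both values is useful because a gap between them suggests that the final checkpoint is not the strongest one.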

⏭ Workflows

The following workflows are defined by the project. They can be executed using spacy project run [name] and will run the specified commands in order. Commands are only re-run if their inputs have changed.

| Workflow | Steps |
| --- | --- |
| all | data → train → summarize |
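
The rule that commands are only re-run if their inputs have changed can be pictured as a checksum comparison. The sketch below shows one way to implement that idea, under the assumption that each command has a known list of input files. It is not spaCy's actual implementation, and the names `file_checksum`, `inputs_changed`, `record_inputs`, and the JSON state file are hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, List


def file_checksum(path: Path) -> str:
    """Hash the file contents so that any edit yields a different checksum."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def current_checksums(inputs: List[Path]) -> Dict[str, str]:
    """Map each input path to the checksum of its current contents."""
    return {str(path): file_checksum(path) for path in inputs}


def inputs_changed(command: str, inputs: List[Path], state_file: Path) -> bool:
    """Return True if the inputs differ from what was recorded for `command`."""
    recorded: Dict[str, str] = {}
    if state_file.exists():
        recorded = json.loads(state_file.read_text()).get(command, {})
    return current_checksums(inputs) != recorded


def record_inputs(command: str, inputs: List[Path], state_file: Path) -> None:
    """Store the current input checksums after `command` has run."""
    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    state[command] = current_checksums(inputs)
    state_file.write_text(json.dumps(state, indent=2))
```

A runner built on these helpers would call `inputs_changed` before each step, skip the step when it returns False, and call `record_inputs` after a successful run.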

🗂 Assets

The following assets are defined by the project. They can be fetched by running spacy project assets in the project directory.

| File | Source | Description |
| --- | --- | --- |
| assets/aclImdb_v1.tar.gz | URL | Movie Review Dataset by Maas et al., ACL 2011. |
| assets/dbpedia_csv.tgz | URL | DBPedia ontology with 14 nonoverlapping classes by Zhang et al., 2015. |
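
Both assets are gzip-compressed tar archives, so the extraction step performed by the data command can be approximated with Python's standard `tarfile` module. This is a minimal sketch and not the project's own script: the helper name `extract_archive` and the destination directory in the usage comment are assumptions.

```python
import tarfile
from pathlib import Path


def extract_archive(archive: Path, dest: Path) -> None:
    """Unpack a tar archive into `dest`, creating the directory if needed."""
    dest.mkdir(parents=True, exist_ok=True)
    # "r:*" lets tarfile detect the compression, so .tar.gz and .tgz both work.
    with tarfile.open(archive, "r:*") as tar:
        # Only extract archives from sources you trust.
        tar.extractall(dest)


# Example usage (the destination directory is an assumption):
# extract_archive(Path("assets/aclImdb_v1.tar.gz"), Path("assets"))
# extract_archive(Path("assets/dbpedia_csv.tgz"), Path("assets"))
```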