Chunking is a shallow form of parsing that identifies contiguous, non-overlapping spans of tokens that form syntactic units such as noun phrases or verb phrases.
Example:
Vinken | , | 61 | years | old |
---|---|---|---|---|
B-NP | I-NP | I-NP | I-NP | I-NP |
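
The tags in the example follow the BIO scheme: `B-` opens a chunk, `I-` continues it, and `O` (not shown above) marks a token outside any chunk. As a minimal sketch of how such a tag sequence maps back to spans, the helper below (the name `bio_to_chunks` and the lenient handling of a stray `I-` tag are assumptions made for illustration, not part of any particular toolkit) groups tags into `(type, start, end)` triples with an exclusive end index:

```python
def bio_to_chunks(tags):
    """Convert BIO tags into (chunk_type, start, end) spans; end is exclusive."""
    chunks = []
    start, ctype = None, None
    # The trailing "O" is a sentinel that closes a chunk still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        prefix, _, label = tag.partition("-")
        # An open chunk ends at O, at any B- tag, or when the type changes.
        if ctype is not None and (prefix != "I" or label != ctype):
            chunks.append((ctype, start, i))
            start, ctype = None, None
        # A chunk starts at B-, or (assumption: lenient) at an I- with no open chunk.
        if prefix in ("B", "I") and ctype is None:
            start, ctype = i, label
    return chunks


tokens = ["Vinken", ",", "61", "years", "old"]
tags = ["B-NP", "I-NP", "I-NP", "I-NP", "I-NP"]

for ctype, start, end in bio_to_chunks(tags):
    print(ctype, " ".join(tokens[start:end]))  # NP Vinken , 61 years old
```
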
The Penn Treebank is typically used for evaluating chunking. Sections 15-18 are used for training, section 19 for development, and section 20 for testing. Models are evaluated based on F1.
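
For reference, F1 in chunking is usually computed over whole chunks rather than individual tags. The sketch below assumes the common CoNLL-style convention that a predicted chunk counts as correct only if its type and both boundaries match a gold chunk exactly; the function names are hypothetical, and published scores are normally produced with a standard evaluation script rather than code like this.

```python
def extract_spans(tags):
    """Return the set of (chunk_type, start, end) spans in a BIO tag sequence."""
    spans, start, ctype = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last chunk
        prefix, _, label = tag.partition("-")
        if ctype is not None and (prefix != "I" or label != ctype):
            spans.add((ctype, start, i))
            start, ctype = None, None
        if prefix in ("B", "I") and ctype is None:
            start, ctype = i, label
    return spans


def chunk_f1(gold_sentences, pred_sentences):
    """Micro-averaged chunk-level F1 over parallel lists of tag sequences."""
    correct = n_gold = n_pred = 0
    for gold, pred in zip(gold_sentences, pred_sentences):
        g, p = extract_spans(gold), extract_spans(pred)
        correct += len(g & p)  # exact match on type and both boundaries
        n_gold += len(g)
        n_pred += len(p)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = [["B-NP", "I-NP", "B-VP", "B-NP"]]
pred = [["B-NP", "I-NP", "B-VP", "I-VP"]]
# Gold has 3 chunks, the prediction has 2, and only the first NP matches:
# precision = 1/2, recall = 1/3, so F1 is about 0.4.
print(round(chunk_f1(gold, pred), 4))
```

Multiplying the result by 100 gives a value on the same scale as the scores in the table.
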
Model | F1 score | Paper / Source |
---|---|---|
Low supervision (Søgaard and Goldberg, 2016) | 95.57 | Deep multi-task learning with low level tasks supervised at lower layers |
Suzuki and Isozaki (2008) | 95.15 | Semi-Supervised Sequential Labeling and Segmentation using Giga-word Scale Unlabeled Data |