An annotated corpus of entities in NYT Restaurant Reviews.

aparnadutta/ner-restaurant-reviews

NER for Restaurant Reviews

Set Up Virtual Environment

$ python3 -m venv ner-reviews
$ source ner-reviews/bin/activate
(ner-reviews) $ pip3 install -r requirements.txt

If you prefer conda:

$ conda create -n ner-reviews python=3.9
$ conda activate ner-reviews
(ner-reviews) $ pip3 install -r requirements.txt

Data

This directory contains the HTML files for New York Times food reviews. It also includes url_list.txt, which lists the corresponding URLs.

This directory contains .txt files for a processed dataset of New York Times food reviews. It also includes cleaned_reviews.json and edit_cleaned_reviews.json, which contain all of the processed food review data, and unprocessed_URLs.txt, which lists any unprocessed URLs.

This file contains entity types to import into an annotation tool.

This directory contains the JSON files for the annotated data from the initial small batch annotation.

This directory contains the JSON files for the annotated data, along with corresponding files that were corrected to remove encoding issues.

This file contains all annotations from all annotators in CoNLL format.

This file contains adjudicated annotations in CoNLL format.

This file contains counts of entities in adjudicated_annotations.txt.

This directory contains the train, dev, and test sets for the adjudicated annotations.
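
The CoNLL files and entity counts described above can be illustrated with a small sketch. It assumes one token per line with the tag in the last whitespace-separated column, blank lines between sentences, and BIO-style tags. None of this is confirmed by the repository, and the labels DISH and RESTAURANT are hypothetical.

```python
from collections import Counter


def read_conll(lines):
    """Group (token, tag) pairs into sentences; a blank line ends a sentence."""
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        # Assumed layout: token in the first column, tag in the last.
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


def count_entities(sentences):
    """Count one entity per B- tag, keyed by entity type."""
    counts = Counter()
    for sentence in sentences:
        for _token, tag in sentence:
            if tag.startswith("B-"):
                counts[tag[2:]] += 1
    return counts


# Hypothetical sample; the real files may use other labels and columns.
sample = """\
The O
pork B-DISH
buns I-DISH
at O
Nonna B-RESTAURANT

Try O
the O
ramen B-DISH
""".splitlines()

sentences = read_conll(sample)
entity_counts = count_entities(sentences)
```

To run this on a real file, pass an open file handle to `read_conll` in place of `sample`.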

Scripts

This script downloads food reviews from The New York Times and stores them in html_reviews.

This script processes and cleans html_reviews, and outputs raw_data.

This script prepares raw_data for annotation.

This script fixes encoding issues in annotated_data.
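
The README does not say what the encoding issues are. One common kind is UTF-8 text that was decoded as Latin-1, and the sketch below shows a repair for that case only. The function name `fix_mojibake` is hypothetical and is not taken from the repository.

```python
def fix_mojibake(text):
    """Undo UTF-8 text that was mis-decoded as Latin-1, if that is what happened."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not the assumed kind of damage; leave the text untouched.
        return text


# The UTF-8 bytes of "caf\u00e9" read as Latin-1 show up as two junk characters.
broken = "caf\u00c3\u00a9"
fixed = fix_mojibake(broken)
```

Text that is already clean fails the round trip and is returned unchanged, so the function can be applied to every string in a file.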

This script accepts annotated_data and processes the data to output all_annotations.txt.

This script includes additional functions to support data_process_pipeline.py.

This script calculates inter-annotator agreement with the processed data in all_annotations.txt.
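
The README does not name the agreement metric. As one possibility, the sketch below computes token-level Cohen's kappa for two annotators. The function name and sample tags are hypothetical, and the repository's script may use a different measure or handle more than two annotators.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of token labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(labels_a)
    # Observed agreement: fraction of tokens given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical tags for the same four tokens from two annotators.
annotator_1 = ["O", "B-DISH", "I-DISH", "O"]
annotator_2 = ["O", "B-DISH", "O", "O"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Token-level kappa counts the frequent O tag as agreement, so it can look better than agreement on entity spans alone.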

This script runs adjudication to produce gold standard annotations and outputs adjudicated_annotations.txt.
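
The README does not describe how adjudication is done. A simple way to picture it is a per-token majority vote, sketched below. The `adjudicate` function and its fallback to O when there is no strict majority are assumptions, not the repository's method.

```python
from collections import Counter


def adjudicate(annotations, fallback="O"):
    """Pick the majority tag per token; use `fallback` without a strict majority."""
    gold = []
    for tags in zip(*annotations):
        tag, votes = Counter(tags).most_common(1)[0]
        gold.append(tag if votes > len(tags) / 2 else fallback)
    return gold


# Hypothetical tags for the same four tokens from three annotators.
annotator_1 = ["O", "B-DISH", "I-DISH", "O"]
annotator_2 = ["O", "B-DISH", "O", "O"]
annotator_3 = ["O", "B-DISH", "I-DISH", "B-DISH"]
gold = adjudicate([annotator_1, annotator_2, annotator_3])
```

Voting token by token can produce tag sequences that are not valid BIO spans, such as an I- tag with no preceding B- tag. A real adjudication step would likely check for that or resolve disagreements by hand.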

This script accepts adjudicated_annotations.txt and splits the data into train, dev, and test sets.
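
The split proportions and the unit being split are not stated in the README. The sketch below assumes a seeded shuffle of sentences and an 80/10/10 split. The function name and fractions are hypothetical, and the repository may instead split by review.

```python
import random


def split_sentences(sentences, dev_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle a copy of the sentences and cut it into train, dev, and test lists."""
    shuffled = list(sentences)
    # A dedicated Random instance keeps the split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_dev = int(len(shuffled) * dev_frac)
    n_test = int(len(shuffled) * test_frac)
    dev = shuffled[:n_dev]
    test = shuffled[n_dev:n_dev + n_test]
    train = shuffled[n_dev + n_test:]
    return train, dev, test


# Hypothetical stand-ins for parsed sentences.
sentences = [f"sentence {i}" for i in range(20)]
train, dev, test = split_sentences(sentences)
```

Fixing the seed means the same sentences land in the same split on every run, which keeps results comparable across experiments.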

This script runs the model and experiments using final_data_splits.
