Juliane2210/Corpus_Analysis_and_Sentence_Embeddings

Corpus_Analysis_and_Sentence_Embeddings

The data for part 1 can be found here:

Use the Atticus dataset of legal contracts: https://zenodo.org/record/4595826#.YyXT6HbMI2w

Download the file CUAD_v1.zip, unzip it, and open the folder full_contract_txt/

The folder contains 510 plain-text files, one full-text contract per file, each named “[document name].txt”. The contracts are not labeled. You will need to concatenate all the text files to form a corpus.
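The concatenation step can be sketched as follows. This is a minimal illustration, not the repository's own code: the function name `build_corpus`, the newline separator, and the UTF-8 encoding with replacement of undecodable bytes are all assumptions.

```python
from pathlib import Path


def build_corpus(folder, encoding="utf-8"):
    """Concatenate every .txt file in `folder` into one string.

    Hypothetical helper: the name, the separator, and the encoding
    handling are assumptions made for illustration.
    """
    # Sort the paths so the corpus is built in a reproducible order.
    paths = sorted(Path(folder).glob("*.txt"))
    # errors="replace" keeps the loop going if a file has stray bytes.
    texts = [p.read_text(encoding=encoding, errors="replace") for p in paths]
    # Separate documents with a newline so the last token of one contract
    # does not fuse with the first token of the next.
    return "\n".join(texts)
```

Calling `build_corpus` on the path of the unzipped contracts folder returns a single string that can then be tokenized and analyzed.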

The data for part 2 is in a zipped file, and can be found here:

Use the dataset from SemEval-2016 Task 1: Semantic Textual Similarity (STS).

Use only the test data for STS Core (English Monolingual subtask), the version with gold labels. Do not use the training data. Read more about the task at https://alt.qcri.org/semeval2016/task1/#
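Reading the sentence pairs together with their gold labels might look like the sketch below. It assumes a file layout that this README does not spell out: one input file with two tab-separated sentences per line, and a parallel gold file with one similarity score per line, where a blank line means the pair has no gold score. The function name `load_sts_pairs` is hypothetical; check the unzipped files before relying on this layout.

```python
from pathlib import Path


def load_sts_pairs(input_path, gold_path):
    """Return a list of (sentence1, sentence2, gold_score) tuples.

    Assumed layout (not confirmed by the README): the input file has
    tab-separated sentence pairs, and the gold file has one score per
    line, aligned with the input file line by line.
    """
    input_lines = Path(input_path).read_text(encoding="utf-8").splitlines()
    gold_lines = Path(gold_path).read_text(encoding="utf-8").splitlines()

    pairs = []
    for line, gold in zip(input_lines, gold_lines):
        gold = gold.strip()
        if not gold:
            # Skip pairs without a gold label.
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            # Skip malformed lines that lack a second sentence.
            continue
        pairs.append((fields[0], fields[1], float(gold)))
    return pairs
```

Keeping only the labeled pairs means that any similarity scores computed from sentence embeddings can be compared directly against the gold scores.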
