This is a companion repository to seq2rel (https://github.com/JohnGiorgi/seq2rel) which aims to make it easy to generate training data.

JohnGiorgi/seq2rel-ds

seq2rel: Datasets

This is a companion repository to seq2rel, which makes it easy to preprocess training data.

Installation

This repository requires Python 3.8 or later.

Setting up a virtual environment

Before installing, you should create and activate a Python virtual environment. If you need pointers on setting up a virtual environment, please see the AllenNLP install instructions.
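As a minimal sketch, one way to do this is with Python's built-in venv module. The directory name .venv is an arbitrary choice made here for illustration, and the AllenNLP instructions describe a different, equally valid setup:

```shell
# Create a virtual environment in ./.venv (the directory name is an assumption; any name works)
python3 -m venv .venv

# Activate it for the current shell session (POSIX shells such as bash or zsh)
. .venv/bin/activate

# Confirm that the interpreter on the PATH is now the one inside the environment
python --version
```

Run the install commands below from the same shell session so that the package is installed into this environment.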

Installing the library and dependencies

If you do not plan on modifying the source code, install from git using pip:

pip install git+https://github.com/JohnGiorgi/seq2rel-ds.git

Otherwise, clone the repository and install from source using Poetry:

# Install poetry for your system: https://python-poetry.org/docs/#installation
curl -sSL https://install.python-poetry.org | python3 -

# Clone and move into the repo
git clone https://github.com/JohnGiorgi/seq2rel-ds
cd seq2rel-ds

# Install the package with poetry
poetry install

Usage

Installing this package gives you access to a simple command-line tool, seq2rel-ds. To see the list of available commands, run:

seq2rel-ds --help

Note that you can also call the underlying Python files directly, e.g. python path/to/seq2rel_ds/main.py --help.

To preprocess a dataset (and in most cases, download it), call one of the commands, for example:

seq2rel-ds cdr main "path/to/cdr"

Note that you have to include main because Typer does not support default commands.

This will create the preprocessed TSV files under the specified output directory, e.g.

cdr
 ┣ train.tsv
 ┣ valid.tsv
 ┗ test.tsv

which can then be used to train a seq2rel model.
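
Before starting a training run, it can be worth confirming that all three splits were written. The following is a minimal sketch, not part of seq2rel-ds: check_splits is a hypothetical helper defined here, and path/to/cdr is the example output directory used above.

```shell
# Hypothetical helper (not provided by seq2rel-ds): report the line count of each
# expected split file and return non-zero if any file is missing or empty.
check_splits() {
  out_dir="$1"
  status=0
  for split in train valid test; do
    file="$out_dir/$split.tsv"
    if [ -s "$file" ]; then
      # Redirecting into wc prints only the count; tr strips padding added on some systems
      echo "$split: $(wc -l < "$file" | tr -d ' ') lines"
    else
      echo "$split: missing or empty" >&2
      status=1
    fi
  done
  return "$status"
}

# Check the example output directory from above
check_splits "path/to/cdr" || echo "Preprocessing output is incomplete"
```

The helper only checks that each file exists and is non-empty. It makes no assumption about the contents of the TSV files.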
