I2B2 2012 Preprocessing

This repo contains code to convert the tar.gz file (downloaded from the n2c2 data portal) into labels_SPLIT.txt and text_SPLIT.txt, where SPLIT is one of train, dev, or test. This data format is compatible with the NeMo TokenClassification model.

The conversion steps are as follows:

Convert the .xml files to brat format
Convert brat to BIO (IOB2) format
Convert BIO to NeMo-compatible format
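
The last step can be illustrated with a minimal sketch. It is not the code in bio2nemo.py. It assumes the BIO input is CoNLL-style (one token per line, the tag in the last whitespace-separated column, a blank line between sentences) and produces the two parallel lists that NeMo token classification expects: one sentence per line in the text file and the space-separated tags on the matching line of the labels file. The function name `bio_to_nemo` and the sample tags are hypothetical.

```python
from typing import Iterable, List, Tuple


def bio_to_nemo(bio_lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Turn CoNLL-style BIO lines into parallel text and label lines.

    Assumption: every non-blank line has at least two columns,
    the token first and the BIO tag last.
    """
    text_lines: List[str] = []
    label_lines: List[str] = []
    tokens: List[str] = []
    labels: List[str] = []
    for raw in bio_lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current sentence.
            if tokens:
                text_lines.append(" ".join(tokens))
                label_lines.append(" ".join(labels))
                tokens, labels = [], []
            continue
        parts = line.split()
        tokens.append(parts[0])   # first column: token
        labels.append(parts[-1])  # last column: BIO tag
    if tokens:
        # Flush a final sentence that has no trailing blank line.
        text_lines.append(" ".join(tokens))
        label_lines.append(" ".join(labels))
    return text_lines, label_lines


# Illustrative input only; the tag names are made up for this example.
sample = [
    "Patient O",
    "started O",
    "aspirin B-TREATMENT",
    "",
    "Chest B-PROBLEM",
    "pain I-PROBLEM",
]
text, labels = bio_to_nemo(sample)
print(text)
print(labels)
```

Each entry of `text` would be written as one line of text_SPLIT.txt and the entry at the same index of `labels` as one line of labels_SPLIT.txt.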

Usage

python i2b2_2012_preprocessing.py
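
After the script finishes, it can be worth checking that each text line has as many tokens as its labels line has tags, since token classification training relies on that alignment. The helper below is a hedged sketch, not part of this repo; `find_misaligned` is a hypothetical name, and it works on lists of lines so that it makes no assumption about where the output files are written.

```python
from itertools import zip_longest
from typing import Iterable, List


def find_misaligned(text_lines: Iterable[str], label_lines: Iterable[str]) -> List[int]:
    """Return 1-based line numbers where token and tag counts differ.

    A line present in one input but missing from the other is treated
    as empty, so it is reported unless its counterpart is also empty.
    """
    bad: List[int] = []
    pairs = zip_longest(text_lines, label_lines, fillvalue="")
    for lineno, (text, labels) in enumerate(pairs, start=1):
        if len(text.split()) != len(labels.split()):
            bad.append(lineno)
    return bad


# In-memory example: line 2 has one token but two tags.
print(find_misaligned(["Chest pain", "aspirin"], ["B-X I-X", "O O"]))
```

To check real output, read text_SPLIT.txt and labels_SPLIT.txt into lists of lines (for example with `open(path).read().splitlines()`) and pass them to the function. An empty result means every line pair is aligned.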

nyuolab/i2b2_2012_preprocessing
