Linalgo is a Python module that helps machine learning teams create and curate datasets for Natural Language Processing. It aims to follow the W3C Web Annotation Data Model and provides a powerful system for adding metadata to the most commonly used text and image formats: TXT, PDF, HTML, etc.
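For readers unfamiliar with the W3C Web Annotation Data Model, the sketch below shows roughly what a standards-shaped annotation on a plain-text document looks like: an Annotation whose body is a tag and whose target points at a span of the source through a TextPositionSelector and a TextQuoteSelector. This is a minimal illustration of the data model only. The helper `make_annotation`, its parameters and the example URI are hypothetical and are not part of linalgo's API; linalgo's own classes may represent annotations differently.

```python
import json
import uuid


# Hypothetical helper (NOT a linalgo function): builds a W3C Web Annotation
# that tags the character span text[start:end] of a plain-text document.
def make_annotation(doc_uri, text, start, end, label, context=20):
    if not 0 <= start < end <= len(text):
        raise ValueError("span must satisfy 0 <= start < end <= len(text)")
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": f"urn:uuid:{uuid.uuid4()}",  # every Annotation needs exactly one id
        "type": "Annotation",
        # The body carries the label; "tagging" is a purpose defined by the spec.
        "body": {"type": "TextualBody", "value": label, "purpose": "tagging"},
        "target": {
            "source": doc_uri,
            # Two alternative selectors describing the same segment.
            "selector": [
                {"type": "TextPositionSelector", "start": start, "end": end},
                {
                    "type": "TextQuoteSelector",
                    "exact": text[start:end],
                    "prefix": text[max(0, start - context):start],
                    "suffix": text[end:end + context],
                },
            ],
        },
    }


text = "Linalgo helps teams curate NLP datasets."
start = text.index("NLP")
anno = make_annotation("https://example.com/doc.txt", text, start, start + 3, "acronym")
print(json.dumps(anno, indent=2))
```

Keeping both selectors is a common robustness choice: the position selector is exact but breaks if the text is edited, while the quote selector (with prefix and suffix) lets a consumer re-anchor the span.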
The documentation is available at https://linalgo.github.io/linalgo
Install the package with pip:
pip install linalgo
Run the test suite with:
pytest
By default, linalgo stores annotations on a dedicated hub at https://hub.linalgo.com. There are also connectors to retrieve data from Google BigQuery.