
Pytaxon in its CLI form: an open-source research-support project for identifying errors in, and correcting, the taxonomic nomenclature of biodiversity species


pytaxon/pytaxon-cli


Pytaxon: A Python software package for the identification and correction of errors in the taxonomic data of biodiversity species

We present Pytaxon, a Python software package designed to resolve and correct taxonomic names in biodiversity data. It queries the Global Names Verifier (GNV) API and uses fuzzy matching to suggest corrections for discrepancies and nomenclatural inconsistencies. Pytaxon offers both a Command Line Interface (CLI) and a Graphical User Interface (GUI), making it accessible to users with different levels of computing expertise.
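To give a feel for what fuzzy matching means here, the sketch below suggests a correction for a misspelled genus name by comparing it against a small reference list. It is only an illustration built on Python's standard `difflib` module, not Pytaxon's actual implementation: the reference list, the `suggest_name` function, and the `cutoff` value are all assumptions made for the example, and a real run would compare against names returned by the GNV API.

```python
from difflib import get_close_matches

# Hypothetical reference list used only for this illustration.
REFERENCE_NAMES = ["Thelyphonus", "Mastigoproctus", "Typopeltis", "Hypoctonus"]


def suggest_name(name, reference=REFERENCE_NAMES, cutoff=0.8):
    """Return the closest reference name, or None if nothing is similar enough."""
    if name in reference:
        return name  # exact match, nothing to correct
    # get_close_matches ranks candidates by a similarity ratio in [0, 1]
    # and drops any candidate scoring below `cutoff`.
    matches = get_close_matches(name, reference, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for raw in ["Mastigoproctus", "Mastigoproctuss", "Thelyphonnus", "Xyz"]:
        print(raw, "->", suggest_name(raw))
```

A higher `cutoff` makes the suggestions more conservative; a lower one catches heavier misspellings at the cost of more false matches.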


Installation Guide

Dependencies

  • Listed in requirements.txt

Install the package from PyPI:

$ pip install pytaxon

To download the Pytaxon GUI executable:

Format  Windows  Linux
.zip    Link     Link
.rar    Link     Link

Workflow

First, check your spreadsheet for errors with the first command below. The program returns an Excel file (.xlsx) listing the incorrect data it found, according to the selected data source.

Then, use the "Change" column of that file to select which entries should be corrected, and run the second command to correct the original spreadsheet automatically from the checked spreadsheet.

$ pytaxon -r <column names> -os <path to original spreadsheet> -ss <name of suggestion spreadsheet> -si <source id>

$ pytaxon -os <path to original spreadsheet> -cs <path of checked spreadsheet> -o <name of corrected spreadsheet>

Explore the options for these commands with the --help flag.
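The correction step can be pictured as the following minimal sketch, which applies only the suggestions a user has accepted in the "Change" column. It is not Pytaxon's code. Apart from the "Change" column name, everything here is a hypothetical layout chosen for illustration: the `row`, `column`, and `suggestion` keys, the `"y"` marker, and the `apply_corrections` function. Plain dictionaries stand in for spreadsheet rows.

```python
def apply_corrections(original_rows, checked_rows):
    """Return a corrected copy of original_rows.

    Assumed (hypothetical) layout of each checked row:
      "row"        - zero-based index into original_rows
      "column"     - name of the column holding the flagged value
      "suggestion" - proposed replacement value
      "Change"     - "y" when the user accepts the suggestion
    """
    corrected = [dict(row) for row in original_rows]  # leave the input untouched
    for check in checked_rows:
        # Skip every suggestion the user did not explicitly accept.
        if str(check.get("Change", "")).strip().lower() != "y":
            continue
        corrected[check["row"]][check["column"]] = check["suggestion"]
    return corrected


original = [
    {"genus": "Mastigoproctuss", "family": "Thelyphonidae"},
    {"genus": "Typopeltis", "family": "Thelyphonidea"},
]
checked = [
    {"row": 0, "column": "genus", "suggestion": "Mastigoproctus", "Change": "y"},
    {"row": 1, "column": "family", "suggestion": "Thelyphonidae", "Change": "n"},
]
corrected = apply_corrections(original, checked)
print(corrected)
```

Working on a copy keeps the original data intact, which mirrors the workflow above, where the corrected spreadsheet is written to a separate output file.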


Illustrative Examples

CLI

Pytaxon CLI running in the Visual Studio Code terminal (PowerShell) with a modified version of the Uropygi dataset

The "to correct" spreadsheet generated for the modified Uropygi dataset

Pytaxon GUI application running with a modified version of the Uropygi dataset

Pytaxon's CLI and GUI workflow


Citing

If you use the source code of Pytaxon in any form, please cite the following manuscript (we encourage citing Global Names Resolver as well):

Proença Neto MA, De Sousa MPA (2025) Pytaxon: A Python software for resolving and correcting taxonomic names in biodiversity data. Biodiversity Data Journal 13: e138257. https://doi.org/10.3897/BDJ.13.e138257


Acknowledgements

We thank the following institutions, which contributed to ensuring the success of our work:

Museu Paraense Emílio Goeldi (MPEG)

Centro Universitário do Estado do Pará (CESUPA)


Funding

This research was supported by the Centro Universitário do Pará (CESUPA) through its PIBICT scientific initiation scholarship project.


Authors

Marco Aurélio Proença Neto

Marcos Paulo Alves de Sousa


Contact

Dr. Marcos Paulo Alves de Sousa (Project leader)

Email: [email protected]

Grupo de Estudos Temático em Computação Aplicada (GET-COM)

Centro Universitário do Pará - CESUPA

Av. Perimetral 1901. CEP 66077-530. Belém, Pará, Brazil.
