Commit 79d1028

Feature/add remove prune (#13)
* add remove and prune. Fix recurse leaves and errors in check
* test new functions
* fix tests
* fix asserts
* fix asserts
* small fixes
1 parent f932a94 commit 79d1028

10 files changed: 643 additions & 120 deletions
README.md

Lines changed: 43 additions & 9 deletions
@@ -1,4 +1,4 @@
-# MultiTax [![Build Status](https://travis-ci.org/pirovc/multitax.svg?branch=main)](https://travis-ci.org/pirovc/multitax) [![codecov](https://codecov.io/gh/pirovc/multitax/branch/main/graph/badge.svg)](https://codecov.io/gh/pirovc/multitax) [![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](http://bioconda.github.io/recipes/multitax/README.html)
+# MultiTax [![Build Status](https://travis-ci.com/pirovc/multitax.svg?branch=main)](https://travis-ci.com/pirovc/multitax) [![codecov](https://codecov.io/gh/pirovc/multitax/branch/main/graph/badge.svg)](https://codecov.io/gh/pirovc/multitax) [![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](http://bioconda.github.io/recipes/multitax/README.html)

 Python package to obtain, parse and explore biological taxonomies

@@ -12,7 +12,7 @@ MultiTax is a Python package that provides a common and generalized set of funct
 - Translate taxonomies (partially implemented)
 - Convert taxonomies (not yet implemented)

-MultiTax does not link sequence identifiers to taxonomic nodes, it just handles the taxonomy alone. Some kind of integration to work with sequence or external identifiers is planned, but not yet implemented.
+MultiTax does not link sequence identifiers to taxonomic nodes, it just handles the taxonomy alone. Some integration to work with sequence or external identifiers is planned, but not yet implemented.

 ## API Documentation

@@ -84,7 +84,16 @@ tax.parent("g__Escherichia")

 # List children nodes
 tax.children("g__Escherichia")
-# ['s__Escherichia flexneri', 's__Escherichia coli', 's__Escherichia dysenteriae', 's__Escherichia coli_D', 's__Escherichia albertii', 's__Escherichia marmotae', 's__Escherichia coli_C', 's__Escherichia sp005843885', 's__Escherichia sp000208585', 's__Escherichia fergusonii', 's__Escherichia sp001660175', 's__Escherichia sp004211955', 's__Escherichia sp002965065']
+# ['s__Escherichia coli',
+# 's__Escherichia albertii',
+# 's__Escherichia marmotae',
+# 's__Escherichia fergusonii',
+# 's__Escherichia sp005843885',
+# 's__Escherichia ruysiae',
+# 's__Escherichia sp001660175',
+# 's__Escherichia sp004211955',
+# 's__Escherichia sp002965065',
+# 's__Escherichia coli_E']

 # Get parent node from a defined rank
 tax.parent_rank("s__Lentisphaera araneosa", "phylum")
@@ -142,7 +151,7 @@ tax.stats()

 ```python
 # Filter ancestors (desc=True for descendants)
-tax.filter(['g__Escherichia', 's__Pseudomonas aeruginosa'])
+tax.filter(["g__Escherichia", "s__Pseudomonas aeruginosa"])
 tax.stats()
 #{'leaves': 2,
 # 'names': 11,
@@ -159,14 +168,33 @@ tax.stats()
 # 'ranks': 11}
 ```

+### Add, remove, prune
+
+```python
+# Add node to the tree
+tax.add("my_custom_node", "g__Escherichia", name="my custom name", rank="strain")
+tax.lineage("my_custom_node")
+# ['1', 'd__Bacteria', 'p__Proteobacteria', 'c__Gammaproteobacteria', 'o__Enterobacterales', 'f__Enterobacteriaceae', 'g__Escherichia', 'my_custom_node']
+
+# Remove node from tree (warning: removing parent nodes may break tree -> use check_consistency)
+tax.remove("s__Pseudomonas aeruginosa", check_consistency=True)
+
+# Prune (remove) full branches of the tree under a certain node
+tax.prune("g__Escherichia")
+```
+
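The difference between removing a single node and pruning a branch can be illustrated on a plain parent map. The sketch below is not MultiTax's implementation: the `nodes` dictionary and the helpers `children`, `remove_node`, `prune_under` and `orphans` are hypothetical names, and it assumes that pruning keeps the given node and drops everything below it.

```python
# Toy taxonomy stored as child -> parent, with string identifiers.
# Hypothetical helpers for illustration only; not the MultiTax API.
nodes = {
    "g__Escherichia": "1",
    "s__Escherichia coli": "g__Escherichia",
    "s__Escherichia albertii": "g__Escherichia",
    "my_custom_node": "s__Escherichia coli",
}
ROOT = "1"


def children(nodes, node):
    """Return the direct children of `node`, sorted for stable output."""
    return sorted(c for c, p in nodes.items() if p == node)


def remove_node(nodes, node):
    """Delete a single node; its children keep pointing at the missing parent."""
    del nodes[node]


def prune_under(nodes, node):
    """Delete every descendant of `node` (assumption: `node` itself is kept)."""
    stack = children(nodes, node)
    while stack:
        current = stack.pop()
        stack.extend(children(nodes, current))
        del nodes[current]


def orphans(nodes):
    """Return nodes whose parent is neither the root nor a known node."""
    return sorted(n for n, p in nodes.items() if p != ROOT and p not in nodes)


removed = dict(nodes)
remove_node(removed, "s__Escherichia coli")
broken = orphans(removed)  # nodes left without a valid parent

pruned = dict(nodes)
prune_under(pruned, "g__Escherichia")
remaining = sorted(pruned)  # what survives the prune
```

In this toy model, removing an inner node leaves its child dangling, which is the kind of breakage a consistency check is meant to catch, while pruning removes the whole branch and leaves no dangling nodes.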
 ### Translate

 ```python
 # GTDB to NCBI
 from multitax import GtdbTx, NcbiTx
 ncbi_tax = NcbiTx()
 gtdb_tax = GtdbTx()
+
+# Build translation
 gtdb_tax.build_translation(ncbi_tax)
+
+# Check translated nodes
 gtdb_tax.translate("g__Escherichia")
 # {'1301', '547', '561', '570', '590', '620'}
 ```
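Because one GTDB node can correspond to several NCBI nodes (as in the six identifiers returned above), a translation behaves like a mapping from a node to a set of nodes. The sketch below builds such a mapping from toy pairs. The `pairs` data and the helpers `build_mapping` and `translate` are hypothetical, and nothing here reflects how `build_translation` is actually implemented.

```python
from collections import defaultdict

# Toy (GTDB node, NCBI node) pairs. The identifiers are taken from the
# example output above; the pairing itself is made up for illustration.
pairs = [
    ("g__Escherichia", "561"),
    ("g__Escherichia", "620"),
    ("g__Escherichia", "561"),  # duplicates collapse inside the set
]


def build_mapping(pairs):
    """Collect, for every source node, the set of target nodes it maps to."""
    mapping = defaultdict(set)
    for source, target in pairs:
        mapping[source].add(target)
    return dict(mapping)


def translate(mapping, node):
    """Return the translated nodes, or an empty set if none are known."""
    return mapping.get(node, set())


gtdb_to_ncbi = build_mapping(pairs)
# Swapping each pair gives the reverse direction.
ncbi_to_gtdb = build_mapping((t, s) for s, t in pairs)
```

Returning an empty set for unknown nodes is one way to represent a partial translation without raising an error.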
@@ -228,15 +256,16 @@ from multitax import GtdbTx
 tax = GtdbTx()

 # Build LCA structure
-L = LCA(tax._nodes)
+lca = LCA(tax._nodes)

 # Get LCA
-L("s__Escherichia dysenteriae", "s__Pseudomonas aeruginosa")
+lca("s__Escherichia dysenteriae", "s__Pseudomonas aeruginosa")
 # 'c__Gammaproteobacteria'
 ```

 ## Details
-
+
+- After downloading/parsing the desired taxonomies, MultiTax works fully offline.
 - Taxonomies are parsed into `nodes`. Each node is annotated with a `name` and a `rank`.
 - Some taxonomies have a numeric taxonomic identifier (e.g. NCBI) and other use the rank + name as an identifier (e.g. GTDB). In MultiTax all identifiers are treated as strings.
 - A single root node is defined by default for each taxonomy (or `1` when not defined). This can be changed with `root_node` when loading the taxonomy (as well as annotations `root_parent`, `root_name`, `root_rank`). If the `root_node` already exists, the tree will be filtered.
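
The model described in this list can be pictured with three dictionaries keyed by string identifiers. This is a minimal sketch under that assumption, not MultiTax's internal layout: `parents`, `names`, `ranks` and the `lineage` helper are illustrative names, and the root's parent `"0"` is an arbitrary placeholder.

```python
# Every identifier is a string, whether it looks numeric (NCBI-style)
# or is rank + name (GTDB-style). Illustrative structures only.
ROOT = "1"
parents = {
    ROOT: "0",  # placeholder parent for the root (assumption)
    "d__Bacteria": ROOT,
    "p__Proteobacteria": "d__Bacteria",
}
names = {
    ROOT: "root",
    "d__Bacteria": "Bacteria",
    "p__Proteobacteria": "Proteobacteria",
}
ranks = {
    ROOT: "root",
    "d__Bacteria": "domain",
    "p__Proteobacteria": "phylum",
}


def lineage(node):
    """Walk parent links up to the root and return the path root-first."""
    path = [node]
    while path[-1] != ROOT:
        path.append(parents[path[-1]])
    return path[::-1]


path = lineage("p__Proteobacteria")
# Attach the rank and name annotations to each node on the path.
annotated = [(n, ranks[n], names[n]) for n in path]
```

Keeping identifiers as strings means a numeric-looking identifier such as `"1"` and a GTDB-style identifier such as `"d__Bacteria"` can share the same dictionaries without type checks.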
@@ -269,15 +298,20 @@ Legend:
 - GTDB is a subset of the NCBI repository, so the translation from NCBI to GTDB can be only partial
 - Translation in both ways is based on: https://data.gtdb.ecogenomic.org/releases/latest/ar53_metadata.tar.gz and https://data.gtdb.ecogenomic.org/releases/latest/bac120_metadata.tar.gz

-## Further ideas
+---

-- Add/remove/update nodes
+## Further ideas to be implemented
+
+- More translations
 - Conversion between taxonomies (write on specific format)

+
 ## Similar projects

 - https://github.com/FOI-Bioinformatics/flextaxd
 - https://github.com/shenwei356/taxonkit
 - https://github.com/bioforensics/pytaxonkit
 - https://github.com/chanzuckerberg/taxoniq
 - https://github.com/sherrillmix/taxonomizr
+- https://github.com/etetoolkit/ete
+- https://github.com/apcamargo/taxopy
