Commit

Merge pull request #8 from miRTop/devel
version 1.1
lpantano authored Jun 26, 2019
2 parents 11563f8 + 87be0e0 commit e876d20
Showing 3 changed files with 36 additions and 68 deletions.
6 changes: 6 additions & 0 deletions Changes.md
@@ -0,0 +1,6 @@
# 1.1 October 24th, 2018

* UID now follows the MINTplates scheme exactly, as applied in mirtop since commit 7f7717d5f23ea638f1a14fccc6386e1dbb8a7e1a.
* iso_5p and iso_3p changed their meaning. Now the sign indicates whether the isomiR starts or ends upstream (`-`) or downstream (`+`) of the reference sequence. This mainly affects iso_5p, where the sign is the opposite of that in version 1.0: `-` -> `+` and `+` -> `-`.
* The word `snp` was changed to `snv` to support any kind of variant.
* `iso_add` is renamed to `iso_add3p` and the category `iso_add5p` is added to the list
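As a rough illustration of these changes, the sketch below rewrites a version 1.0 `Variant` value with the 1.1 names and sign convention. `migrate_variant_v10_to_v11` is a hypothetical helper, not part of mirtop. It assumes the 1.0 labels followed the 1.0 definition, so it flips only the `iso_5p` sign and leaves `iso_3p` as is (in both versions `+` at the 3' end means a longer sequence).

```python
def migrate_variant_v10_to_v11(variant: str) -> str:
    """Hypothetical helper: rewrite a mirGFF3 1.0 Variant value for 1.1."""
    labels = []
    for label in variant.split(","):
        name, sep, value = label.strip().partition(":")
        if name == "iso_5p" and value:
            # 1.1 flips the iso_5p sign: '-' -> '+' and '+' -> '-'.
            value = f"{-int(value):+d}"
        elif name == "iso_add":
            # iso_add becomes iso_add3p, written as a plain count (e.g. iso_add3p:3).
            name = "iso_add3p"
            value = str(int(value)) if value else value
        elif name.startswith("iso_snp"):
            # snp -> snv in every SNV category name (iso_snp_seed -> iso_snv_seed, ...).
            name = "iso_snv" + name[len("iso_snp"):]
        # iso_3p keeps its sign under the assumption stated above.
        labels.append(name + sep + value)
    return ",".join(labels)


print(migrate_variant_v10_to_v11("iso_5p:-2,iso_3p:+1,iso_add:+3,iso_snp_seed"))
```

For the input above this prints `iso_5p:+2,iso_3p:+1,iso_add3p:3,iso_snv_seed`.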
27 changes: 15 additions & 12 deletions definition.md
@@ -1,6 +1,6 @@
**Version**

VERSION: 1.0
VERSION: 1.1

**Description**

@@ -22,9 +22,11 @@ Please add description for each columns/attribute (R:required, O:optional)
* (R) database: `##source-ontology` using FAIRSharing.org:
* miRBase: (FAIRsharing) doi:10.25504/fairsharing.hmgte8
* mirGeneDB: http://mirgenedb.org
* mirCarta: https://mircarta.cs.uni-saarland.de/
* Custom database: please, provide a link to an archive release if this is the case
* (O) commands used to generate the file, including at least adapter removal, filtering, the aligner, and the miRNA tool. Each command line starts with `## CMD: `; multiple lines with this tag are allowed.
* (O) genome/database version used (if the GFF3 was generated from a BAM file, try to take it from there): `## REFERENCE:`
* (R) sample names used in attribute:Expression: `## COLDATA:` separated by spaces
* (R) sample names used in attribute:Expression: `## COLDATA:` separated by comma: `,`.
* (O) Filter tags meaning: see the Filter attribute below. Different filter tags should be separated by the `,` character. The line starts with `## FILTER: `, for example: `## FILTER: PASS(is ok), REJECT(false positive), REJECT lowcount(rejected due to low count in data)`.
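A minimal sketch of reading the `## COLDATA:` and `## FILTER:` header lines described above, assuming the comma-separated 1.1 layout and filter tags written as `NAME(meaning)`. `parse_header` is a hypothetical helper, and the sample names in the usage example are made up.

```python
import re


def parse_header(lines):
    """Hypothetical helper: collect COLDATA sample names and FILTER tag meanings."""
    samples, filters = [], {}
    for line in lines:
        if line.startswith("## COLDATA:"):
            # Version 1.1: sample names are separated by commas.
            payload = line[len("## COLDATA:"):]
            samples = [s.strip() for s in payload.split(",") if s.strip()]
        elif line.startswith("## FILTER:"):
            payload = line[len("## FILTER:"):]
            # Each comma-separated tag looks like NAME(meaning); NAME may contain spaces.
            for tag in payload.split(","):
                m = re.fullmatch(r"\s*(.+?)\s*\((.*)\)\s*", tag)
                if m:
                    filters[m.group(1)] = m.group(2)
    return samples, filters


header = [
    "## COLDATA: sample1,sample2",
    "## FILTER: PASS(is ok), REJECT(false positive), "
    "REJECT lowcount(rejected due to low count in data)",
]
samples, filters = parse_header(header)
print(samples)
print(filters)
```

Splitting on `,` is only safe as long as filter descriptions contain no commas themselves, which holds for the example tags.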

## Columns
@@ -35,25 +37,26 @@ Please add description for each columns/attribute (R:required, O:optional)
* (R) column4/5: start/end: precursor start/end as indicated by alignment tool
* (O) column6: score (Optional): It can be the mapping score or any other score the tool wants to assign to the sequence.
* (R) column7: strand. When mapping against the precursor, it should always be `+`. Mapping against the genome should also be accepted, in which case both `+` and `-` are allowed.
* (O) column8: phase: (For features of type "CDS", the phase indicates where the feature begins with reference to the reading frame): Not relevant righ now. This can be: `.`
* (O) column8: phase: (For features of type "CDS", the phase indicates where the feature begins with reference to the reading frame): Not relevant right now. This can be: `.`
* (R) column9: attributes:
* (R) UID: unique ID based on sequence like mintmap has for tRNA: prefix-22-BZBZOS4Y1 (https://github.com/TJU-CMC-Org/MINTmap/tree/master/MINTplates). good way to use it as cross-mapper ID between different naming or future changes. Currently supported by [mirtop](https://github.com/miRTop/mirtop/blob/dev/mirtop/mirna/realign.py#) code.
* (R) UID: unique ID based on the sequence, like the one MINTplates generates for tRNA fragments: prefix-22-BZBZOS4Y1 (https://github.com/TJU-CMC-Org/MINTmap/tree/master/MINTplates). It works well as a cross-mapper ID that is robust to different naming schemes or future naming changes. Currently supported by the [mirtop](https://github.com/miRTop/mirtop/blob/dev/mirtop/mirna/realign.py#) code.
* (O) Read: read sequence
* (R) Name: mature name
* (R) Parent: hairpin precursor name
* (R) Variant: (categorical types - adapted from isomiR-SEA)
* `iso_5p:+/-N`. `+` indicates extra nucleotides not in the reference miRNA. `-` indicates removed nucleotides not in the sequence. `N` the number of nucleotides of difference. For instance, if the sequence starts 2 nts after the reference miRNA, the label will be: `iso_5p:-2`, but if it starts before, the label will be `iso_5p:+2`.
* `iso_5p:+/-N`. `+` indicates the start is shifted to the right. `-` indicates the start is shifted to the left. `N` is the number of nucleotides of difference. For instance, if the sequence starts 2 nts after the reference miRNA, the label will be `iso_5p:+2`, but if it starts 2 nts before, the label will be `iso_5p:-2`.
* `iso_3p:+/-N`. Same explanation applies to the end of the sequence: `+` means the end is shifted to the right, `-` to the left.
* `iso_add:+N`. Same explanation applied.
* `iso_snp_seed`: when affected nucleotides are between [2-7].
* `iso_snp_central_offset`: when affected nucleotides is at position [8].
* `iso_snp_central`: when affected nucleotides are betweem [9-12].
* `iso_snp_central_supp`: when affected nucleotides are betweem [13-17].
* `iso_snp`: anything else.
* `iso_add3p:N`. Number of non-template nucleotides added at 3p.
* `iso_add5p:N`. Number of non-template nucleotides added at 5p.
* `iso_snv_seed`: when affected nucleotides are between [2-7].
* `iso_snv_central_offset`: when the affected nucleotide is at position [8].
* `iso_snv_central`: when affected nucleotides are between [9-12].
* `iso_snv_central_supp`: when affected nucleotides are between [13-17].
* `iso_snv`: anything else.
* (O) Changes (optional): similar to Variant, but indicating the nucleotides that changed.
* additions are in upper case
* deletions are in lower case
* example: `Changes iso_5p:0,iso_3p:TT,iso_add:GTC` where `Variant iso_add:+3,iso_3p:-2`.
* example: `Changes iso_5p:0,iso_3p:TT,iso_add3p:GTC` where `Variant iso_add3p:3,iso_3p:+2`.
* (R) Cigar: CIGAR string as indicated [here](https://samtools.github.io/hts-specs/SAMv1.pdf). It is the standard CIGAR used by aligners, with one restriction: `M` always means an exact match. This differs from some aligners, where `M` also covers mismatches. A mismatch is written as the reference nucleotide; for example, `11MA7M` indicates a mismatch at position 12, where `A` is the reference nucleotide.
* (R) Hits: number of hits in the database.
* (O) Alias (Optional): names taken from miRBase, miRgeneDB, or other databases, separated by `,`.
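To make the 1.1 sign convention for `iso_5p` and `iso_3p` concrete, here is a small sketch that derives those labels from coordinates. It assumes 1-based, inclusive positions on the precursor, that `read_start`/`read_end` cover only the templated (aligned) part of the read, and that shifts of zero are omitted. `iso_labels` is a hypothetical helper, and the label order it produces is arbitrary.

```python
def iso_labels(ref_start: int, ref_end: int,
               read_start: int, read_end: int, added_3p: int = 0) -> str:
    """Hypothetical helper: Variant labels for end shifts (1.1 sign convention)."""
    labels = []
    shift_5p = read_start - ref_start  # > 0: starts to the right (downstream)
    shift_3p = read_end - ref_end      # > 0: ends to the right (downstream)
    if shift_5p:
        labels.append(f"iso_5p:{shift_5p:+d}")
    if shift_3p:
        labels.append(f"iso_3p:{shift_3p:+d}")
    if added_3p:
        # Non-template nucleotides are reported as a plain count.
        labels.append(f"iso_add3p:{added_3p}")
    return ",".join(labels)


# Reference mature miRNA at 10-31; read aligned at 12-33 plus 3 non-template nts.
print(iso_labels(10, 31, 12, 33, added_3p=3))
print(iso_labels(10, 31, 8, 31))
```

The two calls print `iso_5p:+2,iso_3p:+2,iso_add3p:3` and `iso_5p:-2`.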
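The SNV categories in `Variant` depend only on where the change falls. A minimal sketch of that mapping, assuming positions are 1-based and counted along the miRNA from its 5' end. `snv_category` is a hypothetical name.

```python
def snv_category(position: int) -> str:
    """Hypothetical helper: map a 1-based SNV position to its 1.1 Variant label."""
    if 2 <= position <= 7:
        return "iso_snv_seed"
    if position == 8:
        return "iso_snv_central_offset"
    if 9 <= position <= 12:
        return "iso_snv_central"
    if 13 <= position <= 17:
        return "iso_snv_central_supp"
    # Position 1 and anything past 17 fall into the generic category.
    return "iso_snv"


for pos in (1, 5, 8, 10, 15, 20):
    print(pos, snv_category(pos))
```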
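The restricted CIGAR, where `M` counts only exact matches and a mismatch is written as the reference nucleotide, can be sketched for the simple case of a gapless alignment. `mirgff_cigar` is a hypothetical helper. Insertions and deletions are out of scope, and writing consecutive mismatches as back-to-back reference letters is an assumption this sketch makes.

```python
def mirgff_cigar(ref: str, read: str) -> str:
    """Hypothetical helper: mirGFF3-style CIGAR for a gapless, equal-length alignment."""
    if len(ref) != len(read):
        raise ValueError("this sketch handles gapless alignments only")
    parts, run = [], 0
    for r, q in zip(ref.upper(), read.upper()):
        if r == q:
            run += 1  # extend the current run of exact matches
        else:
            if run:
                parts.append(f"{run}M")
                run = 0
            parts.append(r)  # mismatch: emit the reference nucleotide
    if run:
        parts.append(f"{run}M")
    return "".join(parts)


ref = "C" * 11 + "A" + "C" * 7
read = "C" * 11 + "T" + "C" * 7
print(mirgff_cigar(ref, read))
```

The example reproduces `11MA7M`: 11 exact matches, a mismatch at position 12 against reference `A`, then 7 exact matches.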