Merge pull request #8 from miRTop/devel

version 1.1
miRTop · Jun 26, 2019 · e876d20
2 parents 11563f8 + 87be0e0
Showing 3 changed files with 36 additions and 68 deletions.
diff --git a/Changes.md b/Changes.md
@@ -0,0 +1,6 @@
+# 1.1 October 24th, 2018
+
+* UID now follows the MINTplates license-plate scheme exactly, as applied in mirtop since commit 7f7717d5f23ea638f1a14fccc6386e1dbb8a7e1a.
+* `iso_5p` and `iso_3p` changed their meaning. Now the sign indicates whether the isomiR starts or ends upstream (`-`) or downstream (`+`) of the reference sequence. This mainly affects `iso_5p`, whose sign is the opposite of version 1.0: `-` -> `+` and `+` -> `-`.
+* The term `snp` was changed to `snv` to support any kind of variant.
+* `iso_add` is renamed to `iso_add3p`, and the category `iso_add5p` is added to the list.
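Taken together, the renames and the sign flip listed above make it possible to upgrade a version 1.0 `Variant` value mechanically. The sketch below is a hypothetical helper (`variant_v10_to_v11` is not part of mirtop). It assumes that only `iso_5p` flips sign while `iso_3p` keeps its value, which is one reading of "mainly affects iso_5p"; check against mirtop before relying on it.

```python
def variant_v10_to_v11(variant: str) -> str:
    """Translate a GFF3 1.0 Variant value into its 1.1 spelling (sketch)."""
    out = []
    for tag in variant.split(","):
        tag = tag.strip()
        if not tag:
            continue
        name, _, value = tag.partition(":")
        if name == "iso_5p" and value:
            # Flip the sign: 1.0 "-2" (starts 2 nt later) becomes 1.1 "+2" (shifted downstream).
            n = -int(value)
            out.append(f"iso_5p:{n:+d}" if n else "iso_5p:0")
        elif name == "iso_add":
            # iso_add:+N becomes iso_add3p:N, a plain count without a sign.
            out.append(f"iso_add3p:{abs(int(value))}")
        elif name.startswith("iso_snp"):
            # iso_snp, iso_snp_seed, ... are renamed to iso_snv, iso_snv_seed, ...
            suffix = name[len("iso_snp"):]
            out.append("iso_snv" + suffix + (":" + value if value else ""))
        else:
            # Assumption: every other tag (including iso_3p) is unchanged.
            out.append(tag)
    return ",".join(out)
```

For example, the 1.0 value `iso_5p:-2,iso_add:+3,iso_snp_seed` becomes `iso_5p:+2,iso_add3p:3,iso_snv_seed`.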
diff --git a/definition.md b/definition.md
@@ -1,6 +1,6 @@
 **Version**
 
-VERSION: 1.0
+VERSION: 1.1
 
 **Description**
 
@@ -22,9 +22,11 @@ Please add description for each columns/attribute (R:required, O:optional)
   * (R) database: `##source-ontology` using FAIRSharing.org:
     * miRBase: (FAIRsharing) doi:10.25504/fairsharing.hmgte8
     * mirGeneDB: http://mirgenedb.org
+    * mirCarta: https://mircarta.cs.uni-saarland.de/
+    * Custom database: please provide a link to an archived release if this is the case.
   * (O) commands used to generate the file. They should include at least the adapter removal, filtering, aligner and miRNA tool steps. Each line starts with `## CMD: `, and multiple lines with this tag are allowed.
   * (O) genome/database version used (it may be obtained from the BAM file if the GFF3 was generated from it): `## REFERENCE:`
-  * (R) sample names used in attribute:Expression: `## COLDATA:` separated by spaces
+  * (R) sample names used in attribute:Expression: `## COLDATA:` separated by commas (`,`).
   * (O) Meaning of the filter tags (see the Filter attribute below): `## FILTER: `. Different filter tags should be separated by the `,` character, for example: `## FILTER: PASS(is ok), REJECT(false positive), REJECT lowcount(rejected due to low count in data)`.
 
 ## Columns
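A minimal sketch of reading the header lines described in the hunk above (`## COLDATA:`, `## FILTER:`, `## CMD:`). It assumes every header line starts with `##` and that the header ends at the first line that does not; `parse_header` is a hypothetical name. Splitting `## FILTER:` on `,` also assumes no comma appears inside a tag's parenthesized meaning.

```python
def parse_header(lines):
    """Collect sample names, filter-tag meanings and commands from header lines (sketch)."""
    samples, filters, commands = [], {}, []
    for line in lines:
        if not line.startswith("##"):
            break  # assumption: the header ends at the first feature line
        body = line[2:].strip()
        if body.startswith("COLDATA:"):
            # Version 1.1: sample names are separated by commas.
            samples = [s.strip() for s in body[len("COLDATA:"):].split(",") if s.strip()]
        elif body.startswith("FILTER:"):
            # Each item looks like TAG(meaning), e.g. "REJECT lowcount(rejected due to ...)".
            for item in body[len("FILTER:"):].split(","):
                item = item.strip()
                if not item:
                    continue
                tag, _, meaning = item.partition("(")
                filters[tag.strip()] = meaning.rstrip(")")
        elif body.startswith("CMD:"):
            # Several CMD lines may appear; keep them in order.
            commands.append(body[len("CMD:"):].strip())
    return samples, filters, commands
```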
@@ -35,25 +37,26 @@ Please add description for each columns/attribute (R:required, O:optional)
 * (R) column4/5: start/end: precursor start/end as indicated by alignment tool
 * (O) column6: score (Optional): It can be the mapping score or any other score the tool wants to assign to the sequence.
 * (R) column7: strand. When mapping against the precursor it should always be `+`. Mapping against the genome should also be accepted, in which case `+/-` are allowed.
-* (O) column8: phase: (For features of type "CDS", the phase indicates where the feature begins with reference to the reading frame): Not relevant righ now. This can be: `.`
+* (O) column8: phase: (For features of type "CDS", the phase indicates where the feature begins with reference to the reading frame): Not relevant right now. This can be: `.`
 * (R) column9: attributes:
-  * (R) UID: unique ID based on sequence like mintmap has for tRNA: prefix-22-BZBZOS4Y1 (https://github.com/TJU-CMC-Org/MINTmap/tree/master/MINTplates). good way to use it as cross-mapper ID between different naming or future changes. Currently supported by [mirtop](https://github.com/miRTop/mirtop/blob/dev/mirtop/mirna/realign.py#) code.
+  * (R) UID: unique ID based on the sequence, like the one MINTplates uses for tRNA: prefix-22-BZBZOS4Y1 (https://github.com/TJU-CMC-Org/MINTmap/tree/master/MINTplates). It is a good cross-mapper ID, robust to different naming schemes or future changes. Currently supported by [mirtop](https://github.com/miRTop/mirtop/blob/dev/mirtop/mirna/realign.py#) code.
   * (O) Read: read sequence
   * (R) Name: mature name
   * (R) Parent: hairpin precursor name
   * (R) Variant: (categorical types - adapted from isomiR-SEA)
-    * `iso_5p:+/-N`. `+` indicates extra nucleotides not in the reference miRNA. `-` indicates removed nucleotides not in the sequence. `N` the number of nucleotides of difference. For instance, if the sequence starts 2 nts after the reference miRNA, the label will be: `iso_5p:-2`, but if it starts before, the label will be `iso_5p:+2`.
+    * `iso_5p:+/-N`. `+` indicates the start is shifted to the right (downstream). `-` indicates the start is shifted to the left (upstream). `N` is the number of nucleotides of difference. For instance, if the sequence starts 2 nts after the reference miRNA, the label will be: `iso_5p:+2`, but if it starts before, the label will be `iso_5p:-2`.
     * `iso_3p:+/-N`. The same explanation applies to the end of the sequence.
-    * `iso_add:+N`. Same explanation applied.
-    * `iso_snp_seed`: when affected nucleotides are between [2-7].
-    * `iso_snp_central_offset`: when affected nucleotides is at position [8].
-    * `iso_snp_central`: when affected nucleotides are betweem [9-12].
-    * `iso_snp_central_supp`: when affected nucleotides are betweem [13-17].
-    * `iso_snp`: anything else.
+    * `iso_add3p:N`. Number of non-template nucleotides added at 3p.
+    * `iso_add5p:N`. Number of non-template nucleotides added at 5p.
+    * `iso_snv_seed`: when the affected nucleotides are at positions [2-7].
+    * `iso_snv_central_offset`: when the affected nucleotide is at position [8].
+    * `iso_snv_central`: when the affected nucleotides are at positions [9-12].
+    * `iso_snv_central_supp`: when the affected nucleotides are at positions [13-17].
+    * `iso_snv`: anything else.
   * (O) Changes (optional): similar to the previous one, but indicating the nucleotides being changed.
     * additions are in upper case
     * deletions are in lower case
-    * example: `Changes iso_5p:0,iso_3p:TT,iso_add:GTC` where `Variant iso_add:+3,iso_3p:-2`.
+    * example: `Changes iso_5p:0,iso_3p:TT,iso_add3p:GTC` where `Variant iso_add3p:3,iso_3p:+2`.
   * (R) Cigar: CIGAR string as indicated [here](https://samtools.github.io/hts-specs/SAMv1.pdf). It is the standard CIGAR used by aligners, with the restriction that `M` always means an exact match. This differs from some aligners, where `M` also covers mismatches. A mismatch should therefore be written like `11MA7M`, which indicates a mismatch at position 12 where `A` is the reference nucleotide.
   * (R) Hits: number of hits in the database.
   * (O) Alias (optional): names from miRBase/mirGeneDB or other databases, separated by `,`
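The `Variant` sign convention, the `iso_snv` position bins and the mismatch notation in `Cigar` can be illustrated with three small helpers. This is a sketch, not mirtop's implementation, and the function names are hypothetical. It assumes 1-based precursor coordinates for the templated part of the read (non-template additions, reported as `iso_add3p`/`iso_add5p`, are left out), SNV positions counted 1-based along the read, and a CIGAR made only of `<n>M` runs and single mismatch letters, so insertions and deletions are rejected.

```python
import re

# Tokens accepted by this sketch: "<n>M" runs of exact matches, or one reference
# nucleotide marking a single mismatched position (the "A" in "11MA7M").
_TOKEN = re.compile(r"(\d+)M|([ACGTU])")


def iso_shift(ref_start, ref_end, read_start, read_end):
    """Return iso_5p/iso_3p labels; '+' means shifted downstream (to the right)."""
    labels = []
    for name, delta in (("iso_5p", read_start - ref_start), ("iso_3p", read_end - ref_end)):
        if delta:
            labels.append(f"{name}:{delta:+d}")
    return labels


def snv_category(position):
    """Map a 1-based mismatch position on the read to its iso_snv* category."""
    if 2 <= position <= 7:
        return "iso_snv_seed"
    if position == 8:
        return "iso_snv_central_offset"
    if 9 <= position <= 12:
        return "iso_snv_central"
    if 13 <= position <= 17:
        return "iso_snv_central_supp"
    return "iso_snv"


def mismatches(cigar):
    """Return [(read_position, reference_nt), ...] with 1-based positions."""
    pos, found, i = 0, [], 0
    while i < len(cigar):
        m = _TOKEN.match(cigar, i)
        if m is None:
            raise ValueError(f"unsupported CIGAR token at offset {i}: {cigar!r}")
        if m.group(1):
            pos += int(m.group(1))  # a run of exact matches
        else:
            pos += 1  # one mismatched base; the letter is the reference nucleotide
            found.append((pos, m.group(2)))
        i = m.end()
    return found
```

For example, `mismatches("11MA7M")` returns `[(12, "A")]`, and position 12 falls in `iso_snv_central`.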