docs: polish toml/README
dzhang32 committed Jun 8, 2024
1 parent e0c5b65 commit 2985c0c
Showing 2 changed files with 9 additions and 13 deletions.
3 changes: 3 additions & 0 deletions Cargo.toml
@@ -3,6 +3,9 @@ name = "tuni"
version = "0.1.0"
edition = "2021"
license = "MIT"
readme = "README.md"
repository = "https://github.com/dzhang32/tuni"
keywords = ["gtf", "gff", "transcript-assembly"]

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

19 changes: 6 additions & 13 deletions README.md
@@ -1,12 +1,12 @@
# tuni

The goal of `tuni` is to generate unified IDs for identical transcripts called across different samples.
The goal of `tuni` is to unify transcripts across different samples.

## Background
## Overview

Transcript assembly tools can generate arbitrary transcript IDs that differ between identical transcripts across samples.
Transcript assembly tools can generate arbitrary transcript IDs, which may lead to the same transcript being labelled with a different ID across samples.

For instance, given two samples, `sample_1.gtf` and `sample_2.gtf`:
For example, given two samples `sample_1.gtf` and `sample_2.gtf`:

**sample_1.gtf**

@@ -28,7 +28,7 @@ chr1 test exon 50 100 . + . transcript_id "B";

The transcript displayed above is identical between the two samples; however, the provided `transcript_id` differs between them ("A" vs "B").

Given a list of `.gtf`/`.gff` files, `tuni` outputs a `tuni_id` that is unified for identical transcripts across different samples.
`tuni` generates a `.tuni.gtf`/`.tuni.gff` for each input `.gtf`/`.gff`. These output files contain an additional attribute field, `tuni_id`, which holds a unified ID that is the same for identical transcripts across different samples.

**sample_1.tuni.gtf**

@@ -54,15 +54,8 @@ TODO: upload `tuni` to crates.io.

## Usage

`tuni` expects as input:

1. A `.txt` file that contains the paths to each input `.gtf` or `.gff` detailing transcripts to be unified. Currently, only [version 2](https://www.ensembl.org/info/website/upload/gff.html) `.gff` files are accepted.
2. A path to the output directory.

Executing `tuni`:

```bash
tuni -gtf-gff-path /path/to/gtf_paths.txt -output-dir /path/to/output/directory/
```

In the output directory, `tuni` will create a `.tuni.gtf`/`.tuni.gff` for each input `.gtf`/`.gff`. These `.tuni.*` output files will contain an additional attribute field `tuni_id` which contains a unified ID that will be the same for identical transcripts across different samples.
*Note: currently, only [version 2](https://www.ensembl.org/info/website/upload/gff.html) `.gff` files are accepted by `tuni`.*
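
To illustrate the idea of a unified ID described in the overview, here is a minimal Rust sketch. It is not `tuni`'s actual implementation: the `Structure` type, the `unify` function, the rule that two transcripts are "identical" when chromosome, strand, and exon coordinates match, and the `tuni_<n>` ID format are all assumptions made for this example.

```rust
use std::collections::HashMap;

/// Hypothetical structural signature of a transcript (an assumption for
/// this sketch, not tuni's real data model).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Structure {
    pub chrom: String,
    pub strand: char,
    /// Exon (start, end) coordinates.
    pub exons: Vec<(u64, u64)>,
}

/// Assign one unified ID per distinct structure, numbered in order of
/// first appearance across all samples.
///
/// Input: for each sample, a list of (transcript_id, structure) pairs.
/// Output: for each sample, a map from transcript_id to unified ID.
pub fn unify(samples: &[Vec<(String, Structure)>]) -> Vec<HashMap<String, String>> {
    let mut seen: HashMap<Structure, String> = HashMap::new();
    let mut out = Vec::with_capacity(samples.len());

    for sample in samples {
        let mut ids = HashMap::new();
        for (transcript_id, structure) in sample {
            // Sort exons so the order in which they were listed does not
            // affect the comparison.
            let mut key = structure.clone();
            key.exons.sort_unstable();

            // The "tuni_<n>" format is hypothetical.
            let next = format!("tuni_{}", seen.len());
            let unified = seen.entry(key).or_insert(next).clone();
            ids.insert(transcript_id.clone(), unified);
        }
        out.push(ids);
    }
    out
}
```

Under these assumptions, a transcript labelled "A" in one sample and "B" in another receives the same unified ID whenever both have the same chromosome, strand, and exon coordinates.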
