Skip to content

sjdb_score bonus is added to motif score instead of replacing it on annotated junctions #50

@pinin4fjords

Description

@pinin4fjords

Summary

When stitching an annotated splice junction, rustar adds the motif score AND the sjdb_score bonus to the alignment score. STAR uses one OR the other (replacement, not addition) — sjdb_score for annotated junctions, motif_score for unannotated ones. Net effect: rustar over-credits annotated junctions by motif_score (e.g. +0 for GT/AG → no visible difference, -4 for GC/AG → wrong by 4, -8 for AT/AC or non-canonical → wrong by 8).

Currently invisible because of #27 (annotated lookup always returns false so the additive branch never fires). The moment #27 lands (currently in PR #45), this becomes the dominant remaining AS-divergence source on annotated splices.

Surfaced during the AS-divergence investigation on #33.

Current rustar code

src/align/stitch.rs:1316-1319:

d_score += motif_score;
if is_annotated {
    d_score += scorer.sjdb_score;
}

STAR reference behaviour

source/Transcript_alignScore.cpp (and the matching logic in Stitch.cpp) treats annotated and unannotated junctions as mutually exclusive scoring paths:

if (sjAnnot == 1) {
    score += sjdbScore;   // annotated: replace motif score with sjdb bonus
} else {
    score += motifScore;  // unannotated: motif score only
}

So an annotated GT/AG junction in STAR contributes +sjdbScore (default +2), not motif_score (+0) + sjdbScore (+2) = +2. For motifs where motif_score != 0 this matters:

Motif motif_score (STAR default) rustar additive STAR replacement Per-junction error
GT/AG (canonical) 0 +0 + +2 = +2 +2 0 (invisible)
GC/AG -4 -4 + +2 = -2 +2 4
AT/AC -8 -8 + +2 = -6 +2 8
Non-canonical -8 -8 + +2 = -6 +2 8
(unannotated GT/AG) 0 +0 +0 0

Most annotated junctions in well-curated annotations are canonical GT/AG, so the visible impact is small on typical data. But for any genome with appreciable GC/AG, AT/AC, or rescue-by-novel junctions in the sjdb, rustar's AS is systematically off.

Suggested fix

Switch to replacement semantics matching STAR:

d_score += if is_annotated {
    scorer.sjdb_score
} else {
    motif_score
};

Or equivalently:

if is_annotated {
    d_score += scorer.sjdb_score;
} else {
    d_score += motif_score;
}

4-line patch. Gated on #27 — until that lands, is_annotated is always false and the change has no observable effect (the else branch is identical to today's unconditional d_score += motif_score).

Test plan

  1. Build with Log.final.out always reports Number of splices: Annotated (sjdb) = 0 despite --sjdbGTFfile #27 + this PR applied.
  2. Run on a paired-end input with a known mix of annotated motif types (or use the yeast test profile — most splices are canonical GT/AG, so the per-record delta on this profile is small).
  3. Compare AS values against STAR's on identical-CIGAR records. On annotated GT/AG records the delta should be 0; on annotated GC/AG records the delta should drop from -4 (pre-fix) to 0.

Severity

Medium-low. Score-only divergence on annotated junctions, observable only after #27 lands. Doesn't affect alignment selection (since the relative ordering of candidates is preserved — both annotated branches gain +motif_score - sjdb_score relative to STAR, so the ranking among annotated alternatives is consistent). Affects any downstream tool that consumes the absolute AS value (multi-mapper EM weights, score-threshold filters keyed on STAR's AS distribution).


Filed during nf-core/rnaseq integration testing (nf-core/rnaseq#1855). All sibling issues from this exercise: author:pinin4fjords or grep for nf-core/rnaseq#1855.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions