Refine confidence files #880

@mshannon-sil

Currently, saving confidences for a book produces three files: a [book_id].SFM.confidences.tsv file containing confidences for verses and tokens, a [book_id].SFM.confidences.chapters.tsv file for chapter confidences, and a confidences.books.tsv file for book confidences. Several improvements can be made to this system.

  1. Having a single file for both verse-level and token-level confidences makes the sheet hard to scan visually when a user only needs the verse confidences. Splitting it into separate ...confidences.verses.tsv and ...confidences.tokens.tsv sheets would help.
  2. The confidence sheets currently contain extraneous rows such as headers or table of contents info. By default, the confidence files should include only verses, with an option to include the other info if desired.
  3. When confidences are generated from .txt files rather than .sfm files, there is no vref to label each verse score in confidences.tsv, so rows are currently labeled using zero-based indexing. Editors usually number the first row as 1, not 0, so the confidence files should use one-based indexing in this case for easy lookup.
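
The three proposals above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual code: the `Segment` class, the `write_confidences` function, the `stem` and `include_non_verses` parameters, and the column names are all hypothetical.

```python
import csv
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Optional, Tuple


@dataclass
class Segment:
    """Hypothetical container for one scored segment (not the project's real type)."""
    vref: Optional[str]      # e.g. "GEN 1:1"; None when the source is a .txt file
    is_verse: bool           # False for headers, table of contents entries, etc.
    confidence: float        # verse-level (sequence) confidence
    tokens: List[Tuple[str, float]] = field(default_factory=list)  # (token, confidence)


def write_confidences(
    segments: List[Segment],
    out_dir: Path,
    stem: str,
    include_non_verses: bool = False,
) -> Tuple[Path, Path]:
    """Write verse and token confidences to two separate TSV files."""
    verses_path = out_dir / f"{stem}.confidences.verses.tsv"
    tokens_path = out_dir / f"{stem}.confidences.tokens.tsv"
    with verses_path.open("w", newline="", encoding="utf-8") as vf, \
            tokens_path.open("w", newline="", encoding="utf-8") as tf:
        verse_writer = csv.writer(vf, delimiter="\t")
        token_writer = csv.writer(tf, delimiter="\t")
        verse_writer.writerow(["label", "confidence"])
        token_writer.writerow(["label", "token", "confidence"])
        # start=1 gives one-based row numbers, matching what editors display.
        for row_number, seg in enumerate(segments, start=1):
            if not seg.is_verse and not include_non_verses:
                continue  # default: only verses are written
            # Use the vref when one exists; otherwise fall back to the row number.
            label = seg.vref if seg.vref is not None else str(row_number)
            verse_writer.writerow([label, seg.confidence])
            for token, token_confidence in seg.tokens:
                token_writer.writerow([label, token, token_confidence])
    return verses_path, tokens_path
```

Row numbers are assigned before non-verse rows are filtered out, so a label without a vref still matches the line's position in the source file.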

Metadata

Labels

bug, enhancement

Projects

Status

🏗 In progress
