Saving the "executable" not just the mapping #30
I think I'm starting to get tangled in what all we want out of different sorts of reproducibility, and what that implies for what we want to do here. This is me thinking out loud about it. I think we should establish a divide between exact reproducibility and portability.
I think there are actually two layers to portability.
1: sequence-level portability
Note that the … The only way to ensure sequence-level portability starts by taking the cladetime approach and re-running the sequences through the appropriately versioned (i.e., identical) lineage assignment tool. This means that future evolution is not a "problem," and we only have to ensure that the same set of all possible input taxa on that day (tip taxa that were assignable) ends up in the same aggregated taxa.

2: taxon-level portability
This second question is within cladecombiner's scope. The mapping will change if the tree changes, or if some part of the decision-making process changes. The relationships between taxa within an alias are fixed. The tree for … Changes to cladecombiner source code could change either the tree (via bugs or via how recombinants are handled, possibly other things I'm not currently seeing) or how it makes mapping decisions given a tree. So tracking versioning information of cladecombiner itself is also going to be important.

3: approximate sequence-level portability
Via something like #9, we could approximate sequence-level portability. That is, if on future date F we call lineage …
There are a number of assumptions being made here, which I'm not currently crystal clear on, but which we would need to spell out if trying this.

Taxon recognition
So far we have ignored the issue of whether a particular taxon is recognized, i.e., whether it is listed in https://github.com/cov-lineages/pango-designation/blob/master/lineages.csv. That is, a taxon that is considered valid and could show up in data today might not be considered valid, or show up in data, next month. This is not an issue for true sequence-level portability (the taxon would be valid under the correct assigner) or taxon-level portability (it's out of scope), but it would be for approximate sequence-level portability. I'm not sure there's anything to do about it, but if we pursue this we could perhaps warn users (check for demotions).
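As a rough illustration of the "check for demotions" idea, here is a minimal sketch. It assumes we already have two collections of recognized taxon names, one from the date of the original call and one from today; how those would be obtained is left open. The function names `find_demoted_taxa` and `warn_on_demotions` are hypothetical and not part of cladecombiner, and the taxon names in the usage example are made up.

```python
import warnings


def find_demoted_taxa(previously_recognized, currently_recognized):
    """Return the sorted taxa that were recognized before but are not now."""
    return sorted(set(previously_recognized) - set(currently_recognized))


def warn_on_demotions(previously_recognized, currently_recognized):
    """Emit a UserWarning if any taxa were demoted; return the demoted taxa."""
    demoted = find_demoted_taxa(previously_recognized, currently_recognized)
    if demoted:
        warnings.warn(
            f"{len(demoted)} taxa are no longer recognized: {', '.join(demoted)}",
            UserWarning,
        )
    return demoted


# Made-up taxon names, for illustration only.
past = ["X.1", "X.2", "X.3"]
today = ["X.1", "X.3", "X.4"]
demoted = find_demoted_taxa(past, today)
```

Newly added taxa (here `X.4`) are deliberately ignored, since only taxa that disappear would affect approximate sequence-level portability as described above.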
To make sure I'm following, the idea here is to be able to say what modeling unit you would have mapped a particular taxon to, if that taxon had not been present when you called cladecombiner in the past?
Originally posted by @swo in #24 (comment)