Use CURIEs to link Germline to Repertoire/Rearrangement #553
Comments
That task item is old. It says the germline database link is at the rearrangement level, but this is no longer true; it was moved into DataProcessing a while ago. Unless this is referring to something else... |
For me this is addressing something else. The issue that this is capturing is formalizing that we will use CURIEs to refer to Germline objects, either Germline sets or Germline Genes - currently we have no formal indication as to what format fields like germline_set_id and germline_database take on and the formats we do have aren't really "computable". A generic string designation is "unsatisfactory" and CURIEs solve this problem 8-) From #157 (comment) (with some edits) I suggested we could use CURIE nomenclature for Germline IDs as follows:
|
@williamdlees a question regarding connecting OGRDB Germline Sets to the AIRR Schema. I was just looking at OGRDB's germline sets, and was wondering if I understood how things were working. On OGRDB the ID of the Germline Set is referenced in the URL as such: https://ogrdb.airr-community.org/germline_set/3 On the History tab for this germline set it says this is G00003. If I wanted to refer to this Germline Set in the AIRR Schema how would I go about doing that? There seem to be two places where this might occur:
If we used the AIRR CURIE schema, we could have an OGRDB CURIEMap as follows:
If I then set https://ogrdb.airr-community.org/germline_set/3 Now this is different than what you have in the AIRR Spec description, as the description for
What are your thoughts on using the CURIE mechanism above to resolve this field? If you look at the versions tab on OGRDB for this Germline Set it has all the above information:
|
Pasting this here as the mail reply didn't make it into the thread. It’s probably best to get the set from the REST API at https://ogrdb.airr-community.org/api/ rather than the UI. Sorry, I could publish this a bit better; I will put some details on the Germline Sets page for a start. The germline set will always have an identifier G followed by a number and the identifier will not change between versions. From the API you’d retrieve the set as, for example, https://ogrdb.airr-community.org/api/germline/set/G00003/1 . It sounds as though this would map quite nicely – maybe OGRDB_GERMLINESET:G00003:1 ?? If that’s ok I can change the examples |
@williamdlees has this been resolved? I am triaging AIRR v2.0 issues 8-) |
Currently it doesn't look like |
See my note from 2022(!) in the thread. I am no expert in CURIEs, but if representing them in the way I suggest is compatible with the way you outline further up the thread, there’s no work involved; it’s very doable.
|
@williamdlees I am almost 100% sure that CURIEs only have a single IRI tag followed by a single identifier. So something like "OGRDB_GERMLINESET:G00003:1" mapping to "https://ogrdb.airr-community.org/api/germline/set/G00003/1" would not be valid CURIE processing/parsing. Don't get me wrong, the ID "OGRDB_GERMLINESET:G00003:1" is easily parsed as an ID, but it does not fit the CURIE format. If that was a CURIE and the IRI tag "OGRDB_GERMLINESET" was mapped to "https://ogrdb.airr-community.org/api/germline/set/" then this would resolve to: https://ogrdb.airr-community.org/api/germline/set/G00003:1 I think it is fine to have the ID as you have it defined if that fits your needs. It just isn't CURIE parseable, and it can't go into the CURIEMap object in the spec. So we could consider this resolved as is. We have decided that CURIEs don't fit the needs of germline set IDs. Therefore we don't need to change your ID definition and we don't need to update the CURIEMap. I think that is the most simple path forward. This can always be revisited later... |
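To make the resolution step described above concrete, here is a minimal sketch of prefix-based CURIE expansion. It assumes a flat prefix-to-IRI dictionary, which is a simplification for illustration and not the actual structure of the AIRR CURIEMap object; the function name `expand_curie` is hypothetical. It reproduces the behaviour described in the comment: everything after the first colon is treated as a single reference and appended to the mapped IRI prefix.

```python
# Hypothetical, simplified prefix map (NOT the real AIRR CURIEMap layout).
CURIE_MAP = {
    "OGRDB_GERMLINESET": "https://ogrdb.airr-community.org/api/germline/set/",
}


def expand_curie(curie: str, curie_map: dict[str, str]) -> str:
    """Expand a CURIE by splitting on the first colon only."""
    prefix, sep, reference = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE (no ':' found): {curie!r}")
    if prefix not in curie_map:
        raise KeyError(f"unknown CURIE prefix: {prefix!r}")
    # The reference is appended verbatim, so a second ':' is NOT turned
    # into a path separator.
    return curie_map[prefix] + reference


print(expand_curie("OGRDB_GERMLINESET:G00003:1", CURIE_MAP))
```

Under these assumptions the second colon survives into the resulting IRI, so the result ends in `G00003:1` rather than `G00003/1`, which is the mismatch the comment points out.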
And this is somewhat of an aside, but as part of the AKC work, the OGRDB API needs to be reviewed and updated to bring it more in compliance as well as add missing functionality. It might be more efficient to do all that together instead of piecemeal. Nevertheless, my opinion is that |
It’s not a question of fitting my needs; the choice of : as a delimiter between the germline set and version was arbitrary. Is there a convention for that delimiter in the CURIE world? If so I am happy to follow it, otherwise we can just choose something that won’t crash the syntax, maybe . or /. I’m happy to make the change.
|
@williamdlees The relevant documentation can be found here:
In a nutshell: You can have a |
@bussec is that correct? Would '/' not cause problems? CURIEs rely on IRIs and '/' is a special character in IRI space. If you have a '/' in the CURIE reference it would be interpreted as a '/' in the IRI and interpreted as an IRI path, no? Now I suppose if you had https://ogrdb.airr-community.org/api/germline/set/G00003/1 That is what @williamdlees is looking for, and it would work I suppose, but encoding an IRI path in the CURIE reference doesn't seem to be how CURIEs were intended to be used??? |
@williamdlees I think the question that I am unclear on is are there two name/ID spaces, each with their own set of identifiers ("germline set" and "version") or can there be one name/ID space ("versioned germline set"). With one name space you could:
Would give you: Or
Which would give you a get query: The query approach might be the better one, as then you can parse the ID in whatever way you want. You could encode whatever you wanted in the ID and the query would parse it and return the correct information for that ID. Note if you needed both API interfaces, you could make it such that: https://ogrdb.airr-community.org/api/germline/set?G00003-1 gave the same information, the first being the one that was used for CURIE resolution. |
Thanks Brian. Sorry for the delay in replying. I have looked at this from time to time and failed to come to a decision on which approach to take, because it doesn’t make much odds from a coding point of view. I think the one you suggest is probably the cleanest:
OGRDB_GERMLINESET:G00003-1 with "OGRDB_GERMLINESET=https://ogrdb.airr-community.org/api/germline/set?"
Which would give you a get query: https://ogrdb.airr-community.org/api/germline/set?G00003-1
|
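As a rough illustration of the query-style mapping proposed above, the sketch below expands `OGRDB_GERMLINESET:G00003-1` against a prefix ending in `?` and then splits the reference into a set identifier and a version. The helper names (`resolve`, `split_versioned_id`) and the rule "version is the integer after the last hyphen" are assumptions made for this example; the thread does not fix how OGRDB would parse the ID on the server side.

```python
from urllib.parse import urlsplit

# Prefix as proposed in the thread; it ends in '?' so the reference
# becomes the query string of the resulting IRI.
OGRDB_GERMLINESET = "https://ogrdb.airr-community.org/api/germline/set?"


def resolve(curie: str) -> str:
    """Expand a CURIE with the OGRDB_GERMLINESET prefix (hypothetical helper)."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix != "OGRDB_GERMLINESET":
        raise ValueError(f"unsupported CURIE: {curie!r}")
    return OGRDB_GERMLINESET + reference


def split_versioned_id(reference: str) -> tuple[str, int]:
    """Split 'G00003-1' into ('G00003', 1); the '-' convention is assumed."""
    set_id, sep, version = reference.rpartition("-")
    if not sep or not set_id or not version.isdigit():
        raise ValueError(f"expected '<set_id>-<version>': {reference!r}")
    return set_id, int(version)


url = resolve("OGRDB_GERMLINESET:G00003-1")
query = urlsplit(url).query  # the part after '?', as a server would see it
print(url)
print(split_versioned_id(query))
```

Because the reference contains no second colon and no slash, it passes through CURIE expansion unchanged, and any further structure in the ID is left to the service to interpret.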
@bcorrie After some meditation on the sacred scripture of RFC3987 and the epiphany that their notation is indentation-sensitive, one correction and one comment to my previous statement:
|
@williamdlees I think that would work. I think all we would need to do is update the descriptions for the three instances of |
@williamdlees, can we remove this from the AIRR v2.0 milestone and move it to the AKC milestone? Do we need this functionality in the AIRR schema for v2.0? Do you think we can resolve questions about prefix and target URI? |
I'm happy to work with whatever milestone makes sense to the group. You may be aware that, with substantial input from Scott, we have drafted and implemented a revised API for OGRDB which is OpenAPI 3 compatible. Details here: https://ogrdb.airr-community.org/api_v2/swagger/. I'm afraid I don't feel confident to draft a CURIE definition that will pass muster with the group, as the history on this thread shows, but if someone else can propose one that is satisfactory, and complies with our API schema, I am very happy to implement it in OGRDB. |
Should resolve Task 3 in #157
See #157 (comment) for discussion.