
Create service info endpoint #289

Open
6 tasks
korikuzma opened this issue Feb 13, 2024 · 6 comments
Labels
enhancement New feature or request priority:medium Medium priority

Comments

@korikuzma
Member

korikuzma commented Feb 13, 2024

Conform to https://www.ga4gh.org/product/service-info/

Info to provide:

  • VRS schema version
  • UTA version
  • SeqRepo version
  • URL/identifier for CDM
  • more?
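A minimal sketch of what a response could look like. It assumes the required top-level fields of the GA4GH service-info schema (`id`, `name`, `type`, `organization`, `version`) and puts the versions listed above under a hypothetical `metadata` extension field. Every ID, organization, and version value is a placeholder, not a real MetaKB value.

```python
import json


def build_service_info(
    version: str,
    vrs_version: str,
    uta_version: str,
    seqrepo_version: str,
    cdm_urls: list[str],
) -> dict:
    """Assemble a GA4GH service-info-style payload.

    Top-level keys follow the service-info schema's required fields; the
    ``metadata`` block is a hypothetical extension holding the
    data/software versions proposed in this issue.
    """
    return {
        "id": "org.example.metakb",  # placeholder reverse-domain ID
        "name": "MetaKB",
        "type": {
            "group": "org.example",  # placeholder
            "artifact": "metakb",
            "version": version,
        },
        "organization": {
            "name": "Example Organization",  # placeholder
            "url": "https://example.org",  # placeholder
        },
        "version": version,
        "metadata": {  # hypothetical extension field, not part of the spec
            "vrs_version": vrs_version,
            "uta_version": uta_version,
            "seqrepo_version": seqrepo_version,
            "cdm_urls": cdm_urls,
        },
    }


if __name__ == "__main__":
    info = build_service_info(
        version="0.0.0",
        vrs_version="x.y.z",
        uta_version="uta_YYYYMMDD",
        seqrepo_version="YYYY-MM-DD",
        cdm_urls=["s3://example-bucket/civic_cdm.json"],
    )
    print(json.dumps(info, indent=2))
```

Keeping the extra versions in a single nested object leaves the spec-defined keys untouched. Clients that only understand service-info can ignore it.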

Other things

  • Write a GitHub Action to test the endpoint's adherence to the schema
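Below is a minimal sketch of a check such a GitHub Action could run. It fetches the endpoint's JSON and confirms that the required service-info fields are present. The required-field lists reflect my reading of the spec and should be verified against the published schema. A real workflow would more likely validate against that schema with a library such as `jsonschema`. The local URL in the usage comment is an assumption.

```python
import json
import sys
import urllib.request

# Required fields per my reading of the GA4GH service-info schema;
# verify against the published schema before relying on this.
REQUIRED_TOP_LEVEL = ("id", "name", "type", "organization", "version")
REQUIRED_TYPE = ("group", "artifact", "version")
REQUIRED_ORG = ("name", "url")


def missing_fields(info: dict) -> list[str]:
    """Return dotted paths of required fields absent from a payload."""
    missing = [k for k in REQUIRED_TOP_LEVEL if k not in info]
    type_obj = info.get("type")
    if isinstance(type_obj, dict):
        missing += [f"type.{k}" for k in REQUIRED_TYPE if k not in type_obj]
    org = info.get("organization")
    if isinstance(org, dict):
        missing += [f"organization.{k}" for k in REQUIRED_ORG if k not in org]
    return missing


def check_endpoint(url: str) -> int:
    """Fetch a service-info URL and return a process exit code (0 = ok)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        info = json.load(resp)
    problems = missing_fields(info)
    for p in problems:
        print(f"missing required field: {p}", file=sys.stderr)
    return 1 if problems else 0


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical local deployment):
    #   python check_service_info.py http://localhost:8000/service-info
    sys.exit(check_endpoint(sys.argv[1]))
```

A non-zero exit code is enough to fail a CI step, so the Action itself would only need to start the service and run this script.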
@korikuzma added the enhancement (New feature or request) and priority:medium (Medium priority) labels Feb 13, 2024

This issue is stale because it has been open 45 days with no activity. Please make a comment for triaging or closing the issue.

@github-actions bot added the stale label May 11, 2024
@korikuzma removed the stale label May 13, 2024
@korikuzma
Member Author

@bwalsh suggested also including URLs to the public S3 CDMs used.

@jsstevenson
Member

jsstevenson commented May 14, 2024

I think this is a deceptively complicated question. If we're just talking about being able to fast-forward a local database to match a snapshot from the cloud instance, I think the code is ~there already -- we should be thinking about how to automate data updates/dumps in the cloud, of course, but the basic CLI command exists. There is a complicating factor, though: search outcomes are highly dependent on the software and data versions of each of the normalizers. Reproducing the result of a search isn't just a matter of matching your MetaKB graph to the remote instance; depending on the kind of question you ask, you also need to match a smattering of library versions, three DynamoDB tables, a SeqRepo snapshot, and a UTA database version (plus some coolseqtool stuff).

IMO, a good starting point would be to implement the GA4GH service-info spec and include a bunch of extra fields that display as much data and software versioning information as we can. Like, a totally excessive amount of information.
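One cheap source of that software versioning is the installed versions of the Python dependencies. The standard library's `importlib.metadata` can report them. This is a minimal sketch: the distribution names below are assumptions about what a MetaKB deployment depends on, not a confirmed dependency list.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Assumed distribution names; adjust to the project's real dependencies.
DEFAULT_PACKAGES = (
    "metakb",
    "ga4gh.vrs",
    "gene-normalizer",
    "disease-normalizer",
    "thera-py",
    "variation-normalizer",
    "cool-seq-tool",
)


def collect_package_versions(
    packages: Iterable[str] = DEFAULT_PACKAGES,
) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            # Not installed in this environment; report explicitly rather than omit.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for pkg, ver in collect_package_versions().items():
        print(f"{pkg}: {ver}")
```

Reporting `None` instead of dropping missing packages makes gaps in the versioning visible. That speaks to the concern about piecemeal data in the thread below. Data versions (DynamoDB tables, SeqRepo, UTA) would still have to come from each normalizer's own metadata.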

In the long term, we could work towards a set of features along the lines of:

  • a node or neighborhood on the graph containing some metadata about how the rest of the graph was produced
  • an auto-populated S3 bucket containing regular data snapshots
  • optionally, an API endpoint that lists the bucket's content (to make it browseable)
  • a CLI method that takes a given date and populates the local DB with the snapshot for that date, if it exists.
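The lookup step of that CLI method can be sketched as follows, assuming a hypothetical key naming scheme (`metakb_snapshot_YYYYMMDD.tar.gz`) and a bucket listing already fetched as a list of key strings. Neither the naming scheme nor the listing reflects an existing bucket.

```python
import re
from datetime import date, datetime
from typing import Dict, Iterable, Optional

# Hypothetical object-key naming scheme for snapshots in the bucket.
SNAPSHOT_PATTERN = re.compile(r"^metakb_snapshot_(\d{8})\.tar\.gz$")


def parse_snapshot_dates(keys: Iterable[str]) -> Dict[date, str]:
    """Map each snapshot date to its object key, skipping non-snapshot keys."""
    snapshots: Dict[date, str] = {}
    for key in keys:
        match = SNAPSHOT_PATTERN.match(key)
        if not match:
            continue
        try:
            snap_date = datetime.strptime(match.group(1), "%Y%m%d").date()
        except ValueError:
            continue  # eight digits that are not a real calendar date
        snapshots[snap_date] = key
    return snapshots


def find_snapshot(keys: Iterable[str], target: date) -> Optional[str]:
    """Return the key of the snapshot taken on ``target``, if one exists."""
    return parse_snapshot_dates(keys).get(target)


if __name__ == "__main__":
    listing = [  # illustrative keys only
        "metakb_snapshot_20240501.tar.gz",
        "metakb_snapshot_20240515.tar.gz",
        "README.txt",
    ]
    print(find_snapshot(listing, date(2024, 5, 15)))  # metakb_snapshot_20240515.tar.gz
    print(find_snapshot(listing, date(2024, 5, 10)))  # None
```

Returning `None` for a missing date matches the "if it exists" wording. A variant that falls back to the latest snapshot before the target date would be easy to add from the same parsed mapping.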

^ That doesn't solve the issue of normalizer snapshots/reproducibility, but maybe we could tackle them the same way (publish regular snapshots for each and write methods to copy them to a local DB) and then incorporate that into the metakb update-from-snapshot procedure.

@korikuzma
Member Author

korikuzma commented May 14, 2024

@jsstevenson I think automation/reproducibility should be a separate issue, maybe even an epic. IMO this issue is more about having an /info or /metadata endpoint to list versions of UTA/SeqRepo/CDM/VRS/VA/CatVrs etc.

@jsstevenson
Member

jsstevenson commented May 14, 2024

Yeah, I think the spec is /service-info, but is data versioning informative if it's piecemeal and incomplete? Schema versioning definitely makes sense for interoperability reasons. But do we have good reasons for giving partial data versioning? Is it also helpful for interoperability?

@korikuzma
Member Author

@jsstevenson IMO any metadata we can expose is helpful. We already have a lot of it -- UTA, SeqRepo, dates we pulled CIViC/MOA, GKS schema versions. Is there anything else major I'm missing?
