This repository has been archived by the owner on Jan 21, 2024. It is now read-only.

[WIP] add code & workflow to update metagenome catalog #4

Draft
wants to merge 1 commit into
base: main
from


ctb
Contributor

@ctb ctb commented Dec 18, 2022

@ctb
Contributor Author

ctb commented Dec 18, 2022

currently breaks on downloading from NCBI -

...
Reusing existing connection to www.ncbi.nlm.nih.gov:443.
HTTP request sent, awaiting response... 400 Bad Request. Both list of IDs and query_key are empty
2022-12-18 06:05:19 ERROR 400: Bad Request. Both list of IDs and query_key are empty.

@luizirber
Member

> currently breaks on downloading from NCBI -
>
> ...
> Reusing existing connection to www.ncbi.nlm.nih.gov:443.
> HTTP request sent, awaiting response... 400 Bad Request. Both list of IDs and query_key are empty
> 2022-12-18 06:05:19 ERROR 400: Bad Request. Both list of IDs and query_key are empty.

yes, the SRA discontinued that API (I don't think it was ever public...)

The official method is to use Entrez Direct (esearch/efetch) to download it, something like:
esearch -db sra -query '"METAGENOMIC"[Source] NOT amplicon[All Fields]' | efetch -format runinfo -mode text > catalog.csv

The main issue is that downloading everything in one go kind of breaks efetch. I can download daily or other small date ranges fine, but when fetching all the matches it always breaks after some time.
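
Since small date ranges work but the full download does not, one option might be to split the full catalog into many small windows and run the same esearch/efetch pipeline once per window. The following is only a rough sketch of that idea, not code from this PR: the helper names (`date_windows`, `windowed_query`, `esearch_pipeline`), the 7-day window size, and the choice of the `[PDAT]` date field are all assumptions for illustration. It only builds the command strings and does not run them.

```python
import shlex
from datetime import date, timedelta

# Query taken from the comment above.
BASE_QUERY = '"METAGENOMIC"[Source] NOT amplicon[All Fields]'


def date_windows(start, end, days=7):
    """Yield inclusive (first, last) date pairs covering start..end.

    The 7-day default is an arbitrary assumption; pick whatever size
    efetch handles reliably.
    """
    cur = start
    while cur <= end:
        last = min(cur + timedelta(days=days - 1), end)
        yield cur, last
        cur = last + timedelta(days=1)


def windowed_query(first, last, base=BASE_QUERY):
    """Restrict the base query to one date window.

    Assumption: filtering on publication date ([PDAT]) is the right
    field for this catalog; another date field may fit better.
    """
    date_range = f'("{first:%Y/%m/%d}"[PDAT] : "{last:%Y/%m/%d}"[PDAT])'
    return f"({base}) AND {date_range}"


def esearch_pipeline(first, last):
    """Build (but do not run) the shell pipeline for one window."""
    query = shlex.quote(windowed_query(first, last))
    return f"esearch -db sra -query {query} | efetch -format runinfo -mode text"


if __name__ == "__main__":
    # Example: print one command per window for part of December 2022.
    for first, last in date_windows(date(2022, 12, 1), date(2022, 12, 18)):
        print(esearch_pipeline(first, last))
```

Each printed command could be run separately and retried on its own if it fails, with the per-window outputs concatenated into catalog.csv afterwards (any repeated header rows would need to be dropped when merging).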

@luizirber
Member

Might need to use bigquery, but not sure how to automate that outside GCP: https://edwards.flinders.edu.au/identifying-metagenomes-from-the-sra-in-the-cloud/
