metaquest is a command-line tool for searching SRA datasets for containment of specified genomes. By analyzing the metadata of the matching datasets, it gives insight into where different species may be found.
- Clone the repository:

      git clone https://github.com/FOI-Bioinformatics/MetaQuest.git
      cd MetaQuest

- Install the requirements:

      pip install -r requirements.txt

- Install MetaQuest:

      python setup.py install

First, visit https://branchwater.jgi.doe.gov/ to search and download containment files for your genomes of interest. Save these CSV files to a designated folder.
Process the downloaded files to prepare them for the MetaQuest pipeline:

    metaquest use_branchwater --branchwater-folder /path/to/branchwater/files --matches-folder matches

- branchwater-folder: The directory where Branchwater CSV files are located.
- matches-folder: The directory where the processed files will be saved.
You can extract basic metadata directly from Branchwater CSV files without downloading from NCBI:

    metaquest extract_branchwater_metadata --branchwater-folder /path/to/branchwater/files --metadata-folder metadata

After processing the Branchwater files, you can summarize the results:

    metaquest parse_containment --matches-folder matches --parsed-containment-file parsed_containment.txt --summary-containment-file summary_containment.txt --step-size 0.05 --file-format branchwater

Example output: summary.txt and containment.txt
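How `--step-size` shapes the summary is not spelled out here. As a rough illustration only, the sketch below assumes the summary counts how many accessions reach each containment threshold, with thresholds spaced `step_size` apart. The function name `summarize_by_threshold`, the input mapping, and the placeholder accessions are hypothetical and not taken from metaquest.

```python
def summarize_by_threshold(max_containment, step_size=0.05):
    """Count accessions whose best containment reaches each threshold.

    max_containment: dict mapping accession -> highest containment (0..1).
    This reflects an assumed reading of the summary, not metaquest's own code.
    """
    n_steps = round(1 / step_size)
    summary = []
    # Walk thresholds from 1.0 downward using integer steps to avoid float drift.
    for i in range(n_steps, 0, -1):
        threshold = round(i * step_size, 10)
        count = sum(1 for value in max_containment.values() if value >= threshold)
        summary.append((threshold, count))
    return summary


# Placeholder accessions and values, for illustration only.
example = {"SRR0000001": 0.97, "SRR0000002": 0.52, "SRR0000003": 0.08}
rows = summarize_by_threshold(example, step_size=0.25)
```

Each row pairs a threshold with the number of accessions at or above it, so the counts can only grow as the threshold drops.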
For more comprehensive metadata, you can download it from NCBI:

    metaquest download_metadata --matches-folder matches --metadata-folder metadata --threshold 0.95 --email [EMAIL]

- matches-folder: Directory containing match files.
- metadata-folder: Directory where the metadata files will be saved.
- threshold: Only consider matches with containment above this threshold.
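To make the effect of `--threshold` concrete, here is a minimal sketch of a containment filter over CSV text. The column names `acc` and `containment`, the helper name `filter_matches`, and the inclusive comparison are assumptions made for illustration; the actual match file layout and boundary handling may differ.

```python
import csv
import io


def filter_matches(csv_text, threshold=0.95, acc_col="acc", value_col="containment"):
    """Return accessions whose containment is at or above the threshold.

    The column names are assumptions about the match file layout.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = []
    for row in reader:
        try:
            value = float(row[value_col])
        except (KeyError, ValueError, TypeError):
            continue  # skip rows with a missing or non-numeric containment
        if value >= threshold:
            kept.append(row[acc_col])
    return kept


# Placeholder rows, for illustration only.
sample_csv = "acc,containment\nSRR0000001,0.99\nSRR0000002,0.40\nSRR0000003,n/a\n"
selected = filter_matches(sample_csv, threshold=0.95)
```

Raising the threshold shrinks the set of accessions for which metadata would be requested, which keeps the download small.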
Once the metadata is downloaded, you can parse it to generate a more concise and readable format:

    metaquest parse_metadata --metadata-folder metadata --metadata-table-file parsed_metadata.txt

Example output: parsed_metadata.txt
This step helps in understanding the distribution of metadata attributes:

    metaquest check_metadata_attributes --file-path parsed_metadata.txt --output-file parsed_metadata_overview.txt

Example output: parsed_metadata_overview.txt
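One way to picture an attribute overview is a count of how many records carry a value for each column. The sketch below does that for a delimited table held in a string. It assumes a tab-separated layout, and apart from `Sample_Scientific_Name` the column names are invented for the example; `attribute_overview` is a hypothetical helper, not a metaquest function.

```python
import csv
import io


def attribute_overview(table_text, delimiter="\t"):
    """Count non-empty values per column in a delimited metadata table."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter=delimiter)
    counts = {name: 0 for name in (reader.fieldnames or [])}
    for row in reader:
        for name in counts:
            # Treat missing fields and whitespace-only fields as empty.
            if (row.get(name) or "").strip():
                counts[name] += 1
    return counts


# Placeholder table, for illustration only.
table = (
    "Run\tSample_Scientific_Name\tcollection_date\n"
    "SRR0000001\thuman gut metagenome\t2019\n"
    "SRR0000002\t\t2020\n"
)
overview = attribute_overview(table)
```

Columns that are filled in for most records are usually the better candidates for grouping and counting in later steps.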
This step helps in understanding the distribution of genomes across different datasets:

    metaquest count_metadata --summary-file parsed_containment.txt --metadata-file parsed_metadata.txt --metadata-column Sample_Scientific_Name --threshold 0.95 --output-file genome_counts.txt

Example output: genome_counts.txt
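Conceptually, counting combines the containment table with the metadata table: keep the accessions that pass the threshold, then tally the values of the chosen metadata column. The following is a minimal sketch under that assumption. The helper `count_metadata_values` and the in-memory dictionaries are hypothetical stand-ins for the two input files.

```python
from collections import Counter


def count_metadata_values(containment, metadata, column, threshold=0.95):
    """Tally metadata values for accessions at or above the threshold.

    containment: dict accession -> containment value (0..1).
    metadata: dict accession -> dict of column name -> value.
    """
    counts = Counter()
    for accession, value in containment.items():
        if value < threshold:
            continue
        label = metadata.get(accession, {}).get(column)
        if label:  # ignore accessions without a value in this column
            counts[label] += 1
    return counts


# Placeholder data, for illustration only.
containment = {"SRR0000001": 0.99, "SRR0000002": 0.97, "SRR0000003": 0.10}
metadata = {
    "SRR0000001": {"Sample_Scientific_Name": "soil metagenome"},
    "SRR0000002": {"Sample_Scientific_Name": "soil metagenome"},
    "SRR0000003": {"Sample_Scientific_Name": "human gut metagenome"},
}
counts = count_metadata_values(containment, metadata, "Sample_Scientific_Name")
```

`counts.most_common()` then lists the metadata values from most to least frequent.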
To analyze a single sample from the summary, you can use the single_sample command:

    metaquest single_sample --summary-file parsed_containment.txt --metadata-file parsed_metadata.txt --summary-column GCF_000008985.1 --metadata-column Sample_Scientific_Name --threshold 0.95

To download the raw SRA data for accessions that match your criteria:

    metaquest download_sra --accessions-file accessions.txt --fastq-folder fastq --num-threads 8 --max-workers 4

The accessions file should contain one SRA accession per line.
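If the accessions come from your own filtering, a small script can produce the one-per-line file. The sketch below assumes the file lists run accessions (prefixes `SRR`, `ERR`, or `DRR` followed by digits) and silently drops anything else as well as duplicates. The helper `write_accessions` is hypothetical, and whether `download_sra` accepts other accession types is not covered here.

```python
import re
from pathlib import Path

# SRA run accessions start with SRR, ERR, or DRR followed by digits.
RUN_ACCESSION = re.compile(r"^[SED]RR\d+$")


def write_accessions(accessions, path):
    """Write unique, well-formed run accessions to `path`, one per line."""
    seen = set()
    valid = []
    for accession in accessions:
        accession = accession.strip()
        if RUN_ACCESSION.match(accession) and accession not in seen:
            seen.add(accession)
            valid.append(accession)
    Path(path).write_text("".join(f"{a}\n" for a in valid), encoding="utf-8")
    return valid
```

For example, `write_accessions(selected_accessions, "accessions.txt")` would create the file passed to `--accessions-file`, where `selected_accessions` is whatever list your own filtering produced.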
Plot the distribution of containment scores:

    metaquest plot_containment --file-path parsed_containment.txt --column max_containment --plot-type rank --save-format png --threshold 0.05

Available plot types: rank, histogram, box, violin
Visualize the distribution of metadata attributes:

    metaquest plot_metadata_counts --file-path counts_Sample_Scientific_Name.txt --plot-type bar --save-format png

Available plot types: bar, pie, radar
For comprehensive documentation including advanced features and technical details, see the docs/ directory:
- Enhanced SRA Features - Advanced SRA downloading with technology detection and statistics
- Branchwater Workflow - Detailed workflow guide for branchwater functionality
- Architecture - Technical architecture and design decisions
We welcome contributions to metaquest! Whether you want to report a bug, suggest a feature, or contribute code, your input is valuable. Here's how to get started:
- Fork the Repository: Create your own fork of the metaquest repository.
- Clone Your Fork: Clone your fork to your local machine and set the upstream repository.
- Create a New Branch: Make a new branch for your feature or bugfix.
- Make Your Changes: Implement your feature or fix the bug and commit your changes.
- Push to Your Fork: Push your changes to your fork on GitHub.
- Create a Pull Request: From your fork, open a new pull request in the metaquest repository.