mVIRs #7

valentynbez opened this issue Nov 5, 2024 · 0 comments

Description

Running mVIRs on the MAG collection and analyzing the results. Currently searching for viral orthologous groups (VOGs) in the mVIRs results and connecting phages to all MAGs. If a phage is predicted by another tool but never seen circularized/active in any of the 70k samples, it is considered a false positive.
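
The false-positive rule can be expressed as a simple filter. This is a minimal sketch, not the project's code. It assumes the mVIRs evidence has already been reduced to a hypothetical mapping from phage ID to the samples in which the element was seen circularized/active, and it leaves out the step of matching phage IDs between tools.

```python
from typing import Dict, Iterable, Set


def label_predicted_phages(
    predicted: Iterable[str],
    mvirs_active: Dict[str, Set[str]],
) -> Dict[str, bool]:
    """Return {phage_id: is_supported} for phages called by another tool.

    predicted:    IDs of phages from an external predictor (hypothetical).
    mvirs_active: hypothetical mapping phage_id -> sample IDs in which mVIRs
                  saw the element circularized/active.
    A phage never seen circularized/active in any sample gets False,
    i.e. it is treated as a likely false positive.
    """
    # bool() is False both for a missing key (None) and for an empty set.
    return {phage: bool(mvirs_active.get(phage)) for phage in predicted}


if __name__ == "__main__":
    calls = ["phiA", "phiB", "phiC"]
    evidence = {"phiA": {"S1", "S7"}, "phiC": set()}
    print(label_predicted_phages(calls, evidence))
    # {'phiA': True, 'phiB': False, 'phiC': False}
```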

Responsible (Contributor)
Valentyn

05.11.2024

  • collected database of phages from different studies - link
  • benchmarked Vclust for small genome alignment and clustering: R = 1.0 against nucmer for global genome similarity >= 80%
  • clustered mVIRs viruses with other studies
  • clustering analysis - in progress

03.09.2024

  • frozen dataset /nfs/nas22/fs2202/biol_micro_sunagawa/Projects/EAN/PROPHAGE_HUMAN_EAN/data/processed/mvirs_db/clean_database
  • probably of no use, since Taylor has constructed a bigger database

02.07.2024

  • Frozen phage dataset
  • Getting structures for the proteins and retrieving new genes by structural alignment

02.04.2024

  • Working on annotation features, and potentially on a search for MGs.
  • Making it work on long reads?

05.03.2024

  • Pipelines are coordinated with Taylor
  • mVIRs sequences classified using database searches and geNomad marker genes
    • 22% phages
    • 16% plasmids
    • 62% unknown
  • Phages annotated with the PHROGs phage database
  • Worse annotation coverage for plasmids and unknown sequences
  • Annotation module for mVIRs
  • Search for structurally similar geNomad markers in unknown sequences
  • TM-Vec makes it possible to identify structurally similar proteins with > 50% identity
  • Building a TM-Vec database for geNomad markers
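
A TM-Vec database for geNomad markers amounts to one embedding per marker protein, with markers ranked by embedding similarity to each query protein. As we understand TM-Vec, cosine similarity between its embeddings is trained to approximate TM-score. The sketch below shows only that ranking step; it does not call the TM-Vec package, the 3-dimensional vectors are toy stand-ins for real embeddings, and the function names and the 0.5 cut-off are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def search_markers(
    query: Sequence[float],
    database: Dict[str, Sequence[float]],
    min_score: float = 0.5,
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Rank database entries by cosine similarity to the query embedding,
    keeping at most top_k hits that score >= min_score."""
    hits = [(name, cosine(query, emb)) for name, emb in database.items()]
    hits = [h for h in hits if h[1] >= min_score]
    hits.sort(key=lambda h: h[1], reverse=True)
    return hits[:top_k]


if __name__ == "__main__":
    # Toy vectors standing in for embeddings of geNomad marker proteins.
    markers = {
        "marker_1": [1.0, 0.0, 0.0],
        "marker_2": [0.0, 1.0, 0.0],
        "marker_3": [1.0, 1.0, 0.0],
    }
    print(search_markers([1.0, 0.0, 0.0], markers))
```

A brute-force scan is fine at sketch scale; a large marker database would normally sit behind an approximate nearest-neighbour index instead.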

06.02.2024

Valentyn started working on the mVIRs set from 97k assemblies (2.6 million mVIRs-positive contigs).

21.11.2023

  • 70k samples processed on Euler; 50k still missing
  • Need to rerun the last step to predict MGEs longer than 800k

15.08.2023

Refactoring the mVIRs core and writing tests so that new functionality can be added, e.g. analysis of direct terminal repeats
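
A direct terminal repeat (DTR) is a common hint that an assembled contig is circular: its first bases reappear, in the same orientation, at its end. The following is a minimal sketch of such a check, not the mVIRs implementation; the function name and the 20 bp default minimum are hypothetical choices.

```python
def dtr_length(seq: str, min_len: int = 20) -> int:
    """Length of the longest direct terminal repeat in seq, or 0 if none.

    The repeat length k is capped at len(seq) // 2 so the two copies do not
    overlap; repeats shorter than min_len are ignored.
    """
    seq = seq.upper()
    min_len = max(min_len, 1)  # k = 0 would make seq[-0:] the whole sequence
    # Try the longest candidate first so the first match is the longest DTR.
    for k in range(len(seq) // 2, min_len - 1, -1):
        if seq[:k] == seq[-k:]:
            return k
    return 0


if __name__ == "__main__":
    contig = "ACGTACGTAA" + "TTTTGGGGCC" + "ACGTACGTAA"
    print(dtr_length(contig, min_len=5))  # 10
    print(dtr_length(contig))             # 0 (shorter than the 20 bp minimum)
```

This scan is quadratic in the worst case; it is meant to show the definition, not to be fast on long contigs.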

07.07.2023

Valentyn found the first induced phages in the CRC samples. We will need a systematic approach to reducing the computational cost.

09.05.2023

Preprocessing data and establishing methods for handling and analyzing it.

25.04.2023

  • Backmapping predicted sequences to MAGs (600k phages to 10M scaffolds)
  • Identify 10-mers present across samples
  • Align subsets with 10-mers in them
  • All turned out to be computationally intractable
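
For illustration only, a naive in-memory version of the 10-mer step might look like the sketch below. The function names and toy data are hypothetical, and k-mers are not canonicalized across strands (a real search would likely merge each k-mer with its reverse complement). Every position of every input sequence adds an index entry, so time and memory grow with total sequence length, which is consistent with the note above that this did not scale.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def kmers(seq: str, k: int = 10) -> Set[str]:
    """All distinct k-mers of seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmers_across_samples(
    samples: Dict[str, Iterable[str]],
    k: int = 10,
    min_samples: int = 2,
) -> Dict[str, Set[str]]:
    """Map each k-mer to the samples containing it, keeping only k-mers
    found in at least min_samples samples."""
    where: Dict[str, Set[str]] = defaultdict(set)
    for sample_id, seqs in samples.items():
        for seq in seqs:
            for kmer in kmers(seq, k):
                where[kmer].add(sample_id)
    return {kmer: ids for kmer, ids in where.items() if len(ids) >= min_samples}


if __name__ == "__main__":
    toy = {"S1": ["AAAAAAAAAAC"], "S2": ["AAAAAAAAAAG"], "S3": ["CCCCCCCCCC"]}
    shared = kmers_across_samples(toy)
    print({kmer: sorted(ids) for kmer, ids in shared.items()})
    # {'AAAAAAAAAA': ['S1', 'S2']}
```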

05.04.2023

  • Collecting data: human gut MAGs + curated metadata. Running mVIRs
  • Can identify bacteriophages, but it is unclear what the remaining 40% are

31.01.2023

Run mVIRs on all (human) assemblies to obtain phages and the circulome

01.11.2022

  • Created the project
  • Developing a pipeline for RefSeq genomes and running it on Alessio’s genomes.

valentynbez self-assigned this Nov 5, 2024