Local MSA results different from MSA server #665

Open
sangyeon-hits opened this issue Nov 25, 2024 · 4 comments

@sangyeon-hits

sangyeon-hits commented Nov 25, 2024

Expected Behavior

I expect colabfold_search with locally prepared DBs to produce the same MSA as colabfold_batch querying the MSA server.

Current Behavior

The two produce different MSAs for the same input .fasta file.

Steps to Reproduce (for bugs)

  1. Install colabfold==1.5.5 via pip into a fresh mamba environment (python==3.11.10).

  2. Build mmseqs2 at commit 71dd32ec43e3ac4dabf111bbc4b124f1c66a85f1, following the ColabFold README.

  3. Execute the following to set up the DBs:

    MMSEQS_NO_INDEX=1 bash setup_databases.sh $colabfold_db_dir

    Here the tsv2exprofiledb commands use the mmseqs2 built in step 2.

  4. Prepare a sample .fasta file (say sample.fasta) of a single protein sequence.

  5. Get a locally generated MSA by:

    colabfold_search --mmseqs $mmseqs sample.fasta $colabfold_db_dir out_local
    # $mmseqs == mmseqs2 executable from step 2
    # Adding `--db2 pdb100_230517` did not change the MSA output.

  6. Independently, get an MSA by querying the server:

    colabfold_batch sample.fasta out_server --msa-only

  7. Compare the .a3m files generated from steps 5 and 6.
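
The comparison in step 7 could be scripted roughly as follows. This is a minimal sketch, not part of ColabFold: `read_a3m` and `same_sequences` are hypothetical helper names, and the parser assumes that lines starting with `#` are metadata and that stray null bytes between concatenated blocks can be dropped.

```python
from pathlib import Path


def read_a3m(path):
    """Return a list of (header, sequence) pairs from an .a3m file.

    Assumptions (not confirmed by the thread): lines starting with '#'
    are metadata, and null bytes may separate concatenated blocks.
    """
    records = []
    header, chunks = None, []
    for raw in Path(path).read_text().splitlines():
        line = raw.replace("\x00", "").strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def same_sequences(path_a, path_b):
    """True if both files contain the same set of aligned sequences,
    regardless of their order or header contents."""
    seqs_a = {seq for _, seq in read_a3m(path_a)}
    seqs_b = {seq for _, seq in read_a3m(path_b)}
    return seqs_a == seqs_b
```

Calling `same_sequences` with the local and server .a3m paths gives a quick yes/no answer before looking at individual hits.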

ColabFold Output (for bugs)

Omitted; I can attach outputs if necessary.

Context

I want to reproduce results from ColabFold notebooks on my local machine.

Your Environment

  • Git commit used: e2ca9e8
    I used this ColabFold checkout only to run setup_databases.sh. For the colabfold_{search,batch} commands, I used v1.5.5 installed via pip.
  • Operating system and version: Red Hat Enterprise Linux 9.3 (Plow)
@sangyeon-hits
Author

Related: #263

@sangyeon-hits
Author

sangyeon-hits commented Nov 26, 2024

When I tried a short sequence input like

>A
MKTAYIAKQRQISFVKSHFSRQDILDLWIYHTQGYFP

the server MSA and local MSA are the same. But with a longer input:

>A
MVTPEGNVSLVDESLLVGVTDEDRAVRSAHQFYERLIGLWAPAVMEAAHELGVFAALAEAPADSGELARRLDCDARAMRVLLDALYAYDVIDRIHDTNGFRYLLSAEARECLLPGTLFSLVGKFMHDINVAWPAWRNLAEVVRHGARDTSGAESPNGIAQEDYESLVGGINFWAPPIVTTLSRKLRASGRSGDATASVLDVGCGTGLYSQLLLREFPRWTATGLDVERIATLANAQALRLGVEERFATRAGDFWRGGWGTGYDLVLFANIFHLQTPASAVRLMRHAAACLAPDGLVAVVDQIVDADREPKTPQDRFALLFAASMTNTGGGDAYTFQEYEEWFTAAGLQRIETLDTPMHRILLARRATEPSAVPEGQASENLYFQ

the server and local outputs differ.
In this case, ~60% of the sequences in the two MSAs coincide (ignoring the order within the MSA files and the numbers in the header lines), whereas the remaining ~40% differ.
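
An overlap figure like this can be estimated with a short script. The sketch below is one hedged way to do it, not necessarily how the numbers above were obtained: it assumes the hit identifier is the first whitespace-separated token of each `>` header and that the remaining fields are the numbers to ignore. `header_ids` and `overlap_report` are hypothetical names.

```python
from pathlib import Path


def header_ids(path):
    """Collect the set of hit identifiers from an .a3m file.

    Assumption: the identifier is the first whitespace-separated token
    after '>'; any following fields (scores, coordinates) are ignored.
    """
    ids = set()
    for raw in Path(path).read_text().splitlines():
        line = raw.replace("\x00", "").strip()
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def overlap_report(path_a, path_b):
    """Summarise how many identifiers the two MSAs share, ignoring order."""
    a, b = header_ids(path_a), header_ids(path_b)
    shared, union = a & b, a | b
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        # Jaccard index: shared identifiers over all distinct identifiers.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }
```

The `only_a` and `only_b` counts show which side contributes the non-shared hits, which helps tell a database version mismatch apart from a difference in search parameters.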

@milot-mirdita
Collaborator

Can you post the terminal output of the colabfold_search command please?

@sangyeon-hits
Author

sangyeon-hits commented Nov 26, 2024

@milot-mirdita Thank you for the response. The following is the stdout I got.
msa_out_202305.log
(The timestamp just denotes the mmseqs commit date I used.)

I also tried --db2 pdb100_230517 together with --use-templates, but the run terminated with an error because the file pdb100_230517_seq is missing from the DBs.
