I downloaded the raw MSA files you provided using the following commands:
wget https://boltz1.s3.us-east-2.amazonaws.com/rcsb_raw_msa.tar
tar -xf rcsb_raw_msa.tar
rm rcsb_raw_msa.tar
After extracting the archive, I noticed that MSAs are missing for some sequences, even though structural data for those sequences exists in rcsb_processed_targets/structures/*.npz.
Upon checking, I found approximately 16,130 sequences that are present in the structure files but have no corresponding raw MSA data.
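For anyone who wants to reproduce a count like this, a check of this kind can be sketched as below. This is a minimal sketch, not the exact script behind the number above. It assumes each raw MSA file is named after a sequence identifier (the part of the file name before the first dot), and that the identifiers expected from the structure files have already been collected into a list. How those identifiers are stored inside the .npz files is not described here, so that step is left out. The function name `find_missing_msas` is hypothetical.

```python
from pathlib import Path


def find_missing_msas(expected_ids, msa_dir):
    """Return the sorted IDs in expected_ids that have no file in msa_dir.

    Assumption: an MSA file "belongs" to an ID when the part of its
    file name before the first dot equals that ID. The real naming
    scheme of the raw MSA archive may differ.
    """
    available = {
        path.name.split(".")[0]
        for path in Path(msa_dir).iterdir()
        if path.is_file()
    }
    return sorted(set(expected_ids) - available)
```

Calling `find_missing_msas(ids, msa_dir)` with the extracted MSA directory returns the identifiers that have no matching file, and `len(...)` of the result gives the count of missing MSAs.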
To illustrate this issue, I have identified some sequences that appear to be missing from the raw MSA dataset.
I would like to know if this is expected behavior or if there was an issue with the dataset.
Could you please confirm whether these MSAs were intentionally excluded, or if there is an error in the dataset?
Thank you!