In cases like SARS-CoV-2 N protein we have to use phylogenetics data. Two issues:

Specifically for N protein (but this will also happen for other targets in future), the PDB structure is a homodimer. The fitness data (in the JSON) is for the monomer, so we need to duplicate the fitness data to make sure the alignment works correctly.

For the second point, I had a go (in IO) by making a copy of the fitness DF, adjusting the residue index column in the copy to start counting from where the first fitness DF ended, and then concatenating both DFs. This messes up the alignment spectacularly.

Perhaps a more elegant approach would be to align the whole phylogenetics JSON to the PDB and then pick the largest overlapping island? We would still have to deal with mini gaps (may just need to define some lag factor). This wouldn't solve the homodimer issue though. Perhaps that could be exposed in the CLI, and a duplication as above is then tried?
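For reference, the duplication attempt described above amounts to roughly the following. This is only a sketch to make the attempt reproducible: the function name `duplicate_with_offset` and the column names `residue_index` and `fitness` are hypothetical, not taken from the codebase.

```python
import pandas as pd


def duplicate_with_offset(fitness_df: pd.DataFrame,
                          index_col: str = "residue_index") -> pd.DataFrame:
    """Append a copy of the monomer fitness table, renumbered to continue
    counting from where the first copy ended (hypothetical column names)."""
    first_index = fitness_df[index_col].min()
    last_index = fitness_df[index_col].max()

    second = fitness_df.copy()
    # Shift the copy so its first residue index is last_index + 1.
    second[index_col] = second[index_col] - first_index + last_index + 1

    return pd.concat([fitness_df, second], ignore_index=True)


# Toy monomer table standing in for the fitness JSON.
monomer = pd.DataFrame({"residue_index": [1, 2, 3],
                        "fitness": [-0.5, 0.1, -2.0]})
dimer = duplicate_with_offset(monomer)
```

Whether this continued numbering matches the PDB depends on how the second chain is numbered there, which would be worth checking when debugging the alignment.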
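The "largest overlapping island" idea could look something like the sketch below, assuming the alignment step yields the residue positions that matched between the phylogenetics data and the PDB. The function `largest_island` and its `max_gap` parameter (standing in for the "lag factor") are hypothetical names for illustration, and "largest" is taken here to mean the most matched positions.

```python
def largest_island(matched_positions, max_gap=0):
    """Return (start, end) of the run with the most matched positions,
    bridging gaps of up to max_gap missing positions; None if empty."""
    positions = sorted(set(matched_positions))
    if not positions:
        return None

    best = (positions[0], positions[0])
    best_count = 1
    start = positions[0]
    count = 1

    for prev, cur in zip(positions, positions[1:]):
        missing = cur - prev - 1  # unmatched positions between neighbours
        if missing <= max_gap:
            count += 1  # still inside the current island
        else:
            start = cur  # gap too large: start a new island
            count = 1
        if count > best_count:
            best_count = count
            best = (start, cur)

    return best


# Toy example: three islands separated by gaps of 1 and 11 positions.
matched = [1, 2, 3, 5, 6, 7, 8, 20, 21]
strict = largest_island(matched)               # no gaps tolerated
lenient = largest_island(matched, max_gap=1)   # bridge single-residue gaps
```

With `max_gap=0` the islands are kept strictly contiguous. Raising it lets mini gaps be bridged, at the cost of including positions that have no fitness data.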