-
Notifications
You must be signed in to change notification settings - Fork 2
[ENH] pdb to String loader using SEQREQ records #148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Renamed the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this function, at least ideas-wise, not do the same as pdb_to_struct
then struct_to_aaseq
? Recall that we separated the two intentionally, to separate file manipulation from in-memory objects which are abstract representations.
Instead of adding this new function which would link file manipulation back into the in-memory workflow, why not add features to the existing struct_to_aaseq
?
#147 answers your question, essentially 3d data is being captured when biopython converts something to a |
pyaptamer/utils/pdb_to_aaseq.py
Outdated
pdb_file_path : str or os.PathLike | ||
Path to a PDB file. | ||
return_df : bool, optional, default=False | ||
If True, return a pandas.DataFrame with columns: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's try to be rst compatible where we can:
- newlines before and after bullet point lists
- double backticks around code like
'chain'
(note that markdown uses single backtick, rst wants double backtick)
pyaptamer/utils/pdb_to_aaseq.py
Outdated
---------- | ||
pdb_file_path : str or os.PathLike | ||
Path to a PDB file. | ||
return_df : bool, optional, default=False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's call this return_type
with two possible values "pd.df"
and "list"
. This makes it more upwards compatible.
follow-up question: would it be possible to get the sequences from the |
I thought I mentioned this but I didnt, so apologies for that.
So when the Structure object is formed, it also misses this 3d information, hence the complete sequence cannot be extracted from it, as the unknown 3 orientation leads to a lack of a complete sequence. If it didnt, I wouldnt have made this PR. |
Ok, that is concerning. Are you saying the |
Just to have it on record, yes |
Made the 2 changes requested above and renamed all files to private to prevent weird import errors as discussed in todays meeting. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should rethink what the internal structure is.
SeqIO
seems to be loading pdb
in a parsable form - so if it is non-lossy, we should add at least a loader that returns the python object (and not a string), see #160
Closes #147