Skip to content

Conversation

satvshr
Copy link
Collaborator

@satvshr satvshr commented Sep 15, 2025

Closes #147

@satvshr
Copy link
Collaborator Author

satvshr commented Sep 17, 2025

Renamed the _pfoa_loader.py file to _pfoa.py for consistency.

Copy link
Contributor

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this function, at least ideas-wise, not do the same as pdb_to_struct then struct_to_aaseq? Recall that we separated the two intentionally, to separate file manipulation from in-memory objects which are abstract representations.

Instead of adding this new function which would link file manipulation back into the in-memory workflow, why not add features to the existing struct_to_aaseq?

@fkiraly fkiraly added the enhancement New feature or request label Sep 22, 2025
@satvshr
Copy link
Collaborator Author

satvshr commented Sep 22, 2025

Instead of adding this new function which would link file manipulation back into the in-memory workflow, why not add features to the existing struct_to_aaseq?

#147 answers your question, essentially 3d data is being captured when biopython converts something to a Structure object using ATOM records, sometimes a part of the sequence cna go missing if the 3d orientation of it is not known. This issue is not faced with SEQRES records, hence the direct converter.

@satvshr satvshr requested a review from fkiraly September 22, 2025 20:12
pdb_file_path : str or os.PathLike
Path to a PDB file.
return_df : bool, optional, default=False
If True, return a pandas.DataFrame with columns:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's try to be rst compatible where we can:

  • newlines before and after bullet point lists
  • double backticks around code like 'chain' (note that markdown uses single backtick, rst wants double backtick)

----------
pdb_file_path : str or os.PathLike
Path to a PDB file.
return_df : bool, optional, default=False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's call this return_type with two possible values "pd.df" and "list". This makes it more upwards compatible.

@fkiraly
Copy link
Contributor

fkiraly commented Sep 25, 2025

Instead of adding this new function which would link file manipulation back into the in-memory workflow, why not add features to the existing struct_to_aaseq?

#147 answers your question, essentially 3d data is being captured when biopython converts something to a Structure object using ATOM records, sometimes a part of the sequence cna go missing if the 3d orientation of it is not known. This issue is not faced with SEQRES records, hence the direct converter.

follow-up question: would it be possible to get the sequences from the Structure object instead of the pdb?

@satvshr
Copy link
Collaborator Author

satvshr commented Sep 25, 2025

would it be possible to get the sequences from the Structure object instead of the pdb?

I thought I mentioned this but I didnt, so apologies for that.

sometimes a part of the sequence cna go missing (in the Structure object) if the 3d orientation of it is not known.

So when the Structure object is formed, it also misses this 3d information, hence the complete sequence cannot be extracted from it, as the unknown 3 orientation leads to a lack of a complete sequence. If it didnt, I wouldnt have made this PR.

@fkiraly
Copy link
Contributor

fkiraly commented Sep 26, 2025

Ok, that is concerning. Are you saying the Structure objec tis not an 1:1 in-memory representation of the full pdb?

@satvshr
Copy link
Collaborator Author

satvshr commented Sep 26, 2025

Are you saying the Structure objec tis not an 1:1 in-memory representation of the full pdb?

Just to have it on record, yes

@satvshr
Copy link
Collaborator Author

satvshr commented Sep 26, 2025

Made the 2 changes requested above and renamed all files to private to prevent weird import errors as discussed in todays meeting.

@satvshr satvshr requested a review from fkiraly September 26, 2025 15:06
Copy link
Contributor

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should rethink what the internal structure is.

SeqIO seems to be loading pdb in a parsable form - so if it is non-lossy, we should add at least a loader that returns the python object (and not a string), see #160

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ENH] pdb to String loader using SEQREQ records
2 participants