Auto-identify bulk data as heavy or light chain #121

Merged 3 commits on Mar 4, 2025

willdumm commented Mar 1, 2025

Previously, our only bulk data was heavy chain data, so if a pcp file did not distinguish between heavy and light chains (e.g. in its parent sequence columns), we assumed it contained heavy chains. Now we have light chain data in the same format, and in theory a single pcp file could contain mixed heavy and light chain bulk data.

To handle this, we process pcp_dfs into a format that always contains heavy/light differentiated _h and _l columns, automatically inferring the chain type for each pcp from its v family name.
If the pcp file already has differentiated _h and _l columns, as with paired data, then we assume no inference is necessary. In that case we only check that all necessary columns are present and that the heavy chain and light chain v families look heavy or light, as claimed.
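
For readers who want a concrete picture, here is a minimal sketch of the inference described above. It is not the code from this PR: it uses plain dicts instead of a DataFrame so it runs with no dependencies, and the column names (`parent`, `child`, `v_family`), both function names, and the use of an empty string for the absent chain are all assumptions made for illustration. The only outside fact it relies on is the IMGT naming convention, where heavy chain V genes start with `IGHV` and light chain V genes start with `IGKV` or `IGLV`.

```python
# Illustrative sketch only; column and function names are hypothetical.
HEAVY_PREFIXES = ("IGHV",)
LIGHT_PREFIXES = ("IGKV", "IGLV")
# Assumed per-chain fields; the real pcp schema may have more or different ones.
CHAIN_FIELDS = ("parent", "child", "v_family")


def chain_type_of_v_family(v_family):
    """Return "h" or "l" based on the IMGT-style prefix of a V family name."""
    name = v_family.upper()
    if name.startswith(HEAVY_PREFIXES):
        return "h"
    if name.startswith(LIGHT_PREFIXES):
        return "l"
    raise ValueError(f"Cannot infer chain type from v family {v_family!r}")


def differentiate_row(row):
    """Convert one pcp record (a dict) to the _h/_l column layout."""
    diff_cols = [f"{field}_{chain}" for field in CHAIN_FIELDS for chain in "hl"]
    present = [col for col in diff_cols if col in row]
    if present:
        # Already differentiated (e.g. paired data): no inference, only checks.
        missing = sorted(set(diff_cols) - set(present))
        if missing:
            raise ValueError(f"Missing differentiated columns: {missing}")
        for chain in "hl":
            v_family = row[f"v_family_{chain}"]
            # An empty family is taken to mean that chain is absent here.
            if v_family and chain_type_of_v_family(v_family) != chain:
                raise ValueError(
                    f"v_family_{chain} is {v_family!r}, which does not look "
                    f"like a {'heavy' if chain == 'h' else 'light'} chain family"
                )
        return dict(row)
    # Undifferentiated bulk data: infer the chain from the V family name.
    chain = chain_type_of_v_family(row["v_family"])
    other = "l" if chain == "h" else "h"
    out = {key: value for key, value in row.items() if key not in CHAIN_FIELDS}
    for field in CHAIN_FIELDS:
        out[f"{field}_{chain}"] = row[field]
        out[f"{field}_{other}"] = ""  # placeholder for the absent chain
    return out
```

Because the inference is done per record, a file mixing heavy and light chain bulk data would end up with each row populated on the appropriate side of the _h/_l split.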

I also added a more informative error message for when masked parent-child nt pairs are identical, since I moved that filtering step to pre-processing in dnsm-experiments.

willdumm marked this pull request as ready for review March 3, 2025 05:52
willdumm requested a review from matsen March 3, 2025 05:52
willdumm merged commit 25a3a56 into main Mar 4, 2025
2 checks passed
willdumm deleted the wd-vanwinkle-data branch March 4, 2025 19:56