Do paired heavy-light modeling #57

Open
matsen opened this issue Sep 13, 2024 · 2 comments · May be fixed by #92
Comments

@matsen
Contributor

matsen commented Sep 13, 2024

Apparently the way to do this is

combining the two variable region chains into a single input with a separator token.

See https://arxiv.org/abs/2403.17889
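A minimal sketch of that idea, not the paper's code: the two variable-region amino acid sequences are concatenated around a single separator character and can be split apart again. The function names and the choice of `^` as separator are assumptions made for illustration.

```python
def join_chains(heavy: str, light: str, sep: str = "^") -> str:
    """Concatenate heavy and light variable-region sequences around a separator."""
    if sep in heavy or sep in light:
        raise ValueError(f"separator {sep!r} already occurs in a chain")
    return heavy + sep + light


def split_chains(paired: str, sep: str = "^") -> tuple[str, str]:
    """Recover (heavy, light) from a joined sequence."""
    heavy, found, light = paired.partition(sep)
    if not found:
        raise ValueError("no separator found in sequence")
    return heavy, light


# Toy fragments, not real antibody sequences.
paired = join_chains("EVQLVESGG", "DIQMTQSPS")
heavy, light = split_chains(paired)
```

Because the separator is a single character, the model sees one input sequence and the chain boundary is recoverable from the sequence alone.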

@willdumm
Contributor

Some notes from chat with Erick today:

  • There are kappa light chains and lambda light chains. Sometimes igl means lambda and sometimes it means light chain in general. igk always means kappa.

  • Let's use ^ (the "wedge") as the joining symbol on the AA side.

  • Let's keep the correspondence between codons and amino acids by adding an ambiguous codon on the nucleotide side. Let's do N^N or ^^^.

  • The AA model has to see the wedge.

  • Treat the wedge as an AA character, in the sense that it will get an embedding. But then we'll be inferring selection factors for it.

  • We'll have a mix of heavy chain sequences and paired sequences, so we'll want to add the wedge character at the end of all the heavy chain sequences too.

  • Think about designing things so we're not locked into having just one additional AA.

  • framework.load_pcp_df can read the CSV headers to see whether there are heavy and light chains, join them (as we've talked about) in a new function so that there are only parent and child sequences, and then do what we're already doing.

  • Modify everything downstream, especially the code in sequences.py, such as index-encoding.

  • We'll just train the embedding for this new character for now, even though that seems like a waste.
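The joining convention in these notes could look roughly like the sketch below. It is an illustration under assumptions, not the project's implementation: `join_nt`, `join_aa`, `encode_aa`, `EXTRA_TOKENS`, and `AA_VOCAB` are hypothetical names, the `^^^` option is picked over `N^N` arbitrarily, and the ordering of the amino acid vocabulary is made up.

```python
# Hypothetical names throughout; a sketch of the convention, not the real code.
AA_SEP = "^"    # wedge on the amino acid side
NT_SEP = "^^^"  # one placeholder "codon" on the nucleotide side (N^N is the other option)

STANDARD_AAS = "ACDEFGHIKLMNPQRSTVWY"
EXTRA_TOKENS = [AA_SEP]  # a list, so more special characters can be appended later
AA_VOCAB = list(STANDARD_AAS) + EXTRA_TOKENS
AA_TO_IDX = {aa: i for i, aa in enumerate(AA_VOCAB)}


def join_nt(heavy_nt: str, light_nt: str = "") -> str:
    """Join nucleotide sequences; heavy-only input still gets the trailing separator."""
    if len(heavy_nt) % 3 or len(light_nt) % 3:
        raise ValueError("sequence lengths must be multiples of 3")
    return heavy_nt + NT_SEP + light_nt


def join_aa(heavy_aa: str, light_aa: str = "") -> str:
    """Join amino acid sequences the same way, so site i still matches codon i."""
    return heavy_aa + AA_SEP + light_aa


def encode_aa(seq: str) -> list[int]:
    """Index-encode an amino acid sequence; the wedge gets its own index."""
    return [AA_TO_IDX[aa] for aa in seq]


# Toy fragments: GAG GTG CAG -> EVQ, GAC ATC CAG -> DIQ.
paired_nt = join_nt("GAGGTGCAG", "GACATCCAG")
paired_aa = join_aa("EVQ", "DIQ")
heavy_only_aa = join_aa("EVQ")  # heavy-only sequences end with the wedge

# The three-character nucleotide separator keeps codons and amino acids aligned.
assert len(paired_nt) == 3 * len(paired_aa)
```

Keeping the special characters in a list rather than hard-coding a single extra symbol is one way to avoid being locked into exactly one additional AA character.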

matsen linked a pull request on Dec 10, 2024 that will close this issue
@matsen
Contributor Author

matsen commented Dec 11, 2024

p-IgGen trains with normal and reversed sequences 🙃

