Add convenient Crepe.represent_sequences method #117

Merged: 7 commits into main from 115-crepe-represent on Feb 14, 2025

@willdumm willdumm commented Feb 13, 2025

This addresses #115, adding Crepe.represent_sequences, and a number of supporting methods on D*SM model classes.

It also eliminates the option of providing non-paired sequences to D*SM model methods that take a string (and therefore also all Crepe methods that call them). These methods now require amino acid sequences to be provided in (heavy_chain, light_chain) tuples, where a missing chain sequence can be represented by the empty string.

The represent_sequences function returns a tensor for each heavy-light pair provided to it, while Crepe.__call__ returns a pair of tensors (one for heavy, one for light chain) for each heavy-light pair provided to it. This seems to me the correct choice, but there could be justification for splitting the embedding tensors returned by represent_sequences on heavy/light boundaries.
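To make the difference between the two return shapes concrete, here is a toy sketch. It does not use netam: `toy_represent_sequences` and `toy_call` are hypothetical stand-ins, plain lists stand in for tensors, and the per-residue values are placeholders. The real embeddings may differ in length and layout.

```python
# Toy illustration only: hypothetical stand-ins, not netam code.
# Plain lists stand in for tensors; each residue maps to one placeholder value.

def toy_represent_sequences(pairs):
    """Return one combined per-residue list for each (heavy, light) pair."""
    return [[ord(aa) for aa in heavy + light] for heavy, light in pairs]


def toy_call(pairs):
    """Return a (heavy, light) pair of per-residue lists for each input pair."""
    return [
        ([ord(aa) for aa in heavy], [ord(aa) for aa in light])
        for heavy, light in pairs
    ]


# The empty string marks a missing light chain in the second pair.
pairs = [("QVQL", "DIQM"), ("EVQL", "")]

combined = toy_represent_sequences(pairs)  # one list per pair
split = toy_call(pairs)                    # one (heavy, light) tuple per pair
```

In this sketch, splitting `combined` on the heavy/light boundary would only require the heavy chain length, which is the alternative mentioned above.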

@willdumm willdumm marked this pull request as draft February 13, 2025 23:54
@willdumm willdumm marked this pull request as ready for review February 14, 2025 00:14
@willdumm willdumm requested a review from matsen February 14, 2025 00:14
@@ -449,25 +449,3 @@ def worker_optimize_branch_length(burrito_class, model, dataset, optimization_kw
    """The worker used for parallel branch length optimization."""
    burrito = burrito_class(None, dataset, copy.deepcopy(model))
    return burrito.serial_find_optimal_branch_lengths(dataset, **optimization_kwargs)

@willdumm commented on this change:

This function had to be moved to avoid circular dependencies.

@willdumm willdumm linked an issue Feb 14, 2025 that may be closed by this pull request
@matsen matsen left a comment

Great!

Q: If we can't supply single chains, does it make sense to require paired tuples? AbLang takes lists of lists, and I can get those with

one_pair = [train_df.iloc[0][['heavy', 'light']].tolist()]

Of course, I can manage with

def lists_to_tuples(list_of_lists):
    return [tuple(lst) for lst in list_of_lists]

rep = crepe.represent_sequences(lists_to_tuples(one_pair))

No big deal but I thought I'd ask.

Also, it appears that the return type of represent_sequences is a tuple. Is that on purpose?

willdumm commented Feb 14, 2025

Thanks for noticing these things! I changed the check on sequence inputs to allow any non-str type of length two. Also, I changed the return type to list.
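A minimal sketch of what such a relaxed check could look like, assuming only what is described above. `check_pair` is a hypothetical name, not the actual netam helper, and the exception types are a guess.

```python
def check_pair(seq_pair):
    """Accept any non-str container of length two; reject bare strings.

    Hypothetical helper for illustration, not the function used in netam.
    """
    # A two-character string also has length two, so rule out str first.
    if isinstance(seq_pair, str):
        raise TypeError("expected a (heavy, light) pair, got a single string")
    if len(seq_pair) != 2:
        raise ValueError(f"expected two chains, got {len(seq_pair)}")
    return tuple(seq_pair)


# Lists and tuples are both accepted; an empty string marks a missing chain.
from_list = check_pair(["QVQL", "DIQM"])
from_tuple = check_pair(("QVQL", ""))
```

With a check of this kind, the list-of-lists input from the review comment would work without the `lists_to_tuples` conversion.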

@willdumm willdumm changed the title from "Add convenient Crepe.represent method" to "Add convenient Crepe.represent_sequences method" Feb 14, 2025
@matsen matsen left a comment

Works like a charm. Merge it!

@willdumm willdumm merged commit 954c28c into main Feb 14, 2025
2 checks passed
@willdumm willdumm deleted the 115-crepe-represent branch February 14, 2025 19:01
Successfully merging this pull request may close these issues.

Give the crepe a represent method