-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add convenient Crepe.represent_sequences
method
#117
Conversation
@@ -449,25 +449,3 @@ def worker_optimize_branch_length(burrito_class, model, dataset, optimization_kw | |||
"""The worker used for parallel branch length optimization.""" | |||
burrito = burrito_class(None, dataset, copy.deepcopy(model)) | |||
return burrito.serial_find_optimal_branch_lengths(dataset, **optimization_kwargs) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function had to be moved to avoid circular dependencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great!
Q: If we can't supply single chains, does it make sense to require paired tuples? Ablang take lists of lists and I can get those with
one_pair = [train_df.iloc[0][['heavy', 'light']].tolist()]
Of course, I can manage with
def lists_to_tuples(list_of_lists):
return [tuple(lst) for lst in list_of_lists]
rep = crepe.represent_sequences(lists_to_tuples(one_pair))
No big deal but I thought I'd ask.
Also, it appears that the return type of represent_sequences is a tuple. Is that on purpose?
Thanks for noticing these things! I changed the check on sequence inputs to allow any non-str type of length two. Also, I changed the return type to list. |
Crepe.represent_sequences
method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works like a charm. Merge it!
This addresses #115, adding
Crepe.represent_sequences
, and a number of supporting methods on D*SM model classes.It also eliminates the option of providing non-paired sequences to D*SM model methods that take a string (and therefore also all Crepe methods that call them). These methods now require amino acid sequences to be provided in
(heavy_chain, light_chain)
tuples, where a missing chain sequence can be represented by the empty string.The represent_sequences function returns a tensor for each heavy-light pair provided to it, while
Crepe.__call__
returns a pair of tensors (one for heavy, one for light chain) for each heavy-light pair provided to it. This seems to me the correct choice, but there could be justification for splitting the embedding tensors returned by represent_sequences on heavy/light boundaries.